massimo1980 closed this issue 5 years ago
Hi @massimo1980, thanks for the help!
We are using nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
and it is strange that it cannot find something on your machine. We need that command, so we have to understand why that container doesn't include it.
For the other steps, can you open a pull request? This month is also Hacktoberfest, so you can get a t-shirt if you do 4 pull requests during October.
Hi @Mte90, thanks for the quick reply. To avoid any kind of misunderstanding, can you list the command you use to run the Docker container?
As for the pull request, I'm actually not that experienced; if you explain how to do it, I'll give it a try.
To run the image, there is the command at the bottom of https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/DeepSpeech/CONTRIBUTING.md. For the pull request, you need to fork the repository, commit your changes, and open the pull request. We can help you in our Telegram developers group, which you can find through our @mozitabot.
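A minimal sketch of those steps on the command line (using a throwaway local repo in place of the real fork so it runs anywhere; the branch name and commit message are made up for illustration):

```shell
# Stand-in for the real flow: in practice you would `git clone` your
# fork of DeepSpeech-Italian-Model instead of `git init` in a temp dir.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"
git checkout -q -b fix-docker-docs            # topic branch for the PR
echo "example change" > NOTES.md              # stand-in for your edits
git add -A
git commit -q -m "Fix Docker training instructions"
git log --oneline                             # the commit you will push
# git push origin fix-docker-docs   -> then open the PR on GitHub
```

After pushing, GitHub shows a "Compare & pull request" button on your fork.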
@Mte90 OK, it was my fault... now nvidia-smi and the write permissions are OK. I've managed to run the image and start the training session, but after a while:
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:01:42 | Steps: 88 | Loss: 172.407579
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68]
[[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
[[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_77]]
(1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68]
[[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./DeepSpeech.py", line 906, in <module>
absl.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "./DeepSpeech.py", line 890, in main
train()
File "./DeepSpeech.py", line 608, in train
train_loss, _ = run_set('train', epoch, train_init_op)
File "./DeepSpeech.py", line 576, in run_set
feed_dict=feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68]
[[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at ./DeepSpeech.py:308) ]]
[[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_77]]
(1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68]
[[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at ./DeepSpeech.py:308) ]]
0 successful operations.
0 derived errors ignored.
@lissyx do you have any hint for those errors? Maybe we need to disable cudnn?
I've tried reducing the batch size for train, dev and test from 68 to 60... now it seems to work (by the way, I forgot to specify my hardware: i7-6700K | GTX 1070 with 8 GB of memory).
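For anyone hitting the same error: the change boils down to lowering the batch-size flags passed to DeepSpeech.py. A sketch of just those flags (the surrounding command and the remaining flags are illustrative, not the exact run.sh contents):

```shell
# Illustrative fragment: lower all three batch sizes until training fits
# in GPU memory (8 GB on a GTX 1070, vs. the 11 GB the defaults target).
python -u DeepSpeech.py \
  --train_batch_size 60 \
  --dev_batch_size 60 \
  --test_batch_size 60  # ...remaining flags unchanged...
```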
Every time, pixz returns:
can not seek in input: Illegal seek
I hope this is only a warning.
Yes, this is a warning
+ curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.ba56407376f1e1109be33ac87bcb6eb9709b18be.cpu/artifacts/public/native_client.tar.xz
Likely that your docker image was generated more than one week ago and the artifact expired.
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68]
[[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
[[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_77]]
(1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68]
[[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
I have absolutely no idea for those errors.
I've tried reducing the batch size for train, dev and test from 68 to 60... now it seems to work
Strange, a batch size that is too big would usually generate an OOM. But yeah, as documented, those settings were established for the French model on my hardware, i.e. an RTX 2080 Ti (11 GB RAM).
Currently, running:
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:28:36 | Steps: 2267 | Loss: 84.983498
Epoch 0 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 19.280736 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 57.125492 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 70.742396 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 69.326420 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 82.528525 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 58.634261 to: /mnt/checkpoints/best_dev-2267
Epoch 1 | Training | Elapsed Time: 0:28:36 | Steps: 2267 | Loss: 49.807577
WARNING:tensorflow:From /home/trainer/ds-train/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W1004 07:50:54.227288 140635919415104 deprecation.py:323] From /home/trainer/ds-train/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 1 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 14.869744 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 1 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 42.427034 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 1 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 60.430801 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 1 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 47.229852 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 1 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 71.610972 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 44.342112 to: /mnt/checkpoints/best_dev-4534
Epoch 2 | Training | Elapsed Time: 0:28:35 | Steps: 2267 | Loss: 38.781706
Epoch 2 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 12.907967 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 2 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 35.656022 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 2 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 55.144742 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 2 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 36.578218 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 2 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 65.437710 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 37.470106 to: /mnt/checkpoints/best_dev-6801
Epoch 3 | Training | Elapsed Time: 0:28:38 | Steps: 2267 | Loss: 32.290010
Epoch 3 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 11.469230 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 3 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 31.850810 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 3 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 52.353987 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 3 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 29.747222 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 3 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 61.468791 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 33.315463 to: /mnt/checkpoints/best_dev-9068
Epoch 4 | Training | Elapsed Time: 0:28:37 | Steps: 2267 | Loss: 27.813154
Epoch 4 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 10.661388 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 4 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 29.489315 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 4 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 49.917231 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 4 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 25.459971 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 4 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 59.274379 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 30.585386 to: /mnt/checkpoints/best_dev-11335
Epoch 5 | Training | Elapsed Time: 0:28:40 | Steps: 2267 | Loss: 24.443504
Epoch 5 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 9.920453 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 5 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 28.171673 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 5 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 49.646353 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 5 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 22.118046 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 5 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 56.128802 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 28.908376 to: /mnt/checkpoints/best_dev-13602
Epoch 6 | Training | Elapsed Time: 0:28:39 | Steps: 2267 | Loss: 21.769397
Epoch 6 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 9.537197 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 6 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 26.599756 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 6 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 48.462345 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 6 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 19.238413 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 6 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 54.152451 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 27.191631 to: /mnt/checkpoints/best_dev-15869
Epoch 7 | Training | Elapsed Time: 0:28:31 | Steps: 2267 | Loss: 19.581926
Epoch 7 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 9.003750 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 7 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 25.890564 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 7 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 48.009400 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 7 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 17.315323 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 7 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 52.075734 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 26.149278 to: /mnt/checkpoints/best_dev-18136
Epoch 8 | Training | Elapsed Time: 0:28:36 | Steps: 2267 | Loss: 17.753123
Epoch 8 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 8.815179 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 8 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 24.873896 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 8 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 47.467988 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 8 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 15.427920 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 8 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 51.372372 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 25.106390 to: /mnt/checkpoints/best_dev-20403
Epoch 9 | Training | Elapsed Time: 0:28:32 | Steps: 2267 | Loss: 16.213704
Epoch 9 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 8.652601 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 9 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 24.659495 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 9 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 47.356290 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 9 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 13.904833 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 9 | Validation | Elapsed Time: 0:00:01 | Steps: 9 | Loss: 51.444731 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
I Saved new best validating model with loss 24.519857 to: /mnt/checkpoints/best_dev-22670
Epoch 10 | Training | Elapsed Time: 0:28:37 | Steps: 2267 | Loss: 14.880339
Epoch 10 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 8.520405 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv
Epoch 10 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 24.312960 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv
Epoch 10 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 46.965495 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 10 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 12.535529 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv
Epoch 10 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 51.239443 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
@massimo1980 @Mte90 Can we make sure that issues found here are also reported / fixed on my upstream? https://github.com/Common-Voice/commonvoice-fr/tree/master/DeepSpeech
OK, training seems to complete without errors...
The testing phase starts with /mnt/extracted/data/cv-it/clips/test.csv and seems to be fine.
But...
Testing model on /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00
Traceback (most recent call last):
File "./DeepSpeech.py", line 906, in
/mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv
Is this file empty?
Nope....
wav_filename,wav_filesize,transcript
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/chiesa di San Carlo Borromeo.wav,122072,chiesa di san carlo borromeo
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/Casa Rena.wav,72920,casa rena
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/parco del Museo del Tessile.wav,116608,parco del museo del tessile
I hope the "spaces" inside the file names don't cause trouble.
I'm pretty sure I have some in mine, so it should be okay.
wav_filename,wav_filesize,transcript
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/chiesa di San Carlo Borromeo.wav,122072,chiesa di san carlo borromeo
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/Casa Rena.wav,72920,casa rena
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/parco del Museo del Tessile.wav,116608,parco del museo del tessile
Is that the full content? Only two files?
Nope, I pasted only that... this thread is already pretty long :smile:
How many lines does the test CSV contain? Can you make sure it's more than the test batch size?
Can you ensure there are no empty WAV files and/or empty transcriptions?
How many lines does the test CSV contain? Can you make sure it's more than the test batch size?
root@testing:/# cat /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv | wc -l
19
The actual run, if I've understood correctly, has these parameters: --train_batch_size 68 --dev_batch_size 68 --test_batch_size 68. As I wrote before, I changed those parameters to 60 to be able to run the training.
Can you ensure there are no empty WAV files and/or empty transcriptions?
Yep. I mean, doing an ls -al of that folder doesn't show any WAV file with size 0.
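Besides eyeballing ls -al, both conditions can be checked in one go; a sketch assuming the paths from this thread, and assuming transcripts contain no commas (so the last comma-separated field is the transcript):

```shell
DATA_DIR=/mnt/extracted/data/lingualibre
CSV="$DATA_DIR/lingua_libre_Q385-ita-Italian_test.csv"
# list any zero-byte WAV files under the dataset directory
find "$DATA_DIR" -name '*.wav' -size 0
# list any CSV rows (past the header) whose transcript field is empty
awk -F',' 'NR > 1 && $NF == ""' "$CSV"
```

Both commands printing nothing means neither problem is present.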
root@testing:/# cat /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv | wc -l
19
The actual run, if I've understood correctly, has these parameters: --train_batch_size 68 --dev_batch_size 68 --test_batch_size 68. As I wrote before, I changed those parameters to 60 to be able to run the training.
Right, you should try --test_batch_size 10, for example. I had a look at the Lingua Libre dataset in Italian, and it's very small compared to the French one, so it's not surprising.
Yep! It seems that decreasing the test batch size to 16 solved the problem. Training and evaluation finally finished.
Now it seems that the tflite model creation breaks on a flag error:
+ python -u DeepSpeech.py --alphabet_config_path /mnt/models/alphabet.txt --lm_binary_path /mnt/lm/lm.binary --lm_trie_path /mnt/lm/trie --feature_cache /mnt/sources/feature_cache --n_hidden 2048 --checkpoint_dir /mnt/checkpoints/ --export_dir /mnt/models/ --export_tflite --nouse_seq_length --export_language it
FATAL Flags parsing error: Unknown command line flag 'nouse_seq_length'
Pass --helpshort or --helpfull to see help on flags.
I've looked at the list (DeepSpeech.py --helpfull) and it seems that the flag "nouse_seq_length" doesn't exist.
We need to update the contributing readme and also the docker file so we can avoid those issues in the future.
FATAL Flags parsing error: Unknown command line flag 'nouse_seq_length'
Pass --helpshort or --helpfull to see help on flags.
I've looked at the list (DeepSpeech.py --helpfull) and it seems that the flag "nouse_seq_length" doesn't exist.
Yeah, we removed that argument a while ago. As @Mte90 said, it's important to stay as close as possible to my repo :)
OK! Removing that flag seems to produce the model anyway (...anyway, I will not use tflite).
Apart from having to run "chmod +x run_it.sh", I have no more issues to report.
My last question (I promise I will not bother you anymore! :tongue:):
can I run this model using the "mic_vad_streaming" example of DeepSpeech, or is there a "quick way"?
Yes, we have reworked that part of the project to ensure it is usable with the latest version, and we have some CI running.
You can also just install the C++ client from native_client.tar.xz, or use one of the bindings (Python, NodeJS) and their CLI interface: it's all going to run the same way.
( ...anyway i will not use tflite )
TFLite is required for running on Android and RPi platforms now. It also gives better / smoother performance as a WebSpeech API implementation.
I can try to update the script to solve the errors @massimo1980. The things changed are:
generate_alphabet.sh
...nevermind :cry: .. no luck
Loaded model in 1.57s.
Loading language model from files /mnt/lm/lm.binary /mnt/lm/trie
Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
Loaded language model in 0.0333s.
Running inference.
Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state
Segmentation fault (core dumped)
Have you installed 0.6.0a8?
I can try to update the script to solve the errors @massimo1980. The things changed are:
* path for sed in `generate_alphabet.sh`
* change source to sources in lingualibre
* set train batch size to 10

Anything else to change?
chmod +x run_it.sh; actually it seems that this file doesn't have execution permission.
https://github.com/mozilla/DeepSpeech/blob/master/examples/mic_vad_streaming/requirements.txt#L1
My bad!! I was still at 0.5.1; upgrading did the trick!
But now...
2019-10-04 16:59:03.243818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6783 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
Specified model file version (0) is incompatible with minimum version supported by this client (2).
See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
Traceback (most recent call last):
  File "/usr/local/bin/deepspeech", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/deepspeech/client.py", line 80, in main
    ds = Model(args.model, args.alphabet, BEAM_WIDTH)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech/__init__.py", line 39, in __init__
    raise RuntimeError("CreateModel failed with error code {}".format(status))
RuntimeError: CreateModel failed with error code 8195
Specified model file version (0) is incompatible with minimum version supported by this client (2)
Can you make sure you trained from a matching version?
The commit referenced is https://github.com/mozilla/DeepSpeech/tree/ba56407376f1e1109be33ac87bcb6eb9709b18be, but you need to make sure you ran docker build against that commit. Otherwise the tree inside your docker image might be much older.
OK... I'm deleting everything in /mnt and I've deleted every docker image/container... I'll restart everything and hope to redo the training without issues :+1:
@mone27 in CONTRIBUTING.md there's: docker build -f Dockerfile.train.it . but the file is actually named "Dockerfile.train"
No luck today...
Deleted everything and rebuilt... nothing.
Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
commit in container was: ba56407376f1e1109be33ac87bcb6eb9709b18be
@mone27 sorry if I didn't answer... as soon as I can, I will test the new updates :+1:
Deleted everything and rebuilt... nothing.
Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
The version is set at export time, so I don't know what you are doing or how you are testing, but something is wrong.
commit in container was: ba56407376f1e1109be33ac87bcb6eb9709b18be
Can you double check that this was indeed the tree used?
Can you check the content of the GRAPH_VERSION file in the container image?
Can you share the full training log, including the export phase?
commit in container was: ba56407376f1e1109be33ac87bcb6eb9709b18be
Can you double check that this was indeed the tree used?
Just checked twice: ba56407376f1e1109be33ac87bcb6eb9709b18be
Can you check the content of the GRAPH_VERSION file in the container image?
Actually, I don't know how to access the container. If I use docker exec bash, it automatically starts the "run.sh" script.
Can you share the full training log, including the export phase?
Of course, sir! It starts from the "alphabet" generation: https://pastebin.com/jFMMCp3j
Actually, I don't know how to access the container. If I use docker exec bash, it automatically starts the "run.sh" script.
Force rebuild the image to be sure.
it starts from the "alphabet" generate https://pastebin.com/jFMMCp3j
Nothing obvious here. Try using --tty when you docker run.
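In practice that means overriding the entrypoint so you get a shell instead of run.sh; a sketch, where the image tag deepspeech_it is a placeholder for whatever you tagged your build (this obviously needs a Docker daemon and the built image to run):

```shell
# Placeholder tag: substitute the name from your `docker build -t ...`.
# --entrypoint bypasses run.sh; -it gives you an interactive TTY.
docker run --rm -it --entrypoint /bin/bash deepspeech_it
```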
Actually i don't know how to access to the container. if i use docker exec bash, it starts automatically the "run.sh" script
Force rebuild the image to be sure.
Step 3/56 : ARG ds_branch=ba56407376f1e1109be33ac87bcb6eb9709b18be
I can't find anything like "GRAPH_VERSION", only this -> NCCL_VERSION=2.4.8, but I don't think that is what you are looking for...
it starts from the "alphabet" generate https://pastebin.com/jFMMCp3j
Nothing obvious here. Try using --tty when you docker run.
Of course, the "training" part isn't there because the model files already exist.
Step 3/56 : ARG ds_branch=ba56407376f1e1109be33ac87bcb6eb9709b18be
OK, but is the checkout actually from that branch?
I can't find anything like "GRAPH_VERSION"
It is a file in the DeepSpeech checkout. This is where we pull the version number from.
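If getting a shell is awkward, the file can also be read directly; a sketch, assuming the image tag deepspeech_it and the /home/trainer/ds checkout path that shows up later in this thread (both are assumptions, adjust to your build):

```shell
# Read GRAPH_VERSION from the image without starting run.sh.
docker run --rm --entrypoint cat deepspeech_it /home/trainer/ds/GRAPH_VERSION
```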
It is a file in the DeepSpeech checkout. This is where we pull the version number from.
cat GRAPH_VERSION
2
cat GRAPH_VERSION
2
Can you document precisely how you test that, then?
@massimo1980 Can you also give a try with the french model ? https://github.com/Common-Voice/commonvoice-fr/releases/tag/v0.6.0-fr-0.2
cat GRAPH_VERSION
2
Can you document precisely how you test that, then?
root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# find / -type f -name GRAPH_VERSION
/opt/DeepSpeech/GRAPH_VERSION
/var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION
root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# cat /var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION
2
No, I mean how you test your model; how do you get to:
Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
I'm sorry, I misunderstood! I'm testing with deepspeech installed via pip:
root@testing:/opt/DeepSpeech# pip freeze | grep deepspeech
deepspeech==0.6.0a8
deepspeech-gpu==0.6.0a8
that's the output https://pastebin.com/MUajYYZk
root@testing:/opt/DeepSpeech# pip freeze | grep deepspeech
deepspeech==0.6.0a8
deepspeech-gpu==0.6.0a8
Could you please install one or the other, but not both?
that's the output https://pastebin.com/MUajYYZk
I have no idea what you are doing wrong here. Please test with my model. It was exported using mostly the same tooling ...
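Concretely, "install one or the other" means removing one of the two packages; which one to keep depends on your setup, and keeping the GPU build below is just an example:

```shell
# Both packages provide the same `deepspeech` Python module, so with both
# installed it is ambiguous which native library the CLI actually loads.
pip uninstall -y deepspeech       # keep deepspeech-gpu (or the reverse)
pip freeze | grep deepspeech      # should now show exactly one entry
```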
OK, now I can confirm that I'm doing nothing wrong :joy: I've modified the script in the container to run:
/mnt/extracted/deepspeech --model /mnt/models/output_graph.pbmm --alphabet /mnt/models/alphabet.txt --lm /mnt/lm/lm.binary --trie /mnt/lm/trie --audio /mnt/galatea_01_barrili_f000040.wav
...and I get the same error:
+ export TMP=/mnt/tmp
+ export TEMP=/mnt/tmp
+ /mnt/extracted/deepspeech --model /mnt/models/output_graph.pbmm --alphabet /mnt/models/alphabet.txt --lm /mnt/lm/lm.binary --trie /mnt/lm/trie --audio /mnt/galatea_01_barrili_f000040.wav
TensorFlow: v1.14.0-16-g3b4ce37
DeepSpeech: v0.6.0-alpha.8-13-g30e0da9
2019-10-04 17:45:50.657784: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Specified model file version (0) is incompatible with minimum version supported by this client (2).
See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
Could not create model.
All the files and directories exist, and I'm using the binary compiled directly in the container.
Tomorrow, if I can, I will try the French model :+1:
i've modified the script in the container to run:
Can you share your modifications?
/mnt/extracted/deepspeech
Like, where is this coming from ?
Tomorrow, if i can, i will try french model +1
Can you share your Italian model ?
Specified model file version (0) is incompatible with minimum version supported by this client (2).
Reading the code, the only good explanation would be that reading the GRAPH_VERSION file silently fails and frozen_graph.version ends up being 0.
Hi there (do I have to write in English, or can I write in Italian?),
I have tested the docker instance and I've found these errors:
wget https://lingualibre.fr/datasets/Q385-ita-Italian.zip -O /mnt/source/lingua_libre_Q385-ita-Italian_train.zip
change "/mnt/source/" into "/mnt/sources/"
sed -i s/#//g '/mnt/extracted/data/*test.csv'
sed: can't read /mnt/extracted/data/*test.csv: No such file or directory
My workaround was to specify the directories explicitly, like this:
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*test.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*train.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*dev.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*test.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*train.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*dev.csv
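The per-directory workaround above can be collapsed into a single loop with the same behavior (paths as in this thread; the -f guard skips patterns that match nothing):

```shell
# Strip '#' from every train/dev/test CSV in both dataset directories.
for csv in /mnt/extracted/data/cv-it/clips/*train.csv \
           /mnt/extracted/data/cv-it/clips/*dev.csv \
           /mnt/extracted/data/cv-it/clips/*test.csv \
           /mnt/extracted/data/lingualibre/*train.csv \
           /mnt/extracted/data/lingualibre/*dev.csv \
           /mnt/extracted/data/lingualibre/*test.csv; do
  if [ -f "$csv" ]; then sed -i 's/#//g' "$csv"; fi
done
```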
+ rm /mnt/lm/lm.arpa
+ '[' '!' -f /mnt/lm/trie ']'
+ curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.ba56407376f1e1109be33ac87bcb6eb9709b18be.cpu/artifacts/public/native_client.tar.xz
+ pixz -d
+ tar -xf -
can not seek in input: Illegal seek
Not an XZ file
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
Browsing that URL gives: ResourceNotFound
P.S. Every time, pixz returns:
can not seek in input: Illegal seek
I hope this is only a warning.
P.P.S. I've tried to format this thread as best as possible, but it seems I can't... sorry if it is too chaotic. Hope to help in some way.
Regards, Massimo