MozillaItalia / DeepSpeech-Italian-Model

Tooling for producing an Italian model (public release available) for DeepSpeech and a text corpus
GNU General Public License v3.0

Different errors in bash scripts #17

Closed massimo1980 closed 5 years ago

massimo1980 commented 5 years ago

Hi there (do I have to write in English, or can I write in Italian?),

I have tested the Docker instance and found these errors:

change "/mnt/source/" into "/mnt/sources/"

My workaround was to specify the directories like this:

```
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*test.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*train.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*dev.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*test.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*train.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*dev.csv
```
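
The same workaround can be written more compactly; a sketch, assuming bash brace expansion and the directory layout above:

```
# Strip the '#' characters from all train/dev/test CSVs in one sed call
sed -i 's/#//g' \
  /mnt/extracted/data/cv-it/clips/*{test,train,dev}.csv \
  /mnt/extracted/data/lingualibre/*{test,train,dev}.csv
```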

Browsing that URL gives: ResourceNotFound

P.S. Every time, pixz returns: "can not seek in input: Illegal seek". I hope this is only a warning.
P.P.S. I've tried to format this thread as well as possible, but it seems I can't... sorry if it is too chaotic.

I hope this helps in some way.

Regards,
Massimo

Mte90 commented 5 years ago

Hi @massimo1980, thanks for the help! We are using nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04, and it is strange that it cannot be found on your machine. We need that command, so we have to figure out why that container doesn't include it.

For the other fixes, can you open a pull request? This month is also Hacktoberfest, so you can get a t-shirt if you do 4 pull requests during October.

massimo1980 commented 5 years ago

Hi @Mte90, thanks for the quick reply. To avoid any misunderstanding, can you list the command you use to run the Docker container?

As for the pull request, I'm not that experienced; if you explain how to do it, I'll give it a try.

Mte90 commented 5 years ago

To run the image, there is a command at the bottom of https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/DeepSpeech/CONTRIBUTING.md. For the pull request, you need to fork the repository, commit your changes, and open the pull request. We can help you in our Telegram Developers group, which you can find through our @mozitabot.
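
A minimal sketch of that workflow from the command line (the fork URL, branch name, and commit message are placeholders):

```
# Clone your fork (replace <your-user> with your GitHub username)
git clone https://github.com/<your-user>/DeepSpeech-Italian-Model.git
cd DeepSpeech-Italian-Model

# Work on a topic branch
git checkout -b fix-bash-scripts

# ...edit the scripts, then commit and push...
git add -A
git commit -m "Fix paths in bash scripts"
git push origin fix-bash-scripts

# Finally, open the pull request from the GitHub web UI
```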

massimo1980 commented 5 years ago

@Mte90 OK, it was my fault... now nvidia-smi and the write permissions are fine. I've managed to run the image and start the training session, but after a while:


```
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:01:42 | Steps: 88 | Loss: 172.407579
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68] 
     [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
     [[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_77]]
  (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68] 
     [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./DeepSpeech.py", line 906, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./DeepSpeech.py", line 890, in main
    train()
  File "./DeepSpeech.py", line 608, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "./DeepSpeech.py", line 576, in run_set
    feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68] 
     [[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at ./DeepSpeech.py:308) ]]
     [[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_77]]
  (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68] 
     [[node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3 (defined at ./DeepSpeech.py:308) ]]
0 successful operations.
0 derived errors ignored.
```

Mte90 commented 5 years ago

@lissyx do you have any hint for those errors? Maybe we need to disable cudnn?

massimo1980 commented 5 years ago

@lissyx do you have any hint for those errors? Maybe we need to disable cudnn?

I've tried reducing the batch_size for train, test, and dev from 68 to 60, and now it seems to work. (BTW, I forgot to specify my hardware: i7-6700K | GTX 1070, 8 GB memory.)

lissyx commented 5 years ago

Every time, pixz returns: "can not seek in input: Illegal seek". I hope this is only a warning.

Yes, this is only a warning.

+ curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.ba56407376f1e1109be33ac87bcb6eb9709b18be.cpu/artifacts/public/native_client.tar.xz

Most likely your Docker image was generated more than one week ago and the artifact expired.

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found. (0) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68] [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]] [[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_77]] (1) Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 2048, 2048, 1, 461, 68] [[{{node tower_0/gradients/tower_0/cudnn_lstm/CudnnRNNV3_grad/CudnnRNNBackpropV3}}]]

I have absolutely no idea for those errors.

I've tried reducing the batch_size for train, test, and dev from 68 to 60, and now it seems to work.

Strange, a batch size that is too big would usually generate an OOM error. But yes, as documented, those settings were established for the French model on my hardware, i.e. an RTX 2080 Ti (11 GB RAM).

lissyx commented 5 years ago

Currently, running:

```
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:28:36 | Steps: 2267 | Loss: 84.983498                                                                                                                                                                                                                                                                                                            
Epoch 0 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 19.280736 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                               
Epoch 0 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 57.125492 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 0 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 70.742396 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 0 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 69.326420 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 0 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 82.528525 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 58.634261 to: /mnt/checkpoints/best_dev-2267
Epoch 1 |   Training | Elapsed Time: 0:28:36 | Steps: 2267 | Loss: 49.807577                                                                                                                                                                                                                                                                                                            
WARNING:tensorflow:From /home/trainer/ds-train/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W1004 07:50:54.227288 140635919415104 deprecation.py:323] From /home/trainer/ds-train/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 1 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 14.869744 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                               
Epoch 1 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 42.427034 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 1 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 60.430801 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 1 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 47.229852 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 1 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 71.610972 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 44.342112 to: /mnt/checkpoints/best_dev-4534
Epoch 2 |   Training | Elapsed Time: 0:28:35 | Steps: 2267 | Loss: 38.781706                                                                                                                                                                                                                                                                                                            
Epoch 2 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 12.907967 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                               
Epoch 2 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 35.656022 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 2 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 55.144742 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 2 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 36.578218 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 2 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 65.437710 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 37.470106 to: /mnt/checkpoints/best_dev-6801
Epoch 3 |   Training | Elapsed Time: 0:28:38 | Steps: 2267 | Loss: 32.290010                                                                                                                                                                                                                                                                                                            
Epoch 3 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 11.469230 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                               
Epoch 3 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 31.850810 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 3 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 52.353987 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 3 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 29.747222 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 3 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 61.468791 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 33.315463 to: /mnt/checkpoints/best_dev-9068
Epoch 4 |   Training | Elapsed Time: 0:28:37 | Steps: 2267 | Loss: 27.813154                                                                                                                                                                                                                                                                                                            
Epoch 4 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 10.661388 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                               
Epoch 4 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 29.489315 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 4 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 49.917231 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 4 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 25.459971 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 4 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 59.274379 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 30.585386 to: /mnt/checkpoints/best_dev-11335
Epoch 5 |   Training | Elapsed Time: 0:28:40 | Steps: 2267 | Loss: 24.443504
Epoch 5 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 9.920453 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                                
Epoch 5 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 28.171673 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 5 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 49.646353 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 5 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 22.118046 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 5 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 56.128802 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 28.908376 to: /mnt/checkpoints/best_dev-13602
Epoch 6 |   Training | Elapsed Time: 0:28:39 | Steps: 2267 | Loss: 21.769397                                                                                                                                                                                                                                                                                                            
Epoch 6 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 9.537197 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                                
Epoch 6 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 26.599756 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 6 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 48.462345 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 6 | Validation | Elapsed Time: 0:00:46 | Steps: 104 | Loss: 19.238413 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 6 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 54.152451 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 27.191631 to: /mnt/checkpoints/best_dev-15869
Epoch 7 |   Training | Elapsed Time: 0:28:31 | Steps: 2267 | Loss: 19.581926                                                                                                                                                                                                                                                                                                            
Epoch 7 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 9.003750 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                                
Epoch 7 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 25.890564 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 7 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 48.009400 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 7 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 17.315323 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 7 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 52.075734 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 26.149278 to: /mnt/checkpoints/best_dev-18136
Epoch 8 |   Training | Elapsed Time: 0:28:36 | Steps: 2267 | Loss: 17.753123                                                                                                                                                                                                                                                                                                            
Epoch 8 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 8.815179 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                                
Epoch 8 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 24.873896 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 8 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 47.467988 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 8 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 15.427920 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 8 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 51.372372 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 25.106390 to: /mnt/checkpoints/best_dev-20403
Epoch 9 |   Training | Elapsed Time: 0:28:32 | Steps: 2267 | Loss: 16.213704                                                                                                                                                                                                                                                                                                            
Epoch 9 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 8.652601 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                                
Epoch 9 | Validation | Elapsed Time: 0:00:29 | Steps: 91 | Loss: 24.659495 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                    
Epoch 9 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 47.356290 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                           
Epoch 9 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 13.904833 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                 
Epoch 9 | Validation | Elapsed Time: 0:00:01 | Steps: 9 | Loss: 51.444731 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv                                                                                                                                                                                                
I Saved new best validating model with loss 24.519857 to: /mnt/checkpoints/best_dev-22670
Epoch 10 |   Training | Elapsed Time: 0:28:37 | Steps: 2267 | Loss: 14.880339                                                                                                                                                                                                                                                                                                           
Epoch 10 | Validation | Elapsed Time: 0:00:04 | Steps: 53 | Loss: 8.520405 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_dev.csv                                                                                                                                                                                                                               
Epoch 10 | Validation | Elapsed Time: 0:00:28 | Steps: 91 | Loss: 24.312960 | Dataset: /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_dev.csv                                                                                                                                                                                                                                   
Epoch 10 | Validation | Elapsed Time: 0:00:18 | Steps: 74 | Loss: 46.965495 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv                                                                                                                                                                                                                                                          
Epoch 10 | Validation | Elapsed Time: 0:00:45 | Steps: 104 | Loss: 12.535529 | Dataset: /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_dev.csv                                                                                                                                                                                                                                                
Epoch 10 | Validation | Elapsed Time: 0:00:02 | Steps: 9 | Loss: 51.239443 | Dataset: /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_dev.csv
```

lissyx commented 5 years ago

@massimo1980 @Mte90 Can we make sure that issues found here are also reported / fixed on my upstream? https://github.com/Common-Voice/commonvoice-fr/tree/master/DeepSpeech

massimo1980 commented 5 years ago

OK, training seems to complete without errors. The testing phase starts with /mnt/extracted/data/cv-it/clips/test.csv and seems fine, but then:

```
Testing model on /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00
Traceback (most recent call last):
  File "./DeepSpeech.py", line 906, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./DeepSpeech.py", line 894, in main
    test()
  File "./DeepSpeech.py", line 652, in test
    evaluate(FLAGS.test_files.split(','), create_model, try_loading)
  File "/opt/DeepSpeech/evaluate.py", line 152, in evaluate
    samples.extend(run_test(init_op, dataset=csv))
  File "/opt/DeepSpeech/evaluate.py", line 130, in run_test
    wer, cer, samples = calculate_report(wav_filenames, ground_truths, predictions, losses)
  File "/opt/DeepSpeech/util/evaluate_tools.py", line 65, in calculate_report
    samples_wer, samples_cer = wer_cer_batch(samples)
  File "/opt/DeepSpeech/util/evaluate_tools.py", line 27, in wer_cer_batch
    wer = sum(s.word_distance for s in samples) / sum(s.word_length for s in samples)
ZeroDivisionError: division by zero
```

lissyx commented 5 years ago

/mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv

Is this file empty?

massimo1980 commented 5 years ago

Nope...

```
wav_filename,wav_filesize,transcript
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/chiesa di San Carlo Borromeo.wav,122072,chiesa di san carlo borromeo
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/Casa Rena.wav,72920,casa rena
/mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/parco del Museo del Tessile.wav,116608,parco del museo del tessile
```

I hope that the spaces inside the file names don't cause trouble.

lissyx commented 5 years ago

I hope that the spaces inside the file names don't cause trouble.

I'm pretty sure I have some in mine, so it should be okay.

wav_filename,wav_filesize,transcript /mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/chiesa di San Carlo Borromeo.wav,122072,chiesa di san carlo borromeo /mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/Casa Rena.wav,72920,casa rena /mnt/extracted/data/lingualibre/lingua_libre/Q385-ita-Italian/Yiyi/parco del Museo del Tessile.wav,116608,parco del museo del tessile

Is that the full content? Only two files?

massimo1980 commented 5 years ago

Nope, I pasted only part of it... this thread is already pretty long :smile:

lissyx commented 5 years ago

Nope, I pasted only part of it... this thread is already pretty long :smile:

How many lines does the test CSV contain? Can you make sure it's more than the test batch size? Can you ensure there are no empty WAV files and/or empty transcriptions?
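
A few quick ways to check those points (a sketch; the CSV path is the one from this thread and the transcript is assumed to be the last comma-separated field):

```
CSV=/mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv

# Number of samples, excluding the header line
tail -n +2 "$CSV" | wc -l

# Zero-byte WAV files anywhere under the dataset directory
find /mnt/extracted/data/lingualibre -name '*.wav' -size 0

# Rows whose transcript (assumed to be the last field) is empty
awk -F, 'NR > 1 && $NF == ""' "$CSV"
```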

massimo1980 commented 5 years ago

How many lines does the test CSV contain? Can you make sure it's more than the test batch size?

```
root@testing:/# cat /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv | wc -l
19
```

The actual run, if I have understood correctly, uses these parameters: --train_batch_size 68 --dev_batch_size 68 --test_batch_size 68. As I wrote before, I changed those parameters to 60 to be able to run the training.

Can you ensure there are no empty WAV file and/or empty transcriptions?

Yep. I mean, an ls -al of that folder doesn't show any WAV file with size 0.

lissyx commented 5 years ago

root@testing:/# cat /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv | wc -l 19; the actual run, if I have understood correctly, uses these parameters: --train_batch_size 68 --dev_batch_size 68 --test_batch_size 68; as I wrote before, I changed those parameters to 60 to be able to run the training

Right, you should try --test_batch_size 10 for example. I had a look at the Lingua Libre dataset in Italian, and it's very small compared to the French one, so it's not surprising.
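
For reference, a sketch of how the batch-size flags fit into the training invocation (the --train_files/--dev_files/--test_files paths are placeholders, not the exact ones produced by the scripts; only the batch-size values follow the discussion above):

```
python -u DeepSpeech.py \
  --alphabet_config_path /mnt/models/alphabet.txt \
  --checkpoint_dir /mnt/checkpoints/ \
  --train_files /mnt/extracted/data/cv-it/clips/train.csv \
  --dev_files /mnt/extracted/data/cv-it/clips/dev.csv \
  --test_files /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_test.csv \
  --train_batch_size 60 \
  --dev_batch_size 60 \
  --test_batch_size 10
```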

massimo1980 commented 5 years ago

Right, you should try --test_batch_size 10 for example. I had a look at the Lingua Libre dataset in Italian, and it's very small compared to the French one, so it's not surprising.

Yep! It seems that decreasing the test batch size to 16 solved the problem. Training and evaluation finally finished.

Now it seems that the TFLite model creation breaks on a flag error:

+ python -u DeepSpeech.py --alphabet_config_path /mnt/models/alphabet.txt --lm_binary_path /mnt/lm/lm.binary --lm_trie_path /mnt/lm/trie --feature_cache /mnt/sources/feature_cache --n_hidden 2048 --checkpoint_dir /mnt/checkpoints/ --export_dir /mnt/models/ --export_tflite --nouse_seq_length --export_language it

FATAL Flags parsing error: Unknown command line flag 'nouse_seq_length' Pass --helpshort or --helpfull to see help on flags.

I've looked through the list (DeepSpeech.py --helpfull) and it seems that the flag "nouse_seq_length" doesn't exist.

Mte90 commented 5 years ago

We need to update the contributing readme and also the Dockerfile so we can avoid these issues in the future.

lissyx commented 5 years ago

FATAL Flags parsing error: Unknown command line flag 'nouse_seq_length' Pass --helpshort or --helpfull to see help on flags.

I've looked through the list (DeepSpeech.py --helpfull) and it seems that the flag "nouse_seq_length" doesn't exist.

Yeah, we removed that argument a while ago. As @Mte90 said, it's important to stay as close as possible to my repo :)

massimo1980 commented 5 years ago

OK! Removing that flag produces the model anyway (...anyway, I will not use TFLite). Apart from having to run "chmod +x run_it.sh", I have no more issues to report. My last question (I promise I won't bother you anymore! :tongue:):
can I run this model using the "mic_vad_streaming" example of DeepSpeech, or is there a quicker way?

lissyx commented 5 years ago

can I run this model using the "mic_vad_streaming" example of DeepSpeech, or is there a quicker way?

Yes, we have re-worked that part of the project to ensure it was usable with the latest version, and have some CI running.

You can also just install the C++ client from native_client.tar.xz, or use some of the bindings (Python, NodeJS) and their CLI interface: it's all going to run the same way.
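
For example, with the Python bindings (a sketch; the flags match the 0.6.0-alpha client used later in this thread, and the model, scorer, and audio paths are placeholders):

```
pip install deepspeech==0.6.0a8

deepspeech --model /mnt/models/output_graph.pbmm \
           --alphabet /mnt/models/alphabet.txt \
           --lm /mnt/lm/lm.binary \
           --trie /mnt/lm/trie \
           --audio some_sample.wav
```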

(...anyway, I will not use TFLite)

TFLite is required for running on Android and RPi platforms now. It also gives better / smoother performance as a WebSpeech API implementation.

mone27 commented 5 years ago

I can try to update the scripts to fix the errors @massimo1980. The things changed are:

* path for sed in `generate_alphabet.sh`
* change source to sources in lingualibre
* set train batch size to 10

Anything else to change?

massimo1980 commented 5 years ago

...nevermind :cry:... no luck:

```
Loaded model in 1.57s.
Loading language model from files /mnt/lm/lm.binary /mnt/lm/trie
Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
Loaded language model in 0.0333s.
Running inference.
Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state
Errore di segmentazione (core dump creato)
```

(The last line is the Italian locale message for "Segmentation fault (core dumped)".)

lissyx commented 5 years ago

Loaded model in 1.57s. Loading language model from files /mnt/lm/lm.binary /mnt/lm/trie Error: Trie file version mismatch (4 instead of expected 3). Update your trie file. Loaded language model in 0.0333s. Running inference. Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state Errore di segmentazione (core dump creato)

Have you installed 0.6.0a8?

massimo1980 commented 5 years ago

I can try to update the scripts to fix the errors @massimo1980. The things changed are:

* path for sed in `generate_alphabet.sh`

* change source to sources in lingualibre

* set train batch size to 10

Anything else to change?

Also, run chmod +x run_it.sh: it seems that this file doesn't have execute permission.

lissyx commented 5 years ago

https://github.com/mozilla/DeepSpeech/blob/master/examples/mic_vad_streaming/requirements.txt#L1

massimo1980 commented 5 years ago

https://github.com/mozilla/DeepSpeech/blob/master/examples/mic_vad_streaming/requirements.txt#L1

My bad!! I was still at 0.5.1; upgrading did the trick!

But now...

```
2019-10-04 16:59:03.243818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6783 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information

Traceback (most recent call last):
  File "/usr/local/bin/deepspeech", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/deepspeech/client.py", line 80, in main
    ds = Model(args.model, args.alphabet, BEAM_WIDTH)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech/__init__.py", line 39, in __init__
    raise RuntimeError("CreateModel failed with error code {}".format(status))
RuntimeError: CreateModel failed with error code 8195
```

lissyx commented 5 years ago

Specified model file version (0) is incompatible with minimum version supported by this client (2)

Can you make sure you trained from a matching version?

lissyx commented 5 years ago

Specified model file version (0) is incompatible with minimum version supported by this client (2)

Can you make sure you trained from a matching version?

The commit referenced is https://github.com/mozilla/DeepSpeech/tree/ba56407376f1e1109be33ac87bcb6eb9709b18be, but you need to make sure you ran docker build against that commit. Otherwise the tree inside your docker image might be much older.
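
A sketch of such a rebuild, assuming ds_branch is the build ARG shown in the build log later in this thread (the image tag is just an example):

```
# --no-cache forces a fresh checkout so the tree really matches the commit
docker build --no-cache \
  -f Dockerfile.train \
  --build-arg ds_branch=ba56407376f1e1109be33ac87bcb6eb9709b18be \
  -t deepspeech-it-train .
```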

massimo1980 commented 5 years ago

OK... I'm deleting everything in /mnt and have removed every Docker image/container... I'll restart everything and hope to redo the training without issues :+1:

@mone27 in CONTRIBUTING.md there is docker build -f Dockerfile.train.it ., but the file is actually named "Dockerfile.train".

massimo1980 commented 5 years ago

No luck today... I deleted everything and rebuilt... nothing:

Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information

The commit in the container was: ba56407376f1e1109be33ac87bcb6eb9709b18be

@mone27 sorry if I didn't answer... as soon as I can, I will test the new updates :+1:

lissyx commented 5 years ago

I deleted everything and rebuilt... nothing: Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information

Version is being set at export time, so I don't know what you are doing / how you are testing, but it's somehow wrong.

lissyx commented 5 years ago

commit in container was: ba56407376f1e1109be33ac87bcb6eb9709b18be

Can you double check that this was indeed the tree used? Can you check the content of the GRAPH_VERSION file in the container image? Can you share the full training log, including the export phase?

massimo1980 commented 5 years ago

commit in container was: ba56407376f1e1109be33ac87bcb6eb9709b18be

Can you double check that this was indeed the tree used?

Just checked twice: ba56407376f1e1109be33ac87bcb6eb9709b18be

Can you check the content of the GRAPH_VERSION file in the container image?

Actually, I don't know how to access the container: if I use docker exec bash, it automatically starts the "run.sh" script.

Can you share the full training log, including the export phase?

Of course! It starts from the alphabet generation: https://pastebin.com/jFMMCp3j

lissyx commented 5 years ago

Actually, I don't know how to access the container: if I use docker exec bash, it automatically starts the "run.sh" script.

Force rebuild the image to be sure.

lissyx commented 5 years ago

It starts from the alphabet generation: https://pastebin.com/jFMMCp3j

Nothing obvious here. Try using --tty when you docker run.
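
A couple of ways to get an interactive shell for inspection (container and image names are placeholders):

```
# Override the default entrypoint so run.sh does not start automatically
docker run -it --entrypoint /bin/bash <image-name>

# Or attach a second shell to a container that is already running
docker exec -it <container-id> /bin/bash
```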

massimo1980 commented 5 years ago

Actually, I don't know how to access the container: if I use docker exec bash, it automatically starts the "run.sh" script.

Force rebuild the image to be sure.

Step 3/56 : ARG ds_branch=ba56407376f1e1109be33ac87bcb6eb9709b18be

I can't find anything like "GRAPH_VERSION", only this -> NCCL_VERSION=2.4.8, but I don't think that is what you are looking for...

massimo1980 commented 5 years ago

It starts from the alphabet generation: https://pastebin.com/jFMMCp3j

Nothing obvious here. Try using --tty when you docker run

https://pastebin.com/7922dAWB

Of course, the "training" part isn't there because the model files already exist.

lissyx commented 5 years ago

Step 3/56 : ARG ds_branch=ba56407376f1e1109be33ac87bcb6eb9709b18be

OK, but is the checkout actually from that branch?

I can't find anything like "GRAPH_VERSION"

It is a file in the DeepSpeech checkout. This is where we pull the version number.
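
A quick way to read it without entering the container (the /opt/DeepSpeech path is the one found later in this thread; the image name is a placeholder):

```
docker run --rm --entrypoint cat <image-name> /opt/DeepSpeech/GRAPH_VERSION
```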

massimo1980 commented 5 years ago

It is a file in the DeepSpeech checkout. This is where we pull the version number.

```
cat GRAPH_VERSION
2
```

lissyx commented 5 years ago

cat GRAPH_VERSION 2

Can you document precisely how you test that, then?

lissyx commented 5 years ago

@massimo1980 Can you also give it a try with the French model? https://github.com/Common-Voice/commonvoice-fr/releases/tag/v0.6.0-fr-0.2

massimo1980 commented 5 years ago

cat GRAPH_VERSION 2

Can you document precisely how you test that, then?

```
root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# find / -type f -name GRAPH_VERSION
/opt/DeepSpeech/GRAPH_VERSION
/var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION
root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# cat /var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION
2
```

lissyx commented 5 years ago

Can you document precisely how you test that, then?

root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# find / -type f -name GRAPH_VERSION /opt/DeepSpeech/GRAPH_VERSION /var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# cat /var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION 2

No, I mean how you test your model, how you get to:

Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information

massimo1980 commented 5 years ago

Can you document precisely how you test that, then?

root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# find / -type f -name GRAPH_VERSION /opt/DeepSpeech/GRAPH_VERSION /var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION root@testing:/opt/DeepSpeech-Italian-Model/DeepSpeech# cat /var/lib/docker/overlay2/3920aef5747a97ed73ba57ff836581485511a99eaa51f4dd68610f0bd4214ab2/diff/home/trainer/ds/GRAPH_VERSION 2

No, I mean how you test your model, how you get to

I'm sorry, I misunderstood! I'm using deepspeech installed via pip:

```
root@testing:/opt/DeepSpeech# pip freeze | grep deepspeech
deepspeech==0.6.0a8
deepspeech-gpu==0.6.0a8
```

That's the output: https://pastebin.com/MUajYYZk

lissyx commented 5 years ago

root@testing:/opt/DeepSpeech# pip freeze | grep deepspeech deepspeech==0.6.0a8 deepspeech-gpu==0.6.0a8

Could you please install one or the other, but not both?
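
For example, keeping only the GPU build (a sketch; the version is pinned to the one already installed):

```
pip uninstall -y deepspeech
pip install deepspeech-gpu==0.6.0a8
```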

that's the output https://pastebin.com/MUajYYZk

I have no idea what you are doing wrong here. Please test with my model. It was exported using mostly the same tooling ...

massimo1980 commented 5 years ago

root@testing:/opt/DeepSpeech# pip freeze | grep deepspeech deepspeech==0.6.0a8 deepspeech-gpu==0.6.0a8

Could you please install one or the other, but not both?

that's the output https://pastebin.com/MUajYYZk

I have no idea what you are doing wrong here. Please test with my model. It was exported using mostly the same tooling ...

OK, now I can confirm that I'm doing nothing wrong :joy: I've modified the script in the container to run:

/mnt/extracted/deepspeech --model /mnt/models/output_graph.pbmm --alphabet /mnt/models/alphabet.txt --lm /mnt/lm/lm.binary --trie /mnt/lm/trie --audio /mnt/galatea_01_barrili_f000040.wav

...and I get the same error:

```
+ export TMP=/mnt/tmp
+ export TEMP=/mnt/tmp
+ /mnt/extracted/deepspeech --model /mnt/models/output_graph.pbmm --alphabet /mnt/models/alphabet.txt --lm /mnt/lm/lm.binary --trie /mnt/lm/trie --audio /mnt/galatea_01_barrili_f000040.wav
TensorFlow: v1.14.0-16-g3b4ce37
DeepSpeech: v0.6.0-alpha.8-13-g30e0da9
2019-10-04 17:45:50.657784: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
Could not create model.
```

All the files and directories exist, and I'm using the binary built directly in the container.

Tomorrow, if I can, I will try the French model :+1:

lissyx commented 5 years ago

I've modified the script in the container to run:

Can you share your modifications?

/mnt/extracted/deepspeech

Like, where is this coming from?

Tomorrow, if I can, I will try the French model :+1:

Can you share your Italian model?

lissyx commented 5 years ago

Specified model file version (0) is incompatible with minimum version supported by this client (2).

Reading the code, the only good explanation would be that reading the GRAPH_VERSION file silently fails and frozen_graph.version ends up being 0.