MozillaItalia / DeepSpeech-Italian-Model

Tooling for producing Italian model (public release available) for DeepSpeech and text corpus
GNU General Public License v3.0

New VoxForge Importer and generation final MITADS-Speech Dataset #130

Closed eziolotta closed 3 years ago

eziolotta commented 3 years ago

1) VoxForge importer built with the corpora_importer util
2) Important fixes to the other importers
3) Generation of the mitads-speech dataset with all importers (see config file: mitads-speech-full.yaml)
4) Training test

Total duration after the corpora_collector process: 349.04 hours

COLLECTED CORPUS    MINUTES     SPEAKERS
voxforge            1202.29     1062
mls                 11274.23    59
mspka               175.49      3
m-ailabs            7681.88     208
evalita2009         341.77      ?
siwis               266.97      16
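The 349.04 hours figure can be cross-checked against the per-corpus minutes listed above. A minimal sketch (the dictionary and variable names are only illustrative; the numbers are copied from the list):

```python
# Minutes per corpus, copied from the figures reported above.
minutes_per_corpus = {
    "voxforge": 1202.29,
    "mls": 11274.23,
    "mspka": 175.49,
    "m-ailabs": 7681.88,
    "evalita2009": 341.77,
    "siwis": 266.97,
}

total_minutes = sum(minutes_per_corpus.values())
total_hours = total_minutes / 60

print(f"{total_minutes:.2f} minutes = {total_hours:.2f} hours")
# prints: 20942.63 minutes = 349.04 hours
```

mls and m-ailabs together account for roughly 90% of the total duration.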


TRAINING TEST

I tested DeepSpeech training on this dataset for only 10 epochs, using the parameters from the training example notebook. Median WER: 0.128205

Output:

(ds_train_dev) ubuntu@deepspeech:~/ds_eziolotta$ python DeepSpeech.py --show_progressbar True \
--train_cudnn True \
--alphabet_config_path /home/ubuntu/deep_speech_models/italian_alphabet.txt \
--scorer /home/ubuntu/deep_speech_models/0.8/kenlm.scorer \
--feature_cache /home/ubuntu/deep_speech_models/temp_train/sources/feature_cache \
--train_files ${all_train_csv} \
--dev_files ${all_dev_csv} \
--test_files ${all_test_csv} \
--train_batch_size 64 \
--dev_batch_size 64 \
--test_batch_size 64 \
--n_hidden 2048 \
--epochs 10 \
--learning_rate 0.0001 \
--dropout_rate 0.4 \
--max_to_keep 3 \
--checkpoint_dir /home/ubuntu/deep_speech_models/ckpts/ita/deepspeech-0.9.3-checkpoint \
--summary_dir /home/ubuntu/deep_speech_models/temp_train/tboard_logs \
--early_stop \
--es_epochs 10 \
--automatic_mixed_precision true \
--log_level 1
[.......]
Epoch 8 |   Training | Elapsed Time: 0:31:04 | Steps: 1483 | Loss: 25.444996
Epoch 8 | Validation | Elapsed Time: 0:01:44 | Steps: 219 | Loss: 28.228351 | Dataset: /mitads-speech-dataset/mitads-speech-full_v0.1/dev.csv
I Saved new best validating model with loss 28.228351 to: /home/ubuntu/deep_speech_models/ckpts/ita/deepspeech-0.9.3-checkpoint/best_dev-17783
--------------------------------------------------------------------------------
Epoch 9 |   Training | Elapsed Time: 0:31:06 | Steps: 1483 | Loss: 23.981334
Epoch 9 | Validation | Elapsed Time: 0:01:45 | Steps: 219 | Loss: 27.230283 | Dataset: /mitads-speech-dataset/mitads-speech-full_v0.1/dev.csv
I Saved new best validating model with loss 27.230283 to: /home/ubuntu/deep_speech_models/ckpts/ita/deepspeech-0.9.3-checkpoint/best_dev-19266
--------------------------------------------------------------------------------
I FINISHED optimization in 5:30:02.122314
WARNING:tensorflow:From /home/ubuntu/miniconda3/envs/ds_train_dev/lib/python3.7/site-packages/tensorflow_core/contrib/rnn/python/ops/lstm_ops.py:597: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
W0318 16:57:11.119858 140022320514880 deprecation.py:323] From /home/ubuntu/miniconda3/envs/ds_train_dev/lib/python3.7/site-packages/tensorflow_core/contrib/rnn/python/ops/lstm_ops.py:597: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
I Loading best validating checkpoint from /home/ubuntu/deep_speech_models/ckpts/ita/deepspeech-0.9.3-checkpoint/best_dev-19266
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on /mitads-speech-dataset/mitads-speech-full_v0.1/test.csv
Test epoch | Steps: 219 | Elapsed Time: 0:26:37
Test on /mitads-speech-dataset/mitads-speech-full_v0.1/test.csv - WER: 0.152582, CER: 0.053754, loss: 27.156124
--------------------------------------------------------------------------------
Best WER:
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.013514, loss: 61.034695
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/voxforge/it-0991-copy-8647.wav
 - src: "si tratta semplicemente di far rotolare attraverso i boschi questi macigni"
 - res: "si tratta semplicemente di far rotolare attraverso i boschi questi macigni "
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.012346, loss: 47.484196
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/voxforge/it-0775-copy-8726.wav
 - src: "non saprei rispose questi lanciando uno sguardo inquieto verso gli alberi giganti"
 - res: "non saprei rispose questi lanciando uno sguardo inquieto verso gli alberi giganti "
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.018182, loss: 45.007946
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/voxforge/it-1065-copy-9270.wav
 - src: "e voi credete cavaliere che egli possa sospettare di me"
 - res: "e voi credete cavaliere che egli possa sospettare di me "
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.018519, loss: 44.839371
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/siwis/IT_B_32_312.wav
 - src: "in breve si tratta di un problema di grande importanza"
 - res: "in breve si tratta di un problema di grande importanza "
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.008621, loss: 41.642883
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/siwis/IT_C_36_196.wav
 - src: "la sospensione delle vendite attuata da una catena di supermercati non sono notizie da passare all'opinione pubblica"
 - res: "la sospensione delle vendite attuata da una catena di supermercati non sono notizie da passare all'opinione pubblica "
--------------------------------------------------------------------------------
Median WER:
--------------------------------------------------------------------------------
WER: 0.128205, CER: 0.021930, loss: 27.179636
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/mls/2033_1596_001617.wav
 - src: "dalla platea e dalle gallerie i ragazzi applaudivano ogni volta che passava uno molto piccolo o uno che dai vestiti paresse povero e anche quelli che avevano delle gran capigliature ricciolute o eran vestiti di rosso o di bianco"
 - res: "dalla platea e dalle gallerie i ragazzi applaudivano ogni volta che passava uno molto piccolo o uno che dai vestiti paresse povero e anche quelli che avevano delle gran capigliature riccioluta o era un vestito di rosso e di bianco"
--------------------------------------------------------------------------------
WER: 0.128205, CER: 0.037037, loss: 23.368874
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/mls/6348_5862_000104.wav
 - src: "manco male che due portinaj in via volturno uno in via gaeta un altro in via palestro gli eran rimasti fedeli e lo aspettavano le altre copie doveva venderle cosí alla ventura girando per tutto il quartiere del macao"
 - res: "manco male che due portinai via volturno uno in via gaeta un altro via palestro gli erano rimasti fedeli e lo aspettavano le altre copie doveva venderle così alla ventura girando per tutto il quartiere del macao"
--------------------------------------------------------------------------------
WER: 0.128205, CER: 0.031390, loss: 19.807188
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/mls/8828_8610_000150.wav
 - src: "barch cioè vedi lo fai dire anche a me i dia due paja di bacchette e dàlli calosce per queste bambine le chiama barchette la mia piccina veramente si potrebbero anche chiamare cosí per non usare quella parolaccia forestiera"
 - res: "perche cioè vedi lo fai dire anche a me mi dia due paia di bacchette e dalli calosce per queste bambine le chiama barchette la mia piccina veramente si potrebbero anche chiamare così per non usare quella parolaccia forestiera"
--------------------------------------------------------------------------------
WER: 0.128205, CER: 0.018100, loss: 13.439837
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/mls/8828_8610_000109.wav
 - src: "e subito tutte le membra le si rilassarono cosí che non poté neanche sollevare le gracili mani per nascondersi il volto ma la vecchia mamma le si accostò e posandole lievemente una mano sulla spalla figlia mia le annunziò"
 - res: "e subito tutte le membra le si rilassarono così che non pote neanche sollevare le gracili mani per nascondersi il volto ma la vecchia mamma le si accostò e posando le lievemente una mano sulla spalla figlia mia e annunziò"
--------------------------------------------------------------------------------
WER: 0.129032, CER: 0.069444, loss: 90.858200
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/mls/4975_4125_000201.wav
 - src: "sospesi consci dell'orribile impressione che sua eccellenza destava in tutta la cittadinanza e infatti parve a tutti che il cielo il gajo aspetto della nostra bianca cittadina s'oscurassero a quell'apparizione ispida"
 - res: "consci dell'orribile impressione che sua eccellenza destava in tutta la cittadinanza e infatti parve a tutti che il cielo il grassetto della nostra bianca cittadina oscurassero a quell'apparizione ispida"
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.333333, loss: 8.697345
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/evalita2009/clean00866.wav
 - src: "due"
 - res: "e a "
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.500000, loss: 8.080779
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/evalita2009/clean02285.wav
 - src: "nove"
 - res: "no e "
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.500000, loss: 6.972440
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/evalita2009/clean02275.wav
 - src: "nove"
 - res: "no e "
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.500000, loss: 5.985777
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/evalita2009/clean02975.wav
 - src: "nove"
 - res: "no ve "
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.750000, loss: 5.125335
 - wav: file:///mitads-speech-dataset/mitads-speech-full_v0.1/audios/evalita2009/clean02895.wav
 - src: "nove"
 - res: "no me "
--------------------------------------------------------------------------------
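The WER values of 2.0 in the worst cases come from the way the metric is normalized. A minimal sketch, assuming WER and CER are the Levenshtein edit distance divided by the length of the reference (src) in words and in characters respectively. This is the conventional definition, not DeepSpeech's own evaluation code, and the function names are only illustrative:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(src, res):
    # Words are split on whitespace, so a trailing space in res is ignored.
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)


def cer(src, res):
    # Characters are compared as-is, so a trailing space counts as an insertion.
    return edit_distance(src, res) / len(src)


# Worst-case sample from the report: one reference word, two hypothesis words.
print(wer("nove", "no e "), cer("nove", "no e "))
# prints: 2.0 0.5
```

With a one-word reference such as "nove", a two-word result costs one substitution plus one insertion, which gives 2 / 1 = 2.0. Under the same assumption, the trailing space in res would explain why the best samples show WER 0.000000 with a small non-zero CER: the whitespace split drops it, while the character comparison counts it as one insertion.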
Mte90 commented 3 years ago

@eziolotta can I approve it?

eziolotta commented 3 years ago

Yes, thanks. The code was tested on the unito server, using the commands described in the notebook import_speech_dataset.ipynb.