NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Prediction ERROR #659

Closed. 943274923 closed this issue 3 years ago.

943274923 commented 4 years ago

I am training a QuartzNet model on LibriSpeech, but the predictions are garbage (see the training log below). What should I do? I used this command:

python -m torch.distributed.launch --nproc_per_node=4 /data/NeMo/examples/asr/quartznet.py --batch_size=10 --num_epochs=400 --lr=0.015 --warmup_steps=8000 --weight_decay=0.001 --train_dataset=data/train_all.json --eval_datasets data/dev_clean.json data/dev_other.json --model_config=/data/NeMo/examples/asr/configs/quartznet15x5.yaml --exp_name=librispeech

The training log is:

[NeMo I 2020-05-22 04:54:23 helpers:72] Loss: 464.1666564941406
[NeMo I 2020-05-22 04:54:23 helpers:73] training_batch_WER:  110.09%
[NeMo I 2020-05-22 04:54:23 helpers:74] Prediction: a e o e o o e o o e a o e a on e o e a on e o e o o e o on e o e o
[NeMo I 2020-05-22 04:54:23 helpers:75] Reference: i entreat you to gather up your courage i assure you that these wretched people are not unkind misery not unlike that which you yourself have endured has made them what they are no doubt we should have arranged for a better place for you wherein to await your friends
[NeMo I 2020-05-22 04:54:23 callbacks:239] Step time: 0.49793529510498047 seconds
[NeMo I 2020-05-22 04:54:35 callbacks:224] Step: 120250
[NeMo I 2020-05-22 04:54:35 helpers:72] Loss: 508.3443908691406
[NeMo I 2020-05-22 04:54:35 helpers:73] training_batch_WER:  100.85%
[NeMo I 2020-05-22 04:54:35 helpers:74] Prediction: a e a ee a e an e a an e a an e a an e  e a an e e a e a an a  o an e o e a
[NeMo I 2020-05-22 04:54:35 helpers:75] Reference: a village is by much too narrow a sphere for him even an ordinary market town is scarce large enough to afford him constant occupation in the lone houses and very small villages which are scattered about in so desert a country as the highlands of scotland
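(A training_batch_WER above 100%, as in the log lines above, is possible by definition: WER = (substitutions + deletions + insertions) / number of reference words, so a batch whose hypotheses contain enough insertion errors pushes the ratio past 1.0. It signals severely degenerate output, not a bug in the metric.)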
okuchaiev commented 4 years ago

(0) Was your training loss (and WER) decreasing initially and then suddenly jumping up? (1) What training data are you using? (2) A batch size of 10 per GPU looks small to me. Can you increase the batch size and decrease the learning rate?
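A sketch of what that adjustment could look like, keeping every other flag from the original command; the specific values --batch_size=32 and --lr=0.005 are illustrative assumptions, not settings confirmed in this thread:

```bash
# Same launch as above with a larger per-GPU batch size and a lower
# learning rate. batch_size=32 and lr=0.005 are illustrative values only,
# chosen to show the direction of the change, not tuned settings.
python -m torch.distributed.launch --nproc_per_node=4 \
    /data/NeMo/examples/asr/quartznet.py \
    --batch_size=32 \
    --num_epochs=400 \
    --lr=0.005 \
    --warmup_steps=8000 \
    --weight_decay=0.001 \
    --train_dataset=data/train_all.json \
    --eval_datasets data/dev_clean.json data/dev_other.json \
    --model_config=/data/NeMo/examples/asr/configs/quartznet15x5.yaml \
    --exp_name=librispeech
```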