facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Question for wav2vec training on Librispeech. #2528

Closed. NAM-hj closed this issue 4 years ago.

NAM-hj commented 4 years ago

❓ Questions and Help

What is your question?

How can I train a new model well with the CLI tools?

I tried to train a wav2vec model on Librispeech, but I couldn't get a well-trained model: the loss started at 5.xx and is still around 2.2x at epoch 47. What should I try in order to get a lower loss?

What have you tried?

python examples/wav2vec/wav2vec_manifest.py ./data/librispeech --dest ./data/prepared_full --ext flac → I use all of the Librispeech files.
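As a quick sanity check of the prepared data, here is a minimal sketch, assuming wav2vec_manifest.py writes the usual train.tsv layout (root directory on the first line, then one tab-separated "relative path / number of samples" row per file):

```python
# Minimal sketch: inspect the generated manifest (path and layout assumed as above).
manifest_path = "./data/prepared_full/train.tsv"

with open(manifest_path) as f:
    lines = f.read().splitlines()

root, rows = lines[0], lines[1:]
total_samples = sum(int(row.split("\t")[1]) for row in rows)
print(f"{len(rows)} files under {root}, "
      f"~{total_samples / 16_000 / 3600:.1f} hours of audio at 16 kHz")
```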

python train.py ./data/prepared_full --save-dir ./model3_small_960h --num-workers 32 --max-update 400000 --save-interval 1 --no-epoch-checkpoints \
--arch wav2vec --task audio_pretraining --lr 1e-06 --min-lr 1e-09 --optimizer adam --max-lr 0.01 --lr-scheduler cosine \
--conv-feature-layers '[(256, 10, 5), (256, 8, 4), (256, 4, 2), (256, 4, 2), (256, 4, 2)]' \
--conv-aggregator-layers '[(256, 3, 1), (256, 3, 1), (256, 3, 1), (256, 3, 1), (256, 3, 1)]' \
--skip-connections-agg --residual-scale 0.5 --log-compression --warmup-updates 20 --warmup-init-lr 1e-04 --criterion wav2vec --num-negatives 10 \
--max-sample-size 150000 --max-tokens 1500000 --skip-invalid-size-inputs-valid-test --tensorboard-logdir ./tensorboard_log

→ The loss curve of the above command is shown in the attached image (loss_image).

(+) Changes such as increasing the channel sizes from 256 to 512, or using the large wav2vec configuration described in the paper, did not make a significant difference.
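For reference, here is a rough sketch of the learning-rate schedule these flags describe (linear warmup from --warmup-init-lr to --max-lr, then a cosine decay towards --lr). This is a simplification for illustration, not fairseq's exact cosine scheduler:

```python
import math

# Values taken from the training command above.
warmup_init_lr = 1e-4    # --warmup-init-lr
max_lr = 1e-2            # --max-lr (peak reached after warmup)
min_lr = 1e-6            # --lr (floor of the cosine decay)
warmup_updates = 20      # --warmup-updates
max_update = 400_000     # --max-update (used here as the cosine period)

def lr_at(update):
    """Simplified warmup + single-cycle cosine schedule."""
    if update < warmup_updates:
        # Linear warmup from warmup_init_lr up to max_lr.
        return warmup_init_lr + (max_lr - warmup_init_lr) * update / warmup_updates
    # Cosine decay from max_lr down to min_lr over the remaining updates.
    progress = (update - warmup_updates) / (max_update - warmup_updates)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

for u in (0, 20, 100_000, 200_000, 400_000):
    print(f"update {u:>7}: lr = {lr_at(u):.2e}")
```

With only 20 warmup updates, this schedule reaches its 0.01 peak almost immediately and then decays over the full 400k updates.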

What's your environment?

apex and pyarrow are installed too.


 - Python version: 3.6.9
 - CUDA/cuDNN version: 10.1.243 / 7.6.5
 - GPU models and configuration: 4 x GTX 1080 Ti; NVIDIA driver 440
 - Any other relevant information:
alexeib commented 4 years ago

How optimization proceeds depends on a lot of factors: 1) your model architecture, 2) your dataset, 3) the task you are trying to solve, and 4) your batch size + learning rate (i.e. how many GPUs you are using).
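As a back-of-the-envelope illustration of the batch-size point, assuming --max-tokens counts raw audio samples for this task and --update-freq is left at its default of 1:

```python
# Rough effective batch size per optimizer update for the command above.
max_tokens = 1_500_000   # --max-tokens: audio samples per GPU per batch (assumption noted above)
num_gpus = 4             # 4 x GTX 1080 Ti
update_freq = 1          # --update-freq was not set, assume the default

samples_per_update = max_tokens * num_gpus * update_freq
minutes_per_update = samples_per_update / 16_000 / 60   # Librispeech audio is 16 kHz
print(f"~{minutes_per_update:.1f} minutes of audio per optimizer update")
```

Training on more GPUs (or raising --update-freq) increases this effective batch size, which is one of the levers mentioned above.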

Looking at the graph of your loss, it seems to be working well. It's possible that the model has already learned good representations, which you can test by trying to use them for e.g. TIMIT or ZeroSpeech or something like that. Otherwise you can try to train with a higher learning rate, use more GPUs, train for longer, etc.
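A minimal sketch of the representation test suggested above, roughly following the wav2vec example usage in the fairseq README (the checkpoint path and input tensor are placeholders):

```python
import torch
from fairseq.models.wav2vec import Wav2VecModel

# Load the pretrained checkpoint (placeholder path; adjust to your --save-dir).
cp = torch.load("./model3_small_960h/checkpoint_best.pt")
model = Wav2VecModel.build_model(cp["args"], task=None)
model.load_state_dict(cp["model"])
model.eval()

# A 16 kHz waveform as a (batch, samples) float tensor; random data as a stand-in.
wav_input_16khz = torch.randn(1, 10000)
with torch.no_grad():
    z = model.feature_extractor(wav_input_16khz)  # local encoder features
    c = model.feature_aggregator(z)               # context features to feed a downstream model
print(c.shape)
```

Feeding the aggregated features `c` to a small downstream classifier (e.g. on TIMIT phone labels) gives a quick read on how useful the learned representations already are.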