facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Segmentation fault when training speech_to_text model following instruction in examples/speech_to_text #2781

Open zxshamson opened 3 years ago

zxshamson commented 3 years ago

🐛 Bug

I have followed the README in examples/speech_to_text to reproduce the ST results on MUSTC. But when I start training (after preprocessing as instructed), the system raises a segmentation fault right after reading the dev subset.

Below is part of the log:

2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | dictionary size (spm_bpe10000_st.txt): 10,000
2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | pre-tokenizer: {'tokenizer': None}
2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | tokenizer: {'bpe': 'sentencepiece', 'sentencepiece_model': '/home/ma-user/work/data/mustc-s2t/en-de/spm_bpe10000_st.model'}
2020-10-23 18:09:18 | INFO | fairseq.data.audio.speech_to_text_dataset | SpeechToTextDataset(split="valid_st", n_samples=1388, prepend_tgt_lang_tag=False, shuffle=False, transforms=None)
mustc-test-s2t-cd.sh: line 11: 33140 Segmentation fault      CUDA_VISIBLE_DEVICES=0 python fairseq_cli/train.py ${data_dir} --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1

To Reproduce

CUDA_VISIBLE_DEVICES=0 python fairseq_cli/train.py ${data_dir} --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1

Code sample

Expected behavior

Environment

Additional context

kahne commented 3 years ago

Hi @zxshamson , I cannot reproduce this error. Do you mind sharing more info for diagnosis?
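
If it helps, one generic way to get a Python-level traceback for a segfault (not fairseq-specific, just the standard CPython faulthandler) is to rerun the same command with -X faulthandler, and with --num-workers 0 to rule out crashes inside DataLoader worker processes:

CUDA_VISIBLE_DEVICES=0 python -X faulthandler fairseq_cli/train.py ${data_dir} \
  --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st \
  --save-dir ${model_dir} --num-workers 0 --max-tokens 20000 --task speech_to_text \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 \
  --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1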

zxshamson commented 3 years ago

Hi @kahne , I think the problem comes from the speech_to_text dataset, since the segmentation fault appears right after the validation set is initialized. Would you mind listing which packages (and their versions) are used when building the datasets, so that I can check mine against them?
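
For reference, once I know which packages matter I can dump their versions with something like the following (torch, torchaudio, numpy and sentencepiece are just my guesses at the relevant ones):

pip list | grep -iE 'torch|numpy|sentencepiece|fairseq'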

zjw1990 commented 3 years ago

Same error. To reproduce:
fairseq Version: 0.9.0
PyTorch Version: 1.4.0
OS: Linux
How you installed fairseq: source
Python version: 3.7
CUDA/cuDNN version: 10.1

zjw1990 commented 3 years ago

This is my local setup, which runs without any errors:
fairseq Version: built from source
PyTorch Version: 1.6.0
OS: Ubuntu 18.04
How you installed fairseq: source
Python version: 3.8.6
CUDA/cuDNN version: 10.1
torchaudio: 0.6.0
pip: 20.2.4

So I think it is due to the PyTorch version?

zxshamson commented 3 years ago

This is my local setup, which runs without any errors:
fairseq Version: built from source
PyTorch Version: 1.6.0
OS: Ubuntu 18.04
How you installed fairseq: source
Python version: 3.8.6
CUDA/cuDNN version: 10.1
torchaudio: 0.6.0
pip: 20.2.4

So I think it is due to the PyTorch version?

Thanks for the information! I've tried PyTorch 1.6.0, and it seems to work!
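
In case it helps anyone else, this is roughly what I installed (assuming CUDA 10.1 wheels from the PyTorch index; double-check the exact versions for your setup):

pip install torch==1.6.0+cu101 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html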

zjw1990 commented 3 years ago

Thanks! Could you please check why? I still need to run it on PyTorch 1.4 because our server only supports PyTorch 1.4 :)

zxshamson commented 3 years ago

Sorry! I have worked on it for some time but still cannot find the cause. Maybe we should wait for the author's response.

kahne commented 3 years ago

Hi @zjw1990 @zxshamson , thanks for the debugging efforts. I am looking into this backward-compatibility issue. To gather more info: have either of you tried PyTorch 1.5?

zxshamson commented 3 years ago

@kahne I have just tried PyTorch 1.5, and it also works well.

zxshamson commented 3 years ago

@kahne I also have another question. I trained a model on the MUSTC en-de dataset with the recommended command and only achieved 17.4 BLEU on the test set (after 50 epochs), much lower than the reported result (22.7). Is my current result expected, or is the reported result achieved with ASR pre-training? Thank you, and I look forward to your reply!

kahne commented 3 years ago

@zxshamson Yes, 22.7 is from the model pre-trained with the ASR encoder. Please refer to the documentation for ASR training and for loading the pre-trained encoder (--load-pretrained-encoder-from ${CHECKPOINT_PATH}).
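
The ST command with the pre-trained encoder looks roughly like the following (MUSTC_ROOT, ST_SAVE_DIR and ASR_SAVE_DIR are placeholders; see the docs for the exact settings):

fairseq-train ${MUSTC_ROOT}/en-de \
  --config-yaml config_st.yaml --train-subset train_st --valid-subset dev_st \
  --save-dir ${ST_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-update 100000 \
  --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \
  --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 \
  --load-pretrained-encoder-from ${ASR_SAVE_DIR}/checkpoint_best.pt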

zxshamson commented 3 years ago

@kahne Thanks! I'll try it later. One more minor question: do we need to adjust the hyperparameters when training the model with the pre-trained ASR encoder?

kahne commented 3 years ago

Just follow the hyper-parameter settings in the documentation (you may want to adjust --update-freq when using fewer GPUs or a smaller --max-tokens). Compared to models trained from scratch, the pre-trained model converges much faster (~30k updates should be enough).
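
For example (assuming the documented setting targets roughly 40k tokens x an update frequency of 8 per optimizer step):

# documented single-GPU setting: --max-tokens 40000 --update-freq 8
# if memory only allows --max-tokens 20000, use --update-freq 16 to keep tokens-per-update roughly constant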

zxshamson commented 3 years ago

Thanks a lot!

zjw1990 commented 3 years ago

Hi @zxshamson , could you please tell me how many epochs you ran before stopping, and how long it took? I have been running for a whole day and only finished 15 epochs; the loss is now 4.532. I am also running the S2T baseline without any pre-training. Thanks!

2020-11-04 07:05:41 | INFO | train_inner | epoch 015: 1402 / 3257 loss=4.532, nll_loss=4.532, total=2385, n_correct=942.98, ppl=23.13, accuracy=39.538, wps=1426.2, ups=0.6, wpb=2385, bsz=79.9, num_updates=47000, lr=0.000922531, gnorm=0.593, clip=0, train_wall=167, wall=76565

zxshamson commented 3 years ago

Hi @zjw1990 , are you running a multilingual setting? Your number of updates per epoch is very different from mine. Anyway, the following is my final training step (I forced the max epoch to 50, but it seems it still had not converged), and it took about 6 hours to get there.

2020-11-03 00:02:23 | INFO | train_inner | epoch 050: 441 / 491 loss=4.705, nll_loss=3.309, total=12166.1, n_correct=6568.26, ppl=9.91, accuracy=53.988, wps=14074.9, ups=1.16, wpb=12166.1, bsz=470, num_updates=24500, lr=0.00127775, gnorm=0.258, clip=0, train_wall=86, wall=21897

zjw1990 commented 3 years ago

@zxshamson Thanks! No, I am running the Berard S2T baseline model on MUST-C en-fr. A whole day has passed and I have only finished 16 epochs, on a single GPU. Should I change any of the parameters below to make it faster? Also, I didn't install apex. Does that matter, and how much time would it save? Thank you!

fairseq-train ${MUSTC_ROOT}/en-fr \
  --config-yaml ${MUSTC_ROOT}/en-fr/config_st.yaml --train-subset train_st --valid-subset dev_st \
  --save-dir ${ST_SAVE_DIR} --num-workers 4 --max-tokens 8000 --task speech_to_text \
  --criterion label_smoothed_cross_entropy --report-accuracy --max-update 30000 \
  --arch s2t_berard --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8

zxshamson commented 3 years ago

@zjw1990 Hi, I haven't tried the Berard S2T baseline, so I am not sure whether this applies to your situation. Maybe you can try --fp16 to make training faster.

kahne commented 3 years ago

Hi @zjw1990, as @zxshamson suggested, you can leverage --fp16 and apex if your GPU has tensor cores (hopefully you can get a 2x speedup). I also recommend trying the Transformer, since the Berard model is RNN-based and slower to train. We will open-source model checkpoints soon so that you can use them directly, or at least use the ASR ones to pre-train and speed up ST training.
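
For example, a sketch of the en-fr command with --fp16 and the small Transformer arch (same placeholders as above; adjust --max-tokens and --update-freq to your GPU memory):

fairseq-train ${MUSTC_ROOT}/en-fr \
  --config-yaml ${MUSTC_ROOT}/en-fr/config_st.yaml --train-subset train_st --valid-subset dev_st \
  --save-dir ${ST_SAVE_DIR} --num-workers 4 --max-tokens 8000 --fp16 \
  --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \
  --max-update 30000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 \
  --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8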

zjw1990 commented 3 years ago

@zxshamson Thanks, maybe I should try the s2t_transformer model instead. RNN is really slow... @kahne Thanks, I will check the GPU. It would be great if the ST checkpoints are released!

holyma commented 2 years ago

I have run en-de ASR on MuST-C with 1 GPU, following these instructions, but it still does not converge. Below is my result:

| INFO | train | epoch 160 | loss 4.577 | nll_loss 3.24 | total 12454.7 | n_correct 6609.75 | ppl 9.45 | accuracy 53.07 | wps 4776.8 | ups 0.38 | wpb 12454.7 | bsz 455 | num_updates 78241 | lr 0.000357506 | gnorm 0.394 | clip 0 | train_wall 1201 | gb_free 13.3 | wall 40551

Is this normal? @kahne @zxshamson

kahne commented 2 years ago

I have run en-de ASR on MuST-C with 1 GPU, following these instructions, but it still does not converge. Below is my result:

| INFO | train | epoch 160 | loss 4.577 | nll_loss 3.24 | total 12454.7 | n_correct 6609.75 | ppl 9.45 | accuracy 53.07 | wps 4776.8 | ups 0.38 | wpb 12454.7 | bsz 455 | num_updates 78241 | lr 0.000357506 | gnorm 0.394 | clip 0 | train_wall 1201 | gb_free 13.3 | wall 40551

Is this normal? @kahne @zxshamson

Did you set --update-freq to 8?

holyma commented 2 years ago

I have run en-de ASR on MuST-C with 1 GPU, following these instructions, but it still does not converge. Below is my result:

| INFO | train | epoch 160 | loss 4.577 | nll_loss 3.24 | total 12454.7 | n_correct 6609.75 | ppl 9.45 | accuracy 53.07 | wps 4776.8 | ups 0.38 | wpb 12454.7 | bsz 455 | num_updates 78241 | lr 0.000357506 | gnorm 0.394 | clip 0 | train_wall 1201 | gb_free 13.3 | wall 40551

Is this normal? @kahne @zxshamson

Did you set --update-freq to 8?

Yes, this is my command:

fairseq-train ${MUSTC_ROOT}/en-de \
  --config-yaml config_asr.yaml --train-subset train_asr --valid-subset dev_asr \
  --save-dir ${CHECKPOINT}/mustc_asr --num-workers 4 --max-tokens 40000 --max-update 100000 \
  --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy --fp16 \
  --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8  --tensorboard-logdir $LOG/mustc_asr | tee  $LOG/mustc_asr/train_7.log

I have downloaded your pretrained model, and it runs well.