facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Unable to train an ASR/ST model on MUST-C data #3457

Open sugeeth14 opened 3 years ago

sugeeth14 commented 3 years ago

Hi, I am trying to train a new ASR model by following the steps available here.

I downloaded the MUST-C version 2.0 data available here.

Unzipping the tar file gives a folder titled en-de, which contains two folders, data and doc:

data: dev train tst-COMMON tst-HE

Then I preprocessed the data using the following command:

python fairseq/examples/speech_to_text/prep_mustc_data.py --data-root mustcv2/ --task asr --vocab-type unigram --vocab-size 5000

The mustcv2 folder contains the en-de folder that was unzipped earlier.

The preprocessing ran successfully, populating the en-de folder as shown below:

[screenshot: contents of the populated en-de folder]

Then I tried to train the model using this command:

fairseq-train mustcv2/en-de/ --config-yaml mustcv2/en-de/config_asr.yaml --train-subset train_asr --valid-subset dev_asr --save-dir checkpoints/asr/ --num-workers 4 --max-tokens 40000 --max-update 100000 --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8

This led to an error saying dict.txt was not present.

[screenshot: error message stating that dict.txt is missing]

From my previous experience with fairseq, I copied spm_unigram5000_asr.txt to dict.txt.
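Roughly, assuming the vocab file sits in the en-de directory created by the prep script:

cp mustcv2/en-de/spm_unigram5000_asr.txt mustcv2/en-de/dict.txt

I then ran the training command again, for which I am getting the error below.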

Traceback (most recent call last):                                                                                                                                     
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq_cli/train.py", line 491, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq_cli/train.py", line 169, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq_cli/train.py", line 279, in train
    log_output = trainer.train_step(samples)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/trainer.py", line 668, in train_step
    ignore_grad=is_dummy_batch,
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/tasks/fairseq_task.py", line 475, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 79, in forward
    net_output = model(**sample["net_input"])
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 268, in forward
    encoder_out = self.encoder(src_tokens=src_tokens, src_lengths=src_lengths)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 337, in forward
    if self.num_updates < self.encoder_freezing_updates:
TypeError: '<' not supported between instances of 'int' and 'NoneType'

The encoder_freezing_updates value is being set to None, so I changed the code at s2t_transformer.py line 337 to the below and ran the training command again:

[screenshot: edited condition at s2t_transformer.py line 337]
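Roughly, the edit just guards the comparison so that a missing (None) value means no encoder freezing, along the lines of:

if self.encoder_freezing_updates is not None and self.num_updates < self.encoder_freezing_updates:
    ...  # body unchanged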

With this change, I am getting the error below.

Traceback (most recent call last):                                                                                                                                     
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq_cli/train.py", line 491, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq_cli/train.py", line 169, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq_cli/train.py", line 279, in train
    log_output = trainer.train_step(samples)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/trainer.py", line 668, in train_step
    ignore_grad=is_dummy_batch,
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/tasks/fairseq_task.py", line 475, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 79, in forward
    net_output = model(**sample["net_input"])
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 270, in forward
    prev_output_tokens=prev_output_tokens, encoder_out=encoder_out
  File "/fastdisk/Sugeeth/miniconda3/envs/offline/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/models/transformer.py", line 823, in forward
    alignment_heads=alignment_heads,
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 396, in extract_features
    alignment_heads,
  File "/fastdisk/Sugeeth/offline/fairseq/fairseq/models/transformer.py", line 890, in extract_features_scriptable
    padding_mask = encoder_out["encoder_padding_mask"][0]
IndexError: list index out of range

Printing encoder_out["encoder_padding_mask"] in transformer.py shows that an empty list is passed the second time the function is called, as seen below. [screenshot: debug print showing an empty encoder_padding_mask list]
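So the decoder code assumes this list always has at least one entry. A guard along these lines avoids the IndexError (just an illustration of where it breaks; it does not explain why the list is empty on the second call):

padding_mask = None
if len(encoder_out["encoder_padding_mask"]) > 0:
    padding_mask = encoder_out["encoder_padding_mask"][0]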

The same issue occurs with the speech translation (ST) task.

Please let me know if I am doing anything wrong here. I am using the latest fairseq, just cloned from the master branch. The torch version is 1.8.1+cu102, on Ubuntu 20.04.

My apologies if this is not a bug. Please let me know how I can train these models.

Thanks.

sugeeth14 commented 3 years ago

@kahne

bhaddow commented 3 years ago

The first error you got above suggests that fairseq is not finding config_asr.yaml. When I train ASR or ST, I do not provide a path to config_asr.yaml, just the file name itself. I think fairseq prepends the data directory path to the config file path, and silently ignores the config file if it cannot find it.
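In your command that would be something like:

fairseq-train mustcv2/en-de/ --config-yaml config_asr.yaml --train-subset train_asr --valid-subset dev_asr ...

i.e. just the file name for --config-yaml, rather than mustcv2/en-de/config_asr.yaml.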

I am training successfully with commit 1c0439b7da

sugeeth14 commented 3 years ago

@bhaddow Thanks a lot. Changing to the commit https://github.com/pytorch/fairseq/commit/1c0439b7dabe62d39c6e7f1c8ebc86311e042b5a has helped, and I am now able to train.
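For reference, what I did was roughly the following in my fairseq clone, and then I re-ran the same training command:

git checkout 1c0439b7dabe62d39c6e7f1c8ebc86311e042b5a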

sugeeth14 commented 3 years ago

@kahne I would like to contribute if this is an issue. Is there any place I could look into for why this error is occurring?

Liu-Tianchi commented 3 years ago

The first error you got above suggests that fairseq is not finding config_asr.yaml. When I train ASR or ST, I do not provide a path to config_asr.yaml, just the file name itself. I think fairseq prepends the data directory path to the config file path, and silently ignores the config file if it cannot find it.

I am training successfully with commit 1c0439b

Hi @bhaddow and @sugeeth14, I am hitting the same issue as @sugeeth14. Following your suggestion, I checked fairseq/models/fairseq_model.py and it is the same as in commit 1c0439b, but it still shows the same error messages.

In addition, I noticed that this issue was opened on 7 April, after 1c0439b was committed on 2 March. I am a bit confused about how this commit could solve the issue, given that it had been committed even earlier.

Thanks to both of you in advance!

cywan1998 commented 3 years ago

Hi @sn1ff1918, I am hitting the same issue too, and when I check this file it has already been fixed. Did you solve this problem?

holyma commented 3 years ago

@sugeeth14 @Atla11nTa Have you trained ASR/ST on the MUST-C data and reproduced the published results?