sugeeth14 opened this issue 3 years ago (status: Open)
@kahne
The first error you got above suggests that fairseq is not finding `config_asr.yaml`. When I train ASR or ST, I do not provide a path to `config_asr.yaml`, just the file name itself. I think fairseq prepends the data directory path to the config file path, and silently ignores the config file if it cannot find it.
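If the prepending behaviour described here is what is happening, it can be sketched with plain `os.path.join` (the helper name `resolve_config` below is hypothetical, not fairseq's actual API):

```python
import os.path as op

def resolve_config(data_root: str, config_yaml: str) -> str:
    # Sketch of the suspected behaviour: the --config-yaml value is joined
    # onto the data directory, so passing a path instead of a bare file
    # name yields a nested, non-existent location.
    return op.join(data_root, config_yaml)

# Bare file name -> resolves inside the data directory:
print(resolve_config("mustcv2/en-de", "config_asr.yaml"))
# -> mustcv2/en-de/config_asr.yaml

# Full path -> the directory components are duplicated:
print(resolve_config("mustcv2/en-de", "mustcv2/en-de/config_asr.yaml"))
# -> mustcv2/en-de/mustcv2/en-de/config_asr.yaml
```

This would explain why passing only the file name works while a full path silently fails.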
I am training successfully with commit `1c0439b7da`.
@bhaddow Thanks a lot! Changing to commit https://github.com/pytorch/fairseq/commit/1c0439b7dabe62d39c6e7f1c8ebc86311e042b5a has helped, and I am now able to train.
@kahne I would like to contribute if this is an issue. Is there any place to look into why this error is occurring?
Hi @bhaddow and @sugeeth14, I am hitting the same issue as @sugeeth14. Following your suggestion, I checked fairseq/models/fairseq_model.py and it is the same as commit `1c0439b`, but it still shows the same error messages.

In addition, I noticed that this issue was opened on 7 April, after `1c0439b` was committed on 2 Mar. I am a bit confused about how that commit could have solved this issue when it had already been committed earlier.

Thanks to both of you in advance!
Hi @sn1ff1918, I am hitting the same issue too. I checked this file and it had already been fixed. Did you solve this problem?
Hi, I am trying to train a new ASR model by following the steps available here. I downloaded the MuST-C version 2.0 data available here. Unzipping the tar file gives a folder titled `en-de`, which has two folders, `data` and `doc`.

data: dev train tst-COMMON tst-HE
Then I preprocessed the data using the following command:

```
python fairseq/examples/speech_to_text/prep_mustc_data.py --data-root mustcv2/ --task asr --vocab-type unigram --vocab-size 5000
```
The preprocessing ran successfully, populating the `en-de` folder as below. Then I tried to train the model using the command:

```
fairseq-train mustcv2/en-de/ --config-yaml mustcv2/en-de/config_asr.yaml --train-subset train_asr --valid-subset dev_asr --save-dir checkpoints/asr/ --num-workers 4 --max-tokens 40000 --max-update 100000 --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8
```
This led to an error saying `dict.txt` was not present. From my previous experience with fairseq, I copied `spm_unigram5000_asr.txt` to `dict.txt` and ran the training command again, for which I am getting the below error. The `encoder_freezing_updates` is being set to `None`, hence I changed the code in s2t_transformer.py (line 337) to the below and ran the training command again, for which I am getting the below error.
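For context, this kind of crash comes from comparing an integer update count against a `None` freezing threshold. A minimal guard (a sketch with a hypothetical helper, not the actual fairseq patch) looks like:

```python
def encoder_frozen(num_updates, encoder_freezing_updates):
    # Treat a missing (None) setting as "never freeze" rather than
    # evaluating `int < None`, which raises TypeError in Python 3.
    if encoder_freezing_updates is None:
        return False
    return num_updates < encoder_freezing_updates

print(encoder_frozen(500, None))   # -> False (no freezing configured)
print(encoder_frozen(500, 1000))   # -> True  (still within freezing window)
```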
Printing the `encoder_out["encoder_padding_mask"]` in transformer.py shows an empty list being passed the second time the function is called, as seen below. The same issue is occurring with `speech_translation`.
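The empty list observed above can be handled defensively. A sketch of guarding the lookup (the helper `first_padding_mask` is hypothetical, assuming the mask field is a possibly-empty list as the printout suggests):

```python
def first_padding_mask(encoder_out):
    # encoder_out["encoder_padding_mask"] is a list that may be empty
    # (e.g. when no positions are padded); index it only when non-empty.
    masks = encoder_out.get("encoder_padding_mask", [])
    return masks[0] if len(masks) > 0 else None

print(first_padding_mask({"encoder_padding_mask": []}))        # -> None
print(first_padding_mask({"encoder_padding_mask": ["mask0"]})) # -> mask0
```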
Please let me know if I am doing anything wrong here. I am using the latest fairseq, just cloned from the `master` branch. The Torch version is `1.8.1+cu102`, on Ubuntu 20.04. My apologies if it is not a bug. Please let me know how I can train the same.
Thanks.