facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Unrecognized arguments when attempting to train FastSpeech2 model #4015

Open eonglints opened 2 years ago

eonglints commented 2 years ago

🐛 Bug

Unrecognized arguments when attempting to train FastSpeech 2 model (see https://github.com/pytorch/fairseq/blob/main/examples/speech_synthesis/docs/ljspeech_example.md#fastspeech2)

To Reproduce

Steps to reproduce the behavior:

  1. Run cmd:
    fairseq-train ${FEATURE_MANIFEST_ROOT} --save-dir ${SAVE_DIR} \
    --config-yaml config.yaml --train-subset train --valid-subset dev \
    --num-workers 4 --max-sentences 6 --max-update 200000 \
    --task text_to_speech --criterion fastspeech2 --arch fastspeech2 \
    --clip-norm 5.0 --n-frames-per-step 1 \
    --dropout 0.1 --attention-dropout 0.1 --activation-dropout 0.1 \
    --encoder-normalize-before --decoder-normalize-before \
    --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --seed 1 --update-freq 8 --eval-inference --best-checkpoint-metric mcd_loss
  2. See: fairseq-train: error: unrecognized arguments: --activation-dropout 0.1 --encoder-normalize-before --decoder-normalize-before

Expected behavior

Arguments are accepted and model starts to train.

Environment

Additional context

Still very much enjoying this framework despite the odd bug. Thanks again!

eonglints commented 2 years ago

Looking at https://github.com/pytorch/fairseq/blob/main/fairseq/models/text_to_speech/fastspeech2.py, it appears that this particular bug is just a documentation issue: this model doesn't need the three arguments named in the error (--activation-dropout, --encoder-normalize-before and --decoder-normalize-before).

However, after removing those arguments, the model still doesn't train because it looks for other missing args such as args.pitch_min. It would be good to have a complete, working set of args for training this model.

Those arguments still need to be removed from the documentation, but the issue about the missing FastSpeech2 args was my mistake: I hadn't added the --add-fastspeech-targets argument when creating the feature manifests.
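
For anyone landing here, a minimal sketch of the command from the report with the three rejected flags dropped. It assumes the feature manifests were created with --add-fastspeech-targets and that FEATURE_MANIFEST_ROOT and SAVE_DIR are already set; TRAIN_ARGS is just a helper variable introduced for illustration, and the remaining values are copied unchanged from the original command rather than tuned.

```shell
# Flags copied from the report above, minus the three that fairseq-train rejected:
#   --activation-dropout 0.1, --encoder-normalize-before, --decoder-normalize-before
# TRAIN_ARGS is a hypothetical helper variable, not something fairseq defines.
TRAIN_ARGS="--config-yaml config.yaml --train-subset train --valid-subset dev \
--num-workers 4 --max-sentences 6 --max-update 200000 \
--task text_to_speech --criterion fastspeech2 --arch fastspeech2 \
--clip-norm 5.0 --n-frames-per-step 1 \
--dropout 0.1 --attention-dropout 0.1 \
--optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
--seed 1 --update-freq 8 --eval-inference --best-checkpoint-metric mcd_loss"

# Dry run: print the full command for review (sh/bash word-splitting assumed).
# Remove the leading `echo` to actually start training.
echo fairseq-train "${FEATURE_MANIFEST_ROOT}" --save-dir "${SAVE_DIR}" ${TRAIN_ARGS}
```

The `echo` is only there so the flag list can be checked before launching a long run.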