facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

AssertionError: Fine-tuning works best when data normalization is the same #2839

Closed: alealv closed this issue 4 years ago

alealv commented 4 years ago

ā“ Questions and Help

Hi, I'm trying to follow this tutorial for fine-tuning the wav2vec 2.0 model.

So far I've created a manifest with dev.ltr, dev.tsv, dev.wrd, dict.ltr.txt, train.ltr, train.tsv, and train.wrd, and set the options (AFAIK) to the correct values.
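
For reference, the .tsv manifests in wav2vec's format are plain text: the first line is the audio root directory, and each subsequent line is a tab-separated relative path and sample count. The paths and counts below are made up for illustration:

/mnt/data/ale/audio
clip_0001.wav	221760
clip_0002.wav	143360

The matching .ltr files hold one letter-level transcription per line (in the same order as the .tsv), with | marking word boundaries, e.g. T H E | C A T | S A T |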

python train.py /mnt/data/ale/manifest \
--save-dir ~/trainings \
--fp16 \
--wer-args '("/mnt/ale/kenlm-models/openwebtext/v5/4-gram-265M-25-10-2020-pruned-300K.arpa","/mnt/data/lexicon",2,-1)' \
--post-process letter  \
--valid-subset /mnt/data/ale/manifest/dev \
--no-epoch-checkpoints  \
--best-checkpoint-metric wer  \
--num-workers 16 \
--max-update 80000  \
--sentence-avg  \
--task audio_pretraining  \
--arch wav2vec_ctc  \
--w2v-path /mnt/data/ale/models/wav2vec_vox_new.pt \
--labels ltr  \
--apply-mask  \
--mask-selection static  \
--mask-other 0  \
--mask-length 10  \
--mask-prob 0.5  \
--layerdrop 0.1 \
--mask-channel-selection static  \
--mask-channel-other 0  \
--mask-channel-length 64  \
--mask-channel-prob 0.5  \
--zero-infinity \
--feature-grad-mult 0.0  \
--freeze-finetune-updates 10000  \
--validate-after-updates 10000  \
--optimizer adam \
--adam-betas '(0.9, 0.98)'  \
--adam-eps 1e-08  \
--lr 2e-05  \
--lr-scheduler tri_stage  \
--warmup-steps 8000  \
--hold-steps 32000 \
--decay-steps 40000  \
--final-lr-scale 0.05  \
--final-dropout 0.0  \
--dropout 0.0  \
--activation-dropout 0.1  \
--criterion ctc \
--attention-dropout 0.0  \
--max-tokens 1280000  \
--seed 2337  \
--log-format json  \
--log-interval 500  \
--ddp-backend no_c10d 

But I get the following error, saying that normalize should be equal to the model's value, which is True:

Traceback (most recent call last):
  File "train.py", line 12, in <module>
    main()
  File "train.py", line 8, in main
    cli_main()
  File "/home/aalvarez/Projects/fairseq/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/aalvarez/Projects/fairseq/fairseq/distributed_utils.py", line 296, in call_main
    torch.multiprocessing.spawn(
  File "/home/aalvarez/.virtualenvs/wav2vec-training-XQtg4Z6z-py3.8/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/aalvarez/.virtualenvs/wav2vec-training-XQtg4Z6z-py3.8/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/aalvarez/.virtualenvs/wav2vec-training-XQtg4Z6z-py3.8/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/aalvarez/.virtualenvs/wav2vec-training-XQtg4Z6z-py3.8/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/aalvarez/Projects/fairseq/fairseq/distributed_utils.py", line 283, in distributed_main
    main(cfg, **kwargs)
  File "/home/aalvarez/Projects/fairseq/fairseq_cli/train.py", line 74, in main
    model = task.build_model(cfg.model)
  File "/home/aalvarez/Projects/fairseq/fairseq/tasks/audio_pretraining.py", line 185, in build_model
    model = super().build_model(args)
  File "/home/aalvarez/Projects/fairseq/fairseq/tasks/fairseq_task.py", line 548, in build_model
    model = models.build_model(args, self)
  File "/home/aalvarez/Projects/fairseq/fairseq/models/__init__.py", line 56, in build_model
    return ARCH_MODEL_REGISTRY[cfg.arch].build_model(cfg, task)
  File "/home/aalvarez/Projects/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 166, in build_model
    w2v_encoder = Wav2VecEncoder(args, task.target_dictionary)
  File "/home/aalvarez/Projects/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 338, in __init__
    assert (
AssertionError: Fine-tuning works best when data normalization is the same
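
For context, the assertion that fires lives in Wav2VecEncoder.__init__ in fairseq/models/wav2vec/wav2vec2_asr.py. Paraphrasing from that file (not an exact copy), it compares the fine-tuning setting against the one stored in the pre-trained checkpoint:

# args: the fine-tuning command-line args
# w2v_args: the args saved inside the pre-trained checkpoint (--w2v-path)
assert (
    args.normalize == w2v_args.normalize
), "Fine-tuning works best when data normalization is the same"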

However, the registered model Wav2Vec_Ctc doesn't have an argument to set normalize to True.
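
One way to check which setting a pre-trained checkpoint expects is to read the training args saved inside it. A minimal sketch, assuming an older-style checkpoint that stores an argparse Namespace under the "args" key (newer fairseq checkpoints keep an equivalent config under "cfg" instead):

import torch

# Load the checkpoint on CPU and inspect the normalize flag it was trained with.
state = torch.load("/mnt/data/ale/models/wav2vec_vox_new.pt", map_location="cpu")
print(state["args"].normalize)  # True for the vox models, so pass --normalize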


alealv commented 4 years ago

I tried with --normalize and it worked 😒

alexeib commented 4 years ago

It's a guard rail to make sure data normalization is the same during pre-training and fine-tuning. The newly released vox models were pre-trained on pre-normalized data (and without group norm in the encoder), whereas the librispeech ones still use the old encoder that normalizes in the forward pass (and should therefore be fine-tuned without --normalize).
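
Concretely, --normalize tells the audio task to layer-normalize each raw waveform before it reaches the model. A sketch of what that amounts to (not the exact fairseq code):

import torch
import torch.nn.functional as F

def normalize_waveform(wav: torch.Tensor) -> torch.Tensor:
    # Zero-mean, unit-variance normalization over the whole utterance,
    # matching what the dataset applies when --normalize is set.
    with torch.no_grad():
        return F.layer_norm(wav, wav.shape)

The pre-normalized vox checkpoints expect input like this, while the older librispeech checkpoints normalize inside the feature encoder instead, which is why the flag has to match between pre-training and fine-tuning.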