facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

torch cannot read / load the wav2vec 2.0 pre-trained model. #2828

Closed wahyubram82 closed 4 years ago

wahyubram82 commented 4 years ago

❓ Questions and Help

I have a problem with fine-tuning. The command I use is based on the fine-tuning command in the README.md:

python3 train.py '/home/bram/Documents/coding/speech/traindata/text_label' \
--save-dir '/home/bram/Documents/coding/speech/traindata/model_finetuning_wav2vec' --fp16 \
--wer-args '("/home/bram/Documents/coding/speech/traindata/text_label/lm.bin","/home/bram/Documents/coding/speech/traindata/text_label/lexicon.txt",2,-1)' \
--post-process letter --valid-subset valid --no-epoch-checkpoints --best-checkpoint-metric wer --num-workers 128 \
--max-update 400000 --sentence-avg --task audio_pretraining --arch wav2vec_ctc \
--w2v-path '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt' \
--labels ltr --apply-mask --mask-selection static --mask-other 0 --mask-length 10 --mask-prob 0.5 --layerdrop 0.1 \
--mask-channel-selection static --mask-channel-other 0 --mask-channel-length 64 --mask-channel-prob 0.5 --zero-infinity \
--feature-grad-mult 0.0 --freeze-finetune-updates 10000 --validate-after-updates 10000 --optimizer adam \
--adam-betas '(0.9, 0.98)' --adam-eps 1e-08 --lr 2e-05 --lr-scheduler tri_stage --warmup-steps 8000 --hold-steps 32000 \
--decay-steps 40000 --final-lr-scale 0.05 --final-dropout 0.0 --dropout 0.0 --activation-dropout 0.1 --criterion ctc \
--attention-dropout 0.0 --max-tokens 1280000 --seed 2337 --log-format json --log-interval 500 --ddp-backend no_c10d

The error report:

File "/home/bram/Documents/coding/Speech/fairseq/train.py", line 14, in <module>
    cli_main()
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq_cli/train.py", line 345, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/distributed_utils.py", line 268, in call_main
    main(args, **kwargs)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq_cli/train.py", line 61, in main
    model = task.build_model(args)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/tasks/fairseq_task.py", line 546, in build_model
    model = models.build_model(args, self)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/models/__init__.py", line 57, in build_model
    return ARCH_MODEL_REGISTRY[model_cfg.arch].build_model(model_cfg, task)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 168, in build_model
    w2v_encoder = Wav2VecEncoder(args, task.target_dictionary)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 331, in __init__
    args.w2v_path, arg_overrides
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/checkpoint_utils.py", line 211, in load_checkpoint_to_cpu
    setattr(args, arg_name, arg_val)
AttributeError: 'NoneType' object has no attribute 'dropout'

The folder /home/bram/Documents/coding/speech/traindata/text_label contains:

1. dict.ltr.txt
2. lexicon.txt
3. lm.bin
4. train.tsv
5. train.wrd
6. train.ltr
7. valid.tsv
8. valid.wrd
9. valid.ltr

The folder /home/bram/Documents/coding/speech/traindata/model_finetuning_wav2vec is empty; it is meant to store the fine-tuned model produced by the fine-tuning process.

The folder /home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/ contains:

1. checkpoint_best.pt
2. checkpoint_last.pt

These files result from pre-training on my own dataset. The pre-training command:

python3 /content/repo/fairseq/train.py '/content/drive/My Drive/wav_manifest/' \
--save-dir '/content/drive/My Drive/wav2vec_v2_pre_train_model'  --fp16 \
--num-workers 128 --task audio_pretraining --criterion wav2vec --arch wav2vec2 \
--log-keys '["prob_perplexity","code_perplexity","temp"]' --quantize-targets \
--extractor-mode default --conv-feature-layers '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2' --final-dim 256 \
--latent-vars 320 --latent-groups 2 --latent-temp '(2,0.5,0.999995)' --infonce \
--optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-06 --lr-scheduler polynomial_decay \
--total-num-update 400000 --lr 0.0005 --warmup-updates 32000 --mask-length 10 --mask-prob 0.65 \
--mask-selection static --mask-other 0 --encoder-layerdrop 0.05 --dropout-input 0.1 --dropout-features 0.1 \
--feature-grad-mult 0.1 --loss-weights '[0.1, 10]' --conv-pos 128 --conv-pos-groups 16 --num-negatives 100 \
--cross-sample-negatives 0 --max-sample-size 1500000 --no-epoch-checkpoints --min-sample-size 2000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 --max-tokens 1400000 --max-update 400000 \
--skip-invalid-size-inputs-valid-test --ddp-backend no_c10d

Training was done in Google Colab.

I have already tried debugging by following the process step by step, and I found where the error happens, but I cannot solve the problem.

The error happens in fairseq/fairseq/checkpoint_utils.py, line 211, in the load_checkpoint_to_cpu function.

I tried to reproduce the steps. Here is the report:

Line 201, def load_checkpoint_to_cpu(path, arg_overrides=None):, is the function called by fairseq/fairseq/models/wav2vec/wav2vec2_asr.py at line 330. Before that call, the arg_overrides variable is built, and at this point it is:

arg_overrides = {'dropout': 0.0,
                 'activation_dropout': 0.1,
                 'dropout_input': 0,
                 'attention_dropout': 0.0,
                 'mask_length': 10,
                 'mask_prob': 0.5,
                 'mask_selection': 'static',
                 'mask_other': 0.0,
                 'no_mask_overlap': False,
                 'mask_channel_length': 64,
                 'mask_channel_prob': 0.5,
                 'mask_channel_selection': 'static',
                 'mask_channel_other': 0.0,
                 'no_mask_channel_overlap': False,
                 'encoder_layerdrop': 0.1,
                 'feature_grad_mult': 0.0}
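
For context, the part of load_checkpoint_to_cpu that consumes these overrides amounts to roughly the following (a paraphrased sketch of the logic around line 211, not the exact source):

state = torch.load(f, map_location=lambda s, l: default_restore_location(s, "cpu"))
args = state["args"]  # None if the checkpoint was saved without its training args
if arg_overrides is not None:
    for arg_name, arg_val in arg_overrides.items():
        setattr(args, arg_name, arg_val)  # line 211: AttributeError when args is None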

The path is args.w2v_path, i.e. '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt', which comes from the --w2v-path option we set.

So the function def load_checkpoint_to_cpu(path, arg_overrides=None): receives path = '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt'.

OK, then: with open(PathManager.get_local_path(path), "rb") as f: on line 203 of fairseq/fairseq/checkpoint_utils.py opens checkpoint_best.pt and binds the file object to f.

In my reproduction, the error happens when executing state = torch.load(f, map_location=lambda s, l: default_restore_location(s, "cpu")) on line 204 of fairseq/fairseq/checkpoint_utils.py.

That run reported 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte, meaning torch cannot load/read checkpoint_best.pt.
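
For anyone who wants to reproduce that call outside of fairseq, the two lines in isolation amount to the following (assuming the same local path; PathManager.get_local_path leaves a local path unchanged):

import torch
from torch.serialization import default_restore_location

path = '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt'
with open(path, "rb") as f:
    state = torch.load(f, map_location=lambda s, l: default_restore_location(s, "cpu"))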

Any suggestions? Can somebody help? If I can fix this, I will continue the tutorial.

wahyubram82 commented 4 years ago

I already have the answer...

The problem is that fairseq, for some reason I don't know, did not save the arguments in the model checkpoint: the arguments that we used to build the pre-trained model.
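
You can confirm this by loading the checkpoint and inspecting its saved args (a minimal check, using the path from above):

import torch

state = torch.load('/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt', map_location='cpu')
print(state['args'])  # prints None here, which is why setattr(args, ...) raised AttributeError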

In my case, I used the pre-training command shown in my first comment above.

It should have args: Namespace(activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9,0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='wav2vec2', attention_dropout=0.1, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=25.0, codebook_negatives=0, conv_bias=False, conv_feature_layers='[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2', conv_pos=128, conv_pos_groups=16, cpu=False, criterion='wav2vec', cross_sample_negatives=0, curriculum=0, data='/content/drive/My Drive/wav_manifest/', data_buffer_size=10, dataset_impl=None, ddp_backend='no_c10d', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.1, dropout_features=0.1, dropout_input=0.1, empty_cache_freq=0, enable_padding=False, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0.05, encoder_layers=12, end_learning_rate=0.0, extractor_mode='default', fast_stat_sync=False, feature_grad_mult=0.1, final_dim=256, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', infonce=True, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, labels=None, latent_dim=0, latent_groups=2, latent_temp='(2,0.5,0.999995)', latent_vars=320, layer_norm_first=False, local_rank=0, localsgd_frequency=3, log_format=None, log_interval=100, log_keys='["prob_perplexity","code_perplexity","temp"]', logit_temp=0.1, loss_weights='[0.1, 10]', lr=[0.0005], lr_scheduler='polynomial_decay', mask_channel_length=10, mask_channel_min_space=1, mask_channel_other=0, mask_channel_prob=0, mask_channel_selection='static', mask_length=10, mask_min_space=1, mask_other=0.0, mask_prob=0.65, mask_selection='static', max_epoch=0, max_sample_size=1500000, max_tokens=1400000, max_tokens_valid=1400000, max_update=400000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, min_sample_size=2000, model_parallel_size=1, negatives_from_everywhere=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_mask_channel_overlap=False, no_mask_overlap=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, normalize=False, nprocs_per_node=1, num_negatives=100, num_shards=1, num_workers=128, optimizer='adam', optimizer_overrides='{}', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, power=1.0, profile=False, quantization_config_path=None, quantize_input=False, quantize_targets=True, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', same_quantizer=False, sample_rate=16000, save_dir='/content/drive/My Drive/wav2vec_v2_pre_train_model', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, target_glu=False, task='audio_pretraining', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, total_num_update=400000, tpu=False, train_subset='train', update_freq=[1], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=32000, weight_decay=0.01, zero_sharding='none')

So, to make the checkpoint have the args, do this:

First, you must save the command you used when building your own pre-trained model, like the one I mentioned above.

Then:

import torch, argparse, logging, os, sys
from fairseq import options

# Build sys.argv by hand, splitting the pre-training command into a list of tokens
cek = [
    '/content/repo/fairseq/train.py',
    '/content/drive/My Drive/wav_manifest/',
    '--save-dir', '/content/drive/My Drive/wav2vec_v2_pre_train_model',
    '--fp16',
    '--num-workers', '128',
    '--task', 'audio_pretraining',
    '--criterion', 'wav2vec',
    '--arch', 'wav2vec2',
    '--log-keys', '["prob_perplexity","code_perplexity","temp"]',
    '--quantize-targets',
    '--extractor-mode', 'default',
    '--conv-feature-layers', '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2',
    '--final-dim', '256',
    '--latent-vars', '320',
    '--latent-groups', '2',
    '--latent-temp', '(2,0.5,0.999995)',
    '--infonce',
    '--optimizer', 'adam',
    '--adam-betas', '(0.9,0.98)',
    '--adam-eps', '1e-06',
    '--lr-scheduler', 'polynomial_decay',
    '--total-num-update', '400000',
    '--lr', '0.0005',
    '--warmup-updates', '32000',
    '--mask-length', '10',
    '--mask-prob', '0.65',
    '--mask-selection', 'static',
    '--mask-other', '0',
    '--encoder-layerdrop', '0.05',
    '--dropout-input', '0.1',
    '--dropout-features', '0.1',
    '--feature-grad-mult', '0.1',
    '--loss-weights', '[0.1, 10]',
    '--conv-pos', '128',
    '--conv-pos-groups', '16',
    '--num-negatives', '100',
    '--cross-sample-negatives', '0',
    '--max-sample-size', '1500000',
    '--no-epoch-checkpoints',
    '--min-sample-size', '2000',
    '--dropout', '0.1',
    '--attention-dropout', '0.1',
    '--weight-decay', '0.01',
    '--max-tokens', '1400000',
    '--max-update', '400000',
    '--skip-invalid-size-inputs-valid-test',
    '--ddp-backend', 'no_c10d',
]

# argparse reads sys.argv[1:], so the script path in element 0 is just a placeholder
sys.argv = cek

logging.basicConfig(
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=os.environ.get("LOGLEVEL", "INFO").upper(),
    stream=sys.stdout,
)
logger = logging.getLogger("fairseq_cli.train")

parser = options.get_training_parser()
args = options.parse_args_and_arch(parser, modify_parser=None)

# Load the old checkpoint on CPU, attach the re-parsed args, and save a fixed copy
model_path = 'checkpoint_best.pt'
mymodel = torch.load(model_path, map_location=torch.device('cpu'))
mymodel['args'] = args
torch.save(mymodel, 'new_fixed_model.pt')

After that, you can pass new_fixed_model.pt to --w2v-path for fine-tuning.
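
A quick sanity check (a minimal sketch) that the fix took, before re-running the fine-tuning command:

import torch

fixed = torch.load('new_fixed_model.pt', map_location='cpu')
assert fixed['args'] is not None
print(fixed['args'].arch)  # should print 'wav2vec2'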

Problem solved.