facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

torch cannot read / load the wav2vec 2.0 pre-trained model. #2828

Closed wahyubram82 closed 4 years ago

wahyubram82 commented 4 years ago

❓ Questions and Help

I have a problem with fine-tuning. The command I use is based on the fine-tuning command in the README.md:

python3 train.py '/home/bram/Documents/coding/speech/traindata/text_label' \
--save-dir '/home/bram/Documents/coding/speech/traindata/model_finetuning_wav2vec' --fp16 \
--wer-args '("/home/bram/Documents/coding/speech/traindata/text_label/lm.bin","/home/bram/Documents/coding/speech/traindata/text_label/lexicon.txt",2,-1)' \
--post-process letter --valid-subset valid --no-epoch-checkpoints --best-checkpoint-metric wer --num-workers 128 \
--max-update 400000 --sentence-avg --task audio_pretraining --arch wav2vec_ctc \
--w2v-path '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt' \
--labels ltr --apply-mask --mask-selection static --mask-other 0 --mask-length 10 --mask-prob 0.5 --layerdrop 0.1 \
--mask-channel-selection static --mask-channel-other 0 --mask-channel-length 64 --mask-channel-prob 0.5 --zero-infinity \
--feature-grad-mult 0.0 --freeze-finetune-updates 10000 --validate-after-updates 10000 --optimizer adam \
--adam-betas '(0.9, 0.98)' --adam-eps 1e-08 --lr 2e-05 --lr-scheduler tri_stage --warmup-steps 8000 --hold-steps 32000 \
--decay-steps 40000 --final-lr-scale 0.05 --final-dropout 0.0 --dropout 0.0 --activation-dropout 0.1 --criterion ctc \
--attention-dropout 0.0 --max-tokens 1280000 --seed 2337 --log-format json --log-interval 500 --ddp-backend no_c10d

The error report:

File "/home/bram/Documents/coding/Speech/fairseq/train.py", line 14, in <module>
    cli_main()
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq_cli/train.py", line 345, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/distributed_utils.py", line 268, in call_main
    main(args, **kwargs)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq_cli/train.py", line 61, in main
    model = task.build_model(args)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/tasks/fairseq_task.py", line 546, in build_model
    model = models.build_model(args, self)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/models/__init__.py", line 57, in build_model
    return ARCH_MODEL_REGISTRY[model_cfg.arch].build_model(model_cfg, task)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 168, in build_model
    w2v_encoder = Wav2VecEncoder(args, task.target_dictionary)
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 331, in __init__
    args.w2v_path, arg_overrides
  File "/home/bram/Documents/coding/Speech/fairseq/fairseq/checkpoint_utils.py", line 211, in load_checkpoint_to_cpu
    setattr(args, arg_name, arg_val)
AttributeError: 'NoneType' object has no attribute 'dropout'

The folder /home/bram/Documents/coding/speech/traindata/text_label contains:

1. dict.ltr.txt
2. lexicon.txt
3. lm.bin
4. train.tsv
5. train.wrd
6. train.ltr
7. valid.tsv
8. valid.wrd
9. valid.ltr

The folder /home/bram/Documents/coding/speech/traindata/model_finetuning_wav2vec is empty; it is meant to store the fine-tuned model produced by the fine-tuning process.

The folder /home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/ contains:

1. checkpoint_best.pt
2. checkpoint_last.pt

These files result from pre-training on my own dataset. The pre-training command:

python3 /content/repo/fairseq/train.py '/content/drive/My Drive/wav_manifest/' \
--save-dir '/content/drive/My Drive/wav2vec_v2_pre_train_model'  --fp16 \
--num-workers 128 --task audio_pretraining --criterion wav2vec --arch wav2vec2 \
--log-keys '["prob_perplexity","code_perplexity","temp"]' --quantize-targets \
--extractor-mode default --conv-feature-layers '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2' --final-dim 256 \
--latent-vars 320 --latent-groups 2 --latent-temp '(2,0.5,0.999995)' --infonce \
--optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-06 --lr-scheduler polynomial_decay \
--total-num-update 400000 --lr 0.0005 --warmup-updates 32000 --mask-length 10 --mask-prob 0.65 \
--mask-selection static --mask-other 0 --encoder-layerdrop 0.05 --dropout-input 0.1 --dropout-features 0.1 \
--feature-grad-mult 0.1 --loss-weights '[0.1, 10]' --conv-pos 128 --conv-pos-groups 16 --num-negatives 100 \
--cross-sample-negatives 0 --max-sample-size 1500000 --no-epoch-checkpoints --min-sample-size 2000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 --max-tokens 1400000 --max-update 400000 \
--skip-invalid-size-inputs-valid-test --ddp-backend no_c10d

Training was done in Google Colab.

I have already tried debugging by following the process step by step, and I found where the error happens, but I cannot solve the problem.

The error happens in fairseq/fairseq/checkpoint_utils.py, line 211, in the load_checkpoint_to_cpu function.

I tried to reproduce the steps. Here is the report:

Line 201, def load_checkpoint_to_cpu(path, arg_overrides=None):, is the function called by fairseq/fairseq/models/wav2vec/wav2vec2_asr.py at line 330. Before that call, the arg_overrides variable is built, and at this point it is:

arg_overrides = {'dropout': 0.0,
                 'activation_dropout': 0.1,
                 'dropout_input': 0,
                 'attention_dropout': 0.0,
                 'mask_length': 10,
                 'mask_prob': 0.5,
                 'mask_selection': 'static',
                 'mask_other': 0.0,
                 'no_mask_overlap': False,
                 'mask_channel_length': 64,
                 'mask_channel_prob': 0.5,
                 'mask_channel_selection': 'static',
                 'mask_channel_other': 0.0,
                 'no_mask_channel_overlap': False,
                 'encoder_layerdrop': 0.1,
                 'feature_grad_mult': 0.0}
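
For context, the part of load_checkpoint_to_cpu that consumes these overrides amounts to roughly the following (a paraphrased sketch of the logic around line 211, not the exact source):

state = torch.load(f, map_location=lambda s, l: default_restore_location(s, "cpu"))
args = state["args"]  # None if the checkpoint was saved without its training args
if arg_overrides is not None:
    for arg_name, arg_val in arg_overrides.items():
        setattr(args, arg_name, arg_val)  # line 211: AttributeError when args is None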

The path is args.w2v_path, i.e. '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt', which comes from the --w2v-path option we set.

So the function def load_checkpoint_to_cpu(path, arg_overrides=None): receives path = '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt'.

OK, then: with open(PathManager.get_local_path(path), "rb") as f: on line 203 of fairseq/fairseq/checkpoint_utils.py opens checkpoint_best.pt and binds the file object to f.

In my reproduction, the error happens when executing state = torch.load(f, map_location=lambda s, l: default_restore_location(s, "cpu")) on line 204 of fairseq/fairseq/checkpoint_utils.py.

That run reported 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte, meaning torch cannot load/read checkpoint_best.pt.
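
For anyone who wants to reproduce that call outside of fairseq, the two lines in isolation amount to the following (assuming the same local path; PathManager.get_local_path leaves a local path unchanged):

import torch
from torch.serialization import default_restore_location

path = '/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt'
with open(path, "rb") as f:
    state = torch.load(f, map_location=lambda s, l: default_restore_location(s, "cpu"))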

Any suggestions? Can somebody help? If I can fix this, I will continue the tutorial.

wahyubram82 commented 4 years ago

I already have the answer...

The problem is that fairseq, for some reason I don't know, did not save the arguments in the model checkpoint: the arguments that we used to build the pre-trained model.
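
You can confirm this by loading the checkpoint and inspecting its saved args (a minimal check, using the path from above):

import torch

state = torch.load('/home/bram/Documents/coding/speech/traindata/w2v2_pre_traned_model/checkpoint_best.pt', map_location='cpu')
print(state['args'])  # prints None here, which is why setattr(args, ...) raised AttributeError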

In my case, I used the pre-training command shown in my first comment above.

It should have args: Namespace(activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9,0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='wav2vec2', attention_dropout=0.1, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=25.0, codebook_negatives=0, conv_bias=False, conv_feature_layers='[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2', conv_pos=128, conv_pos_groups=16, cpu=False, criterion='wav2vec', cross_sample_negatives=0, curriculum=0, data='/content/drive/My Drive/wav_manifest/', data_buffer_size=10, dataset_impl=None, ddp_backend='no_c10d', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.1, dropout_features=0.1, dropout_input=0.1, empty_cache_freq=0, enable_padding=False, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0.05, encoder_layers=12, end_learning_rate=0.0, extractor_mode='default', fast_stat_sync=False, feature_grad_mult=0.1, final_dim=256, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', infonce=True, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, labels=None, latent_dim=0, latent_groups=2, latent_temp='(2,0.5,0.999995)', latent_vars=320, layer_norm_first=False, local_rank=0, localsgd_frequency=3, log_format=None, log_interval=100, log_keys='["prob_perplexity","code_perplexity","temp"]', logit_temp=0.1, loss_weights='[0.1, 10]', lr=[0.0005], lr_scheduler='polynomial_decay', mask_channel_length=10, mask_channel_min_space=1, mask_channel_other=0, mask_channel_prob=0, mask_channel_selection='static', mask_length=10, mask_min_space=1, mask_other=0.0, mask_prob=0.65, mask_selection='static', max_epoch=0, max_sample_size=1500000, max_tokens=1400000, max_tokens_valid=1400000, max_update=400000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, min_sample_size=2000, model_parallel_size=1, negatives_from_everywhere=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_mask_channel_overlap=False, no_mask_overlap=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, normalize=False, nprocs_per_node=1, num_negatives=100, num_shards=1, num_workers=128, optimizer='adam', optimizer_overrides='{}', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, power=1.0, profile=False, quantization_config_path=None, quantize_input=False, quantize_targets=True, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', same_quantizer=False, sample_rate=16000, save_dir='/content/drive/My Drive/wav2vec_v2_pre_train_model', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, target_glu=False, task='audio_pretraining', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, total_num_update=400000, tpu=False, train_subset='train', update_freq=[1], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=32000, weight_decay=0.01, zero_sharding='none')

So, to make the checkpoint have the args, do this:

First, you must save the command you used when building your own pre-trained model, like the one I mentioned above.

Then:

import torch, argparse, logging, os, sys
from fairseq import options

# Build sys.argv by hand, splitting the pre-training command into a list of tokens
cek = [
    '/content/repo/fairseq/train.py',
    '/content/drive/My Drive/wav_manifest/',
    '--save-dir', '/content/drive/My Drive/wav2vec_v2_pre_train_model',
    '--fp16',
    '--num-workers', '128',
    '--task', 'audio_pretraining',
    '--criterion', 'wav2vec',
    '--arch', 'wav2vec2',
    '--log-keys', '["prob_perplexity","code_perplexity","temp"]',
    '--quantize-targets',
    '--extractor-mode', 'default',
    '--conv-feature-layers', '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2',
    '--final-dim', '256',
    '--latent-vars', '320',
    '--latent-groups', '2',
    '--latent-temp', '(2,0.5,0.999995)',
    '--infonce',
    '--optimizer', 'adam',
    '--adam-betas', '(0.9,0.98)',
    '--adam-eps', '1e-06',
    '--lr-scheduler', 'polynomial_decay',
    '--total-num-update', '400000',
    '--lr', '0.0005',
    '--warmup-updates', '32000',
    '--mask-length', '10',
    '--mask-prob', '0.65',
    '--mask-selection', 'static',
    '--mask-other', '0',
    '--encoder-layerdrop', '0.05',
    '--dropout-input', '0.1',
    '--dropout-features', '0.1',
    '--feature-grad-mult', '0.1',
    '--loss-weights', '[0.1, 10]',
    '--conv-pos', '128',
    '--conv-pos-groups', '16',
    '--num-negatives', '100',
    '--cross-sample-negatives', '0',
    '--max-sample-size', '1500000',
    '--no-epoch-checkpoints',
    '--min-sample-size', '2000',
    '--dropout', '0.1',
    '--attention-dropout', '0.1',
    '--weight-decay', '0.01',
    '--max-tokens', '1400000',
    '--max-update', '400000',
    '--skip-invalid-size-inputs-valid-test',
    '--ddp-backend', 'no_c10d',
]

# argparse reads sys.argv[1:], so the script path in element 0 is just a placeholder
sys.argv = cek

logging.basicConfig(
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=os.environ.get("LOGLEVEL", "INFO").upper(),
    stream=sys.stdout,
)
logger = logging.getLogger("fairseq_cli.train")

parser = options.get_training_parser()
args = options.parse_args_and_arch(parser, modify_parser=None)

# Load the old checkpoint on CPU, attach the re-parsed args, and save a fixed copy
model_path = 'checkpoint_best.pt'
mymodel = torch.load(model_path, map_location=torch.device('cpu'))
mymodel['args'] = args
torch.save(mymodel, 'new_fixed_model.pt')

After that, you can pass new_fixed_model.pt to --w2v-path for fine-tuning.
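
A quick sanity check (a minimal sketch) that the fix took, before re-running the fine-tuning command:

import torch

fixed = torch.load('new_fixed_model.pt', map_location='cpu')
assert fixed['args'] is not None
print(fixed['args'].arch)  # should print 'wav2vec2'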

Problem solved.