If I am right, it currently uses the wav2vec_small.pt model to define the encoder architecture, adds the decoder layer, and then copies the parameters from checkpoint_best.pt into that architecture. You need both files for inference.
@spygaurad I thought so at first, but what makes me think something is wrong is that the fine-tuned English models on the repo do not produce that error. Also, if the checkpoint has the parameters, then its state dict should imply the architecture as well.
@phantomcoder1996 Yeah, I don't know why it is implemented that way. I fine-tuned on a different language and I also need both .pt files (the small.pt as well as the finetuned.pt) at inference time.
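A quick way to see what the checkpoint actually stores (a sketch; checkpoint_best.pt stands in for whatever your fine-tuning run produced): the 'model' entry is only an OrderedDict of weight tensors, so the architecture itself still has to be rebuilt from the saved args/cfg, which is where the wav2vec_small.pt path comes back in.
import torch
ckpt = torch.load('checkpoint_best.pt', map_location='cpu')
print(type(ckpt['model']))      # OrderedDict of parameter tensors, no module structure
print(list(ckpt['model'])[:3])  # a few parameter names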
The root cause: the dict structure of a fine-tuned model differs from that of an official model such as wav2vec_small_960h, and at the beginning of inference the fairseq framework automatically loads the training parameters according to your fine-tuned model's dict structure.
Here are the inspection and a temporary solution:
import torch
finetune_path = '/your/finetune/path.pt'
official_path = '/the/official/path/wav2vec_small_960h.pt'
fixed_model_path = '/path/to/fixed_model.pt'
finetune = torch.load(finetune_path, map_location=torch.device('cpu'))
official = torch.load(official_path, map_location=torch.device('cpu'))
print('finetune keys:', finetune.keys(), 'official keys:', official.keys())
# 'args' in the finetuned checkpoint is None
print('finetune args:', finetune['args'], 'official args:', official['args'])
# As a temporary workaround, copy the args from the official model
finetune.pop('cfg')
finetune['args'] = official['args']
torch.save(finetune, fixed_model_path)
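A minimal load test for the patched file (a sketch, reusing fixed_model_path from above; depending on your setup you may also need arg_overrides, e.g. {'data': '/path/to/manifest'}, so the task can find its dictionary):
from fairseq import checkpoint_utils
models, args, task = checkpoint_utils.load_model_ensemble_and_task([fixed_model_path])
print(models[0].__class__.__name__)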
@mychiux413 thank you for the quick solution.
This temporary fix may make the fixed model unable to resume training, because we drop the training configuration to make it standalone, so don't forget to keep a backup. Fixing the inference code would be a better idea.
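For the backup, something as simple as this works (a sketch, using the paths from the snippet above):
import shutil
# keep an untouched copy so training can still be resumed from the original
shutil.copy(finetune_path, finetune_path + '.bak')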
I opened a similar issue in #2828 and have already investigated; it happens because torch cannot read the model .pt file.
So far there is no answer as to why, but based on the name of the function that runs, load_checkpoint_to_cpu, maybe (just maybe) the error occurs because we train on GPU while fairseq tries to open the checkpoint on CPU.
The strange thing is that the model I use comes from an unfinished pre-training run; if I keep training it toward the best pre-trained model, there is no problem. I train on Google Colab, so every 12 hours I have to set up a new virtual machine to continue training on the dataset.
But right now I am preparing the next step, fine-tuning, and whether I use the wav2vec 2.0 base model or the vq-wav2vec model, it cannot load the checkpoint.
I already followed the instructions given by @mychiux413; the problem is that my pre-trained model was created from my own dataset, the args in my model is None, and I cannot start the fine-tuning process at all.
import torch
model_path = 'checkpoint_best.pt'
mymodel = torch.load(model_path, map_location=torch.device('cpu'))
Then:
print('mymodel args: ', mymodel['args'])
mymodel args: None
But 'cfg' is populated:
print('mymodel cfg: ', mymodel['cfg'])
mymodel cfg: {..., 'optimization': {'max_epoch': 0, 'max_update': 400000, 'clip_norm': 25.0, 'lr': [0.0005], ...}, 'checkpoint': {'save_dir': '/content/drive/My Drive/wav2vec_v2_pre_train_model', 'restore_file': 'checkpoint_last.pt', 'no_epoch_checkpoints': True, ...}, 'task': Namespace(_name='audio_pretraining', arch='wav2vec2', data='/content/drive/My Drive/wav_manifest/', ...), 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9,0.98)', 'adam_eps': 1e-06, 'weight_decay': 0.01, 'lr': [0.0005], ...}, 'criterion': Namespace(_name='wav2vec', ...), 'lr_scheduler': Namespace(_name='polynomial_decay', ...), 'model': Namespace(_name='wav2vec2', arch='wav2vec2', encoder_layers=12, encoder_embed_dim=768, encoder_attention_heads=12, conv_feature_layers='[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2', quantize_targets=True, ...)}
(Abridged: the full dump repeats the complete set of training flags inside each of the 'task', 'criterion', 'lr_scheduler', and 'model' Namespaces.)
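As an aside: since the dump above shows that cfg already carries the complete set of training flags as argparse Namespaces under 'task', 'criterion', 'lr_scheduler', and 'model', an untested shortcut (a sketch, assuming your checkpoint matches the dump; the actual fix used here follows below) would be to reuse the 'model' Namespace as args instead of rebuilding it:
import torch
mymodel = torch.load('checkpoint_best.pt', map_location=torch.device('cpu'))
# cfg['model'] is a flat Namespace holding all the training flags
mymodel['args'] = mymodel['cfg']['model']
torch.save(mymodel, 'fixed_model.pt')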
Do I have to use some option when creating my own pre-trained model to make it fine-tunable?
What should I do?
Please, I need advice...
EDIT: I found the answer; here is what to do if you run into the same problem.
Following @mychiux413, I downloaded the 3.15 GB shared pre-trained model to inspect the content of its 'args', then followed the training pipeline and debugged the start-up process.
I don't know why a pre-trained model created from your own dataset with the command from README.md ends up without 'args' inside it, but it does, and that causes trouble at the next step, the fine-tuning process.
To solve it:
First, record the command used to create your own pre-trained model; for example, mine was:
!python3 /content/repo/fairseq/train.py '/content/drive/My Drive/wav_manifest/' \
--save-dir '/content/drive/My Drive/wav2vec_v2_pre_train_model' --fp16 --num-workers 128 \
--task audio_pretraining --criterion wav2vec --arch wav2vec2 \
--log-keys '["prob_perplexity","code_perplexity","temp"]' --quantize-targets --extractor-mode default \
--conv-feature-layers '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2' --final-dim 256 --latent-vars 320 \
--latent-groups 2 --latent-temp '(2,0.5,0.999995)' --infonce --optimizer adam --adam-betas '(0.9,0.98)' \
--adam-eps 1e-06 --lr-scheduler polynomial_decay --total-num-update 400000 --lr 0.0005 --warmup-updates 32000 \
--mask-length 10 --mask-prob 0.65 --mask-selection static --mask-other 0 --encoder-layerdrop 0.05 \
--dropout-input 0.1 --dropout-features 0.1 --feature-grad-mult 0.1 --loss-weights '[0.1, 10]' --conv-pos 128 \
--conv-pos-groups 16 --num-negatives 100 --cross-sample-negatives 0 --max-sample-size 1500000 \
--no-epoch-checkpoints --min-sample-size 2000 --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--max-tokens 1500000 --max-update 400000 --skip-invalid-size-inputs-valid-test --ddp-backend no_c10d
Then we use Python to reproduce a little of the training start-up in order to rebuild the args that command produces. The script:
import torch, argparse, logging, math, os, random, sys, numpy as np
from fairseq import options
# Recreate argv manually, based on the command used to create the pre-trained model.
sys.argv = ['/content/repo/fairseq/train.py' ,
'/content/drive/My Drive/wav_manifest/',
'--save-dir',
'/home/bram/Documents/coding/speech/traindata/cvdata/ori2/model/wav2vec2l',
'--fp16',
'--num-workers',
'128',
'--task',
'audio_pretraining',
'--criterion',
'wav2vec',
'--arch',
'wav2vec2',
'--log-keys',
'["prob_perplexity","code_perplexity","temp"]',
'--quantize-targets',
'--extractor-mode',
'default',
'--conv-feature-layers',
'[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2',
'--final-dim',
'256',
'--latent-vars',
'320',
'--latent-groups',
'2',
'--latent-temp',
'(2,0.5,0.999995)',
'--infonce',
'--optimizer',
'adam',
'--adam-betas',
'(0.9,0.98)',
'--adam-eps',
'1e-06',
'--lr-scheduler',
'polynomial_decay',
'--total-num-update',
'400000',
'--lr',
'0.0005',
'--warmup-updates',
'32000',
'--mask-length',
'10',
'--mask-prob',
'0.65',
'--mask-selection',
'static',
'--mask-other',
'0',
'--encoder-layerdrop',
'0.05',
'--dropout-input',
'0.1',
'--dropout-features',
'0.1',
'--feature-grad-mult',
'0.1',
'--loss-weights',
'[0.1, 10]',
'--conv-pos',
'128',
'--conv-pos-groups',
'16',
'--num-negatives',
'100',
'--cross-sample-negatives',
'0',
'--max-sample-size',
'1500000',
'--no-epoch-checkpoints',
'--min-sample-size',
'2000',
'--dropout',
'0.1',
'--attention-dropout',
'0.1',
'--weight-decay',
'0.01',
'--max-tokens',
'1500000',
'--max-update',
'400000',
'--skip-invalid-size-inputs-valid-test',
'--ddp-backend',
'no_c10d'
]
# Replay a few start-up steps, in the order fairseq's train.py runs them.
logging.basicConfig(
format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
level=os.environ.get("LOGLEVEL", "INFO").upper(),
stream=sys.stdout,
)
logger = logging.getLogger("fairseq_cli.train")
parser = options.get_training_parser()
# Parse the args that should exist in the model:
args = options.parse_args_and_arch(parser, modify_parser=None)
# --------
# Note: I checked that these args look the same as those in the 3.15 GB shared wav2vec 2.0 pre-trained model.
# --------
# At this point the args that should exist in our own pre-trained model are in the args variable.
# Now save them into the checkpoint we already created. I will patch checkpoint_best.pt, which I will use for fine-tuning.
# I use the steps shown by @mychiux413 to put the args into the model.
model_path = 'checkpoint_best.pt'
fixed_model = 'fixed_model.pt'
mymodel = torch.load(model_path, map_location=torch.device('cpu'))
mymodel['args'] = args
torch.save(mymodel, fixed_model)
After running the script above, we have a model that contains args, which prevents the earlier error: torch tried to read args from the model and got None. With args present, the error no longer occurs.
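A quick check that the patch took (a sketch):
check = torch.load(fixed_model, map_location=torch.device('cpu'))
print('args is None?', check['args'] is None)  # expect False now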
For the next step, the fine-tuning process, be careful with the apex installation method: do not install apex with a plain pip3 install .; use this command instead:
pip3 install -e "/content/apex" --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"
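To confirm the build actually produced the CUDA extensions, a quick import test helps (a sketch; fused_layer_norm_cuda is one of the extension modules that --cuda_ext is expected to build):
import apex
import fused_layer_norm_cuda  # raises ImportError if the CUDA extensions did not build
print('apex CUDA extensions OK')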
This should have been fixed some time ago: the args saved in a fine-tuned checkpoint now get updated so they no longer point at the pre-trained wav2vec checkpoint. That might not be the case for checkpoints fine-tuned in the past, though.
I fine-tune the wav2vec small model (wav2vec_small.pt) with some data, and when I try to decode the fine-tuned checkpoint I get this error.
The path it complains about is the path that held wav2vec_small.pt during fine-tuning; it is the path I passed as --w2v-path, as if decoding were trying to read the weights from wav2vec_small.pt without considering the trained weights.
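One workaround people use for such older checkpoints (a sketch; depending on the fairseq version the override key may need to be nested, e.g. {'model': {'w2v_path': ...}}) is to override w2v_path at load time so it points at wherever wav2vec_small.pt currently lives:
from fairseq import checkpoint_utils
models, args, task = checkpoint_utils.load_model_ensemble_and_task(
    ['checkpoint_best.pt'],
    arg_overrides={'w2v_path': '/current/path/to/wav2vec_small.pt'},
)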
What have you tried?
1. Fine-tune wav2vec_small.pt using my data
2. Run the decoding command with checkpoint_best.pt
What's your environment?