You can determine the arguments/architecture by loading the model checkpoint and checking its 'args' entry:
>>> import torch
>>> model = torch.load('wmt19.en-de.joined-dict.ensemble/model1.pt')
>>> model['args']
Namespace(adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, arch='transformer_wmt_en_de_big', attention_dropout=0.1, bucket_cap_mb=25, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data=['/private/home/edunov/wmt19/data/old/ende', '/private/home/edunov/wmt19/data/old/ende', '/private/home/edunov/wmt19/data/finetune/nc'], ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, distributed_backend='nccl', distributed_init_method='tcp://localhost:17406', distributed_port=-1, distributed_rank=0, distributed_world_size=2, dropout=0.2, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=8192, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, extra_data='', fix_batches_to_gpus=False, fp16=True, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, lazy_load=False, left_pad_source=True, left_pad_target=False, log_format='simple', log_interval=100, lr=[0.0007], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=3584, max_update=201800, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=1e-09, momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=True, no_save=False, no_token_positional_embeddings=False, num_workers=0, optimizer='adam', optimizer_overrides='{}', raw_text=False, relu_dropout=0.0, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='/checkpoint/edunov/20190403/wmt19en2de.btsample5.ffn8192.transformer_wmt_en_de_big_bsz3584_lr0.0007_dr0.2_size_updates200000_seed20_lbsm0.1_size_sa1_upsample2//finetune1', save_interval=1, save_interval_updates=200, seed=2, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=False, source_lang='en', target_lang='de', task='translation', tensorboard_logdir='', threshold_loss_scale=None, train_subset='train', update_freq=[1], upsample_primary=1, user_dir=None, valid_subset='valid', validate_interval=1, warmup_init_lr=1e-07, warmup_updates=4000, weight_decay=0.0)
While it started from transformer_vaswani_wmt_en_de_big, there are some customizations to other parameters. The main change seems to be --encoder-ffn-embed-dim=8192.
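As a quick follow-up (same checkpoint as above; which fields to filter on is just my assumption), you can print only the architecture-related arguments so they are easy to mirror on the fairseq-train command line:
>>> import torch
>>> ckpt = torch.load('wmt19.en-de.joined-dict.ensemble/model1.pt', map_location='cpu')
>>> args = vars(ckpt['args'])  # Namespace -> dict
>>> for k in sorted(args):
...     # keep only the fields that define the model architecture
...     if k == 'arch' or k.startswith(('encoder_', 'decoder_', 'share_')):
...         print(k, '=', args[k])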
Hello
I'm trying to finetune the provided pretrained model transformer.wmt19.de-en from the paper (Facebook FAIR's WMT19 News Translation Task Submission). However, I cannot find the correct architecture for this pretrained model. According to the paper, it seems 'transformer_vaswani_wmt_en_de_big' is used, but it doesn't fit the pretrained model. Furthermore, I tried all other sensible architectures, such as transformer_wmt_en_de_big, but none of them worked either.
I preprocessed the data using this command:
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train \
    --validpref $TEXT/valid \
    --testpref $TEXT/test \
    --destdir data-bin/wmt19.tokenized.de-en \
    --workers 20 \
    --joined-dictionary --srcdict ../models/wmt19.de-en.joined-dict.ensemble/dict.de.txt
And afterwards I finetune like this:
fairseq-train \
    data-bin/wmt19.tokenized.de-en \
    --restore-file ../models/wmt19.de-en.joined-dict.ensemble/model1.pt \
    --save-dir checkpoints/finetune_wmt_model1 \
    --arch transformer_vaswani_wmt_en_de_big --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --max-sentences 100
Then it gives me this error:
Maybe the architecture was changed after saving the pretrained model, or am I just doing things plain wrong? I hope you can help me figure out how to load the pretrained models when finetuning, because I have kind of run out of ideas about what's going wrong.