SALT-NLP / Multi-View-Seq2Seq

Source code for the paper "Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization"
MIT License

Unable to replicate results reported in the paper? #2

Closed negrinho closed 3 years ago

negrinho commented 3 years ago

I've tried running the code from this repo but couldn't replicate the results you report in the paper. For example, I don't get the best model at around 7 epochs as you describe, and the best model I did get performed significantly worse than your reported results: I only reach around 0.26 ROUGE-1. Do you have any ideas about why this might be? Which version of PyTorch did you use? I'm using PyTorch 1.4 and the preprocessed data included in the repo. See below for the log of the single-view model.
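For completeness, here is the environment report I would compare against yours (a minimal sketch; run it in the same Colab session used for training):

```python
# Minimal environment report for comparing replication runs (a sketch).
import torch
import fairseq

print("torch:", torch.__version__)
print("fairseq:", fairseq.__version__)
print("cuda:", torch.version.cuda,
      "| device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu")
```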

epoch 016 | loss 6.261 | nll_loss 4.916 | ppl 30.193 | wps 234.3 | ups 0.06 | wpb 4165.4 | bsz 158.3 | num_updates 1488 | lr 2.195e-05 | gnorm 2.165 | clip 100 | oom 0 | train_wall 1064 | wall 25366
epoch 016 | valid on 'valid' subset | loss 7.379 | nll_loss 6.115 | ppl 69.293 | wps 1017.4 | wpb 132.8 | bsz 5 | num_updates 1488 | best_loss 7.379
here bpe NONE
here!
Test on val set: 
100% 817/817 [02:35<00:00,  5.27it/s]
Val {'rouge-1': {'f': 0.26769580254553177, 'p': 0.30399684645069164, 'r': 0.26228723498609796}, 'rouge-2': {'f': 0.07173007955995553, 'p': 0.08290470011345255, 'r': 0.07046128497657979}, 'rouge-l': {'f': 0.264904383518601, 'p': 0.3149244870641518, 'r': 0.24536414711376517}}
2020-10-30 05:17:30 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_stage/checkpoint_best.pt (epoch 16 @ 1488 updates, score 7.379) (writing took 236.98618674099998 seconds)
Test on testing set: 
100% 818/818 [02:42<00:00,  5.03it/s]
Test {'rouge-1': {'f': 0.2707510254925983, 'p': 0.30304375457878013, 'r': 0.27045976455175946}, 'rouge-2': {'f': 0.07069378120884638, 'p': 0.08043789892742863, 'r': 0.07085366466696506}, 'rouge-l': {'f': 0.26921047464426007, 'p': 0.3131869557940146, 'r': 0.25498452981146014}}
jiaaoc commented 3 years ago

Are you using BART-large? One possible reason is that Facebook has updated the BPE, so there may be a mismatch when initializing the embedding matrix and in the ID of our special separator token.
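A quick way to check for such a mismatch (a sketch, assuming the standard fairseq layout: the downloaded checkpoint at `./bart.large/model.pt` and the binarized data in `cnn_dm-bin/`) is to compare the number of embedding rows baked into the checkpoint with the size of the dictionary actually used for binarization:

```python
import torch
from fairseq.data import Dictionary

# Embedding rows stored in the downloaded BART checkpoint.
ckpt = torch.load('./bart.large/model.pt', map_location='cpu')
print('checkpoint embedding rows:', ckpt['model']['encoder.embed_tokens.weight'].shape[0])

# Dictionary used to binarize the data; if the sizes disagree, the embedding
# matrix and the token ids (including the separator token) are misaligned.
d = Dictionary.load('cnn_dm-bin/dict.source.txt')
print('binarized dictionary size:', len(d))
```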

jiaaoc commented 3 years ago

Also, it is abnormal that you reach the best performance after 16 epochs. Based on my previous observations, the best single-view/multi-view model is reached after 6 or 7 epochs.

jiaaoc commented 3 years ago

I think the code in this repo should be fine, as I have received emails from other people saying that they could replicate similar results.

negrinho commented 3 years ago

I haven't changed anything. I just cloned the repo and used Colab to run your experiments; see the linked Colab notebook if you want to take a look (https://colab.research.google.com/drive/1tzmWGhSlnXBuBkYE2Llvzl0cS7k1KW-m?usp=sharing). You just have to upload the compressed data to your Google Drive folder and you should be able to run the notebook right away. I cleaned up the notebook a bit since then, but I definitely got those results the last time I ran the code on Colab.

negrinho commented 3 years ago

The learning rate seems very low. Can you post the output logs from running this code on your setup?
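(For what it's worth, the lr of 2.195e-05 at update 1488 in my epoch-16 log above is consistent with fairseq's polynomial_decay scheduler given the settings in the Namespace below, i.e. lr=3e-05, warmup_updates=200, total_num_update=5000, power=1; a minimal sketch of that decay rule:)

```python
# Sketch of fairseq's polynomial_decay schedule (power=1), using the settings
# from the Namespace below: lr=3e-05, warmup_updates=200, total_num_update=5000.
def poly_decay_lr(t, lr0=3e-05, end_lr=0.0, warmup=200, total=5000, power=1.0):
    if t <= warmup:
        return lr0 * t / warmup                      # linear warmup
    if t >= total:
        return end_lr
    remaining = 1 - (t - warmup) / (total - warmup)  # fraction of decay left
    return (lr0 - end_lr) * remaining ** power + end_lr

print(poly_decay_lr(1488))  # ~2.195e-05, matching the epoch-16 log above
```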

2020-11-06 17:46:21 | INFO | fairseq_cli.train | Namespace(T=1, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.1, balance=False, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format=None, log_interval=1000, lr=[3e-05], lr_scheduler='polynomial_decay', lr_weight=1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=False, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.large/model.pt', save_dir='checkpoints_stage', save_interval=1, save_interval_updates=0, seed=14632, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=5000, train_subset='train', truncate_source=True, update_freq=[32], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=200, weight_decay=0.01)
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | [source] dictionary: 50264 types
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | [target] dictionary: 50264 types
2020-11-06 17:46:21 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.source
2020-11-06 17:46:21 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.target
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | cnn_dm-bin valid source-target 818 examples
2020-11-06 17:46:31 | INFO | fairseq_cli.train | BARTModel(
  (encoder): TransformerEncoder(
    (embed_tokens): Embedding(50264, 1024, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1)-(11): TransformerEncoderLayer( ...identical to layer (0) above; 11 copies omitted for brevity... )
    )
    (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(50264, 1024, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1)-(11): TransformerDecoderLayer( ...identical to layer (0) above; 11 copies omitted for brevity... )
    )
    (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (classification_heads): ModuleDict()
  (section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0)
  (section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (section): LSTM(1024, 1024)
  (w_proj_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (w_proj): Linear(in_features=1024, out_features=1024, bias=True)
  (w_context_vector): Linear(in_features=1024, out_features=1, bias=False)
  (softmax): Softmax(dim=1)
)
2020-11-06 17:46:31 | INFO | fairseq_cli.train | model bart_large, criterion LabelSmoothedCrossEntropyCriterion
2020-11-06 17:46:31 | INFO | fairseq_cli.train | num. model params: 416791552 (num. trained: 416791552)
2020-11-06 17:46:38 | INFO | fairseq_cli.train | training on 1 GPUs
2020-11-06 17:46:38 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt
2020-11-06 17:46:38 | INFO | fairseq.trainer | loading train data for epoch 0
2020-11-06 17:46:38 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source
2020-11-06 17:46:38 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.target
2020-11-06 17:46:38 | INFO | fairseq.tasks.translation | cnn_dm-bin train source-target 14731 examples
2020-11-06 17:46:38 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269]
group1: 
511
group2: 
12
2020-11-06 17:46:38 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
False
epoch 001:  40% 37/93 [06:54<10:44, 11.52s/it, loss=14.612, nll_loss=14.464, ppl=22602, wps=377.8, ups=0.09, wpb=4223.9, bsz=156.7, num_updates=37, lr=5.55e-06, gnorm=4.996, clip=100, oom=0, train_wall=410, wall=415] 
jiaaoc commented 3 years ago

Hi, here is an example log from training the multi-view BART-base model:

2020-10-16 20:22:37 | INFO | fairseq_cli.train | Namespace(T=0.2, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_base', attention_dropout=0.1, balance=True, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin_2', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=12, decoder_embed_dim=768, decoder_embed_path=None, decoder_ffn_embed_dim=3072, decoder_input_dim=768, decoder_layerdrop=0, decoder_layers=6, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=768, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format='json', log_interval=1000, lr=[3e-05], lr_scheduler='polynomial_decay', lr_weight=500.0, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=True, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=5, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.base/model.pt', save_dir='checkpoints_multi_base_1', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=2000, train_subset='train', truncate_source=True, update_freq=[16], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=120, weight_decay=0.01)
2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | [source] dictionary: 51200 types
2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | [target] dictionary: 51200 types
2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin_2/valid.source-target.source
2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.source
2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin_2/valid.source-target.target
2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 valid source-target 818 examples
!!! 818 818
2020-10-16 20:22:40 | INFO | fairseq_cli.train | BARTModel(
  (encoder): TransformerEncoder(
    (embed_tokens): Embedding(51200, 768, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (1)-(5): TransformerEncoderLayer( ...identical to layer (0) above; 5 copies omitted for brevity... )
    )
    (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(51200, 768, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (1)-(5): TransformerDecoderLayer( ...identical to layer (0) above; 5 copies omitted for brevity... )
    )
    (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (classification_heads): ModuleDict()
  (section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0)
  (section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (section): LSTM(768, 768)
  (w_proj_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  (w_proj): Linear(in_features=768, out_features=768, bias=True)
  (w_context_vector): Linear(in_features=768, out_features=1, bias=False)
  (softmax): Softmax(dim=1)
)
2020-10-16 20:22:40 | INFO | fairseq_cli.train | model bart_base, criterion LabelSmoothedCrossEntropyCriterion
2020-10-16 20:22:40 | INFO | fairseq_cli.train | num. model params: 146507776 (num. trained: 146507776)
2020-10-16 20:22:43 | INFO | fairseq_cli.train | training on 1 GPUs
2020-10-16 20:22:43 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
2020-10-16 20:22:43 | INFO | fairseq.trainer | loaded checkpoint ./bart.base/model.pt (epoch 14 @ 0 updates)
group1: 259
group2: 12
2020-10-16 20:22:43 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
2020-10-16 20:22:43 | INFO | fairseq.trainer | loading train data for epoch 0
2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.source
2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source
2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.target
2020-10-16 20:22:43 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 train source-target 14731 examples
!!! 14731 14731
2020-10-16 20:22:43 | WARNING | fairseq.data.data_utils | 6 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269, 8197]
True
2020-10-16 20:28:05 | INFO | train | {"epoch": 1, "train_loss": "5.334", "train_nll_loss": "3.491", "train_ppl": "11.247", "train_wps": "1206.4", "train_ups": "0.59", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "189", "train_lr": "2.88989e-05", "train_gnorm": "6.384", "train_clip": "100", "train_oom": "0", "train_train_wall": "303", "train_wall": "323"}
/pytorch/torch/csrc/utils/python_argparser.cpp:756: UserWarning: This overload of add is deprecated: add(Number alpha, Tensor other) Consider using one of the following signatures instead: add(Tensor other, *, Number alpha)
2020-10-16 20:28:11 | INFO | valid | {"epoch": 1, "valid_loss": "4.494", "valid_nll_loss": "2.632", "valid_ppl": "6.201", "valid_wps": "3638.8", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "189"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.39893327494573744, 'p': 0.48739021531416354, 'r': 0.3672381425752768}, 'rouge-2': {'f': 0.19168286247403196, 'p': 0.23579704030498724, 'r': 0.1772675131514576}, 'rouge-l': {'f': 0.38773004473056544, 'p': 0.4650643030400437, 'r': 0.3571665562085555}}
2020-10-16 20:29:16 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 1 @ 189 updates, score 4.494) (writing took 2.720979069825262 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3881301009025109, 'p': 0.47039422544482545, 'r': 0.3606226446800223}, 'rouge-2': {'f': 0.17881792205695904, 'p': 0.21852663998652969, 'r': 0.16731151894505894}, 'rouge-l': {'f': 0.3800338863639725, 'p': 0.4518477819676159, 'r': 0.35300024402391994}}
/pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3.
Use true_divide or floor_divide (// in Python) instead.
2020-10-16 20:35:49 | INFO | train | {"epoch": 2, "train_loss": "4.432", "train_nll_loss": "2.625", "train_ppl": "6.168", "train_wps": "835.9", "train_ups": "0.41", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "378", "train_lr": "2.5883e-05", "train_gnorm": "2.332", "train_clip": "100", "train_oom": "0", "train_train_wall": "313", "train_wall": "786"}
2020-10-16 20:35:54 | INFO | valid | {"epoch": 2, "valid_loss": "4.322", "valid_nll_loss": "2.492", "valid_ppl": "5.627", "valid_wps": "3825.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "378", "valid_best_loss": "4.322"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4279867966292195, 'p': 0.4571306118661938, 'r': 0.44022592801096416}, 'rouge-2': {'f': 0.21371075541532447, 'p': 0.2285121015700478, 'r': 0.22154388398488878}, 'rouge-l': {'f': 0.4196784994354386, 'p': 0.4456033541138049, 'r': 0.4278439895292393}}
2020-10-16 20:37:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 2 @ 378 updates, score 4.322) (writing took 11.695231701247394 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4135366281231374, 'p': 0.444507361098039, 'r': 0.423101539955033}, 'rouge-2': {'f': 0.19432386047889444, 'p': 0.21068450242099074, 'r': 0.19930810921791703}, 'rouge-l': {'f': 0.4059372042257342, 'p': 0.43402206622848205, 'r': 0.41056085492657124}}
2020-10-16 20:44:23 | INFO | train | {"epoch": 3, "train_loss": "4.19", "train_nll_loss": "2.368", "train_ppl": "5.162", "train_wps": "753.5", "train_ups": "0.37", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "567", "train_lr": "2.2867e-05", "train_gnorm": "2.312", "train_clip": "100", "train_oom": "0", "train_train_wall": "323", "train_wall": "1300"}
2020-10-16 20:44:28 | INFO | valid | {"epoch": 3, "valid_loss": "4.242", "valid_nll_loss": "2.397", "valid_ppl": "5.266", "valid_wps": "3877.8", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "567", "valid_best_loss": "4.242"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4315874956151277, 'p': 0.48791636226347873, 'r': 0.41966804049154866}, 'rouge-2': {'f': 0.2188827333313949, 'p': 0.2480313270750059, 'r': 0.21386282199141377}, 'rouge-l': {'f': 0.41806919758048416, 'p': 0.4660477028142457, 'r': 0.40645590435600293}}
2020-10-16 20:45:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 3 @ 567 updates, score 4.242) (writing took 8.149096994195133 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4138701252407696, 'p': 0.4678226724582506, 'r': 0.4065260752587133}, 'rouge-2': {'f': 0.19377017502951563, 'p': 0.22115075342729995, 'r': 0.1904202394946015}, 'rouge-l': {'f': 0.4010310496097956, 'p': 0.44619094434249496, 'r': 0.3929682749938597}}
2020-10-16 20:52:16 | INFO | train | {"epoch": 4, "train_loss": "4.009", "train_nll_loss": "2.171", "train_ppl": "4.505", "train_wps": "818.8", "train_ups": "0.4", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "756", "train_lr": "1.98511e-05", "train_gnorm": "2.142", "train_clip": "100", "train_oom": "0", "train_train_wall": "298", "train_wall": "1773"}
2020-10-16 20:52:21 | INFO | valid | {"epoch": 4, "valid_loss": "4.192", "valid_nll_loss": "2.352", "valid_ppl": "5.104", "valid_wps": "3870.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "756", "valid_best_loss": "4.192"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.43669851905346935, 'p': 0.47079025587829726, 'r': 0.4442491635654981}, 'rouge-2': {'f': 0.21691059357860815, 'p': 0.23458404727614512, 'r': 0.220700188219085}, 'rouge-l': {'f': 0.424007418421588, 'p': 0.4522198002073135, 'r': 0.4289625411274865}}
2020-10-16 20:53:51 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 4 @ 756 updates, score 4.192) (writing took 9.192486869171262 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4225243192276242, 'p': 0.45223754352590134, 'r': 0.43401393897009366}, 'rouge-2': {'f': 0.19391878994057452, 'p': 0.20822517840877766, 'r': 0.1994167266777721}, 'rouge-l': {'f': 0.4080676812912881, 'p': 0.43246768831954485, 'r': 0.4165620181146185}}
2020-10-16 21:00:31 | INFO | train | {"epoch": 5, "train_loss": "3.887", "train_nll_loss": "2.039", "train_ppl": "4.11", "train_wps": "781.9", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "945", "train_lr": "1.68351e-05", "train_gnorm": "2.074", "train_clip": "100", "train_oom": "0", "train_train_wall": "303", "train_wall": "2269"}
2020-10-16 21:00:37 | INFO | valid | {"epoch": 5, "valid_loss": "4.186", "valid_nll_loss": "2.342", "valid_ppl": "5.071", "valid_wps": "3877.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "945", "valid_best_loss": "4.186"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4461844243274079, 'p': 0.45666747672161484, 'r': 0.4767132459495706}, 'rouge-2': {'f': 0.22200930553762793, 'p': 0.22706208545278755, 'r': 0.23842818519947412}, 'rouge-l': {'f': 0.43306061447923366, 'p': 0.44238726551550944, 'r': 0.4563430992482793}}
2020-10-16 21:02:12 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 5 @ 945 updates, score 4.186) (writing took 11.725180207751691 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4333094974371008, 'p': 0.4428674781714452, 'r': 0.46658344288666276}, 'rouge-2': {'f': 0.20111995145025907, 'p': 0.20625643433873106, 'r': 0.217710295965629}, 'rouge-l': {'f': 0.42099773909932536, 'p': 0.42971386878603307, 'r': 0.44643271326223904}}
2020-10-16 21:08:55 | INFO | train | {"epoch": 6, "train_loss": "3.787", "train_nll_loss": "1.93", "train_ppl": "3.81", "train_wps": "769.1", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1134", "train_lr": "1.38191e-05", "train_gnorm": "2.034", "train_clip": "100", "train_oom": "0", "train_train_wall": "302", "train_wall": "2772"}
2020-10-16 21:09:00 | INFO | valid | {"epoch": 6, "valid_loss": "4.18", "valid_nll_loss": "2.343", "valid_ppl": "5.075", "valid_wps": "3875.3", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1134", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.44834014142761747, 'p': 0.47490514105110054, 'r': 0.46418072489161993}, 'rouge-2': {'f': 0.22448318205207965, 'p': 0.23786780200771554, 'r': 0.23428752100684014}, 'rouge-l': {'f': 0.431713338823393, 'p': 0.4549432489377001, 'r': 0.44269410749902055}}
2020-10-16 21:10:27 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 6 @ 1134 updates, score 4.18) (writing took 7.535578944720328 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4353075633115118, 'p': 0.45960422226275544, 'r': 0.4524936710724383}, 'rouge-2': {'f': 0.20453167333689543, 'p': 0.21761783799140255, 'r': 0.21275645602953855}, 'rouge-l': {'f': 0.419203755880583, 'p': 0.43906384085100353, 'r': 0.43266115281125556}}
2020-10-16 21:17:11 | INFO | train | {"epoch": 7, "train_loss": "3.715", "train_nll_loss": "1.85", "train_ppl": "3.604", "train_wps": "781.3", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1323", "train_lr": "1.08032e-05", "train_gnorm": "2.042", "train_clip": "100", "train_oom": "0", "train_train_wall": "306", "train_wall": "3268"}
2020-10-16 21:17:16 | INFO | valid | {"epoch": 7, "valid_loss": "4.181", "valid_nll_loss": "2.345", "valid_ppl": "5.081", "valid_wps": "3853.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1323", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4481947505100136, 'p': 0.46780376177094585, 'r': 0.4699430608678166}, 'rouge-2': {'f': 0.22568330262401542, 'p': 0.23667032669984672, 'r': 0.2375391979501824}, 'rouge-l': {'f': 0.43290810524697976, 'p': 0.4484189183310228, 'r': 0.45029655273945113}}
2020-10-16 21:18:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 7 @ 1323 updates, score 4.181) (writing took 3.958364794962108 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42504939222262134, 'p': 0.4437278759292623, 'r': 0.44713921113126553}, 'rouge-2': {'f': 0.19796039505355403, 'p': 0.2078876050553927, 'r': 0.20811213382174953}, 'rouge-l': {'f': 0.41333885103722146, 'p': 0.42888192585788004, 'r': 0.4307527462400245}}
2020-10-16 21:25:20 | INFO | train | {"epoch": 8, "train_loss": "3.655", "train_nll_loss": "1.783", "train_ppl": "3.442", "train_wps": "791.5", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1512", "train_lr": "7.78723e-06", "train_gnorm": "2.018", "train_clip": "100", "train_oom": "0", "train_train_wall": "296", "train_wall": "3757"}
2020-10-16 21:25:25 | INFO | valid | {"epoch": 8, "valid_loss": "4.188", "valid_nll_loss": "2.358", "valid_ppl": "5.126", "valid_wps": "3883.7", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1512", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4497831590533799, 'p': 0.4743879319512072, 'r': 0.46696211017029543}, 'rouge-2': {'f': 0.22560126332234826, 'p': 0.23866609130364316, 'r': 0.23526072930189967}, 'rouge-l': {'f': 0.4331148209032409, 'p': 0.4523118415115414, 'r': 0.4466045581005789}}
2020-10-16 21:26:54 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 8 @ 1512 updates, score 4.188) (writing took 6.4392871032468975 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42990064221283536, 'p': 0.4527699793144171, 'r': 0.44901333807162896}, 'rouge-2': {'f': 0.20288890059810166, 'p': 0.21534024009359243, 'r': 0.21117838919893522}, 'rouge-l': {'f': 0.41598493880199483, 'p': 0.4349848431222833, 'r': 0.4314877150138075}}
2020-10-16 21:33:30 | INFO | train | {"epoch": 9, "train_loss": "3.608", "train_nll_loss": "1.731", "train_ppl": "3.32", "train_wps": "789.6", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1701", "train_lr": "4.77128e-06", "train_gnorm": "1.996", "train_clip": "100", "train_oom": "0", "train_train_wall": "298", "train_wall": "4248"}
2020-10-16 21:33:36 | INFO | valid | {"epoch": 9, "valid_loss": "4.19", "valid_nll_loss": "2.359", "valid_ppl": "5.129", "valid_wps": "3839.3", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1701", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4469569361973397, 'p': 0.4798210784938584, 'r': 0.45528237283388123}, 'rouge-2': {'f': 0.2263709493623402, 'p': 0.2433974733721229, 'r': 0.23132458588703095}, 'rouge-l': {'f': 0.4316170975348626, 'p': 0.45903993291697, 'r': 0.43719317336834507}}
2020-10-16 21:34:59 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 9 @ 1701 updates, score 4.19) (writing took 3.861534607131034 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42781119671981177, 'p': 0.46124989848989056, 'r': 0.436261581761792}, 'rouge-2': {'f': 0.20098620993180524, 'p': 0.21921218562638198, 'r': 0.20423255976966606}, 'rouge-l': {'f': 0.41332683231640294, 'p': 0.44168964220540274, 'r': 0.4191730895606922}}
2020-10-16 21:41:41 | INFO | train | {"epoch": 10, "train_loss": "3.575", "train_nll_loss": "1.693", "train_ppl": "3.233", "train_wps": "789.3", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1890", "train_lr": "1.75532e-06", "train_gnorm": "1.981", "train_clip": "100", "train_oom": "0", "train_train_wall": "307", "train_wall": "4739"}
2020-10-16 21:41:47 | INFO | valid | {"epoch": 10, "valid_loss": "4.203", "valid_nll_loss": "2.369", "valid_ppl": "5.167", "valid_wps": "3792", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1890", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4489107167227575, 'p': 0.472013059581364, 'r': 0.46714979898744746}, 'rouge-2': {'f': 0.22538691538484998, 'p': 0.23704805111264826, 'r': 0.23613808765906907}, 'rouge-l': {'f': 0.4316388762633301, 'p': 0.45038233458541693, 'r': 0.44596016494663443}}
2020-10-16 21:43:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 10 @ 1890 updates, score 4.203) (writing took 3.90104684792459 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4306987172598409, 'p': 0.4531877008856705, 'r': 0.4487099330991109}, 'rouge-2': {'f': 0.2012845838629506, 'p': 0.21321799010752718, 'r': 0.20959143334202265}, 'rouge-l': {'f': 0.41571496654100026, 'p': 0.43409465797013363, 'r': 0.4300684739298519}}
2020-10-16 21:50:12 | INFO | train | {"epoch": 11, "train_loss": "3.556", "train_nll_loss": "1.671", "train_ppl": "3.185", "train_wps": "758.1", "train_ups": "0.37", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "2079", "train_lr": "0", "train_gnorm": "1.957", "train_clip": "100", "train_oom": "0", "train_train_wall": "316", "train_wall": "5249"}
2020-10-16 21:50:18 | INFO | valid | {"epoch": 11, "valid_loss": "4.203", "valid_nll_loss": "2.371", "valid_ppl": "5.172", "valid_wps": "3874.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "2079", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.446642502255209, 'p': 0.4723302484715944, 'r': 0.46238440805767694}, 'rouge-2': {'f': 0.22346453229760987, 'p': 0.2373075042245457, 'r': 0.23220979841378647}, 'rouge-l': {'f': 0.4295766026698361, 'p': 0.4502630944373418, 'r': 0.4416669748596527}}
2020-10-16 21:51:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 11 @ 2079 updates, score 4.203) (writing took 4.998790017794818 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42976349488980414, 'p': 0.4532897733258767, 'r': 0.44788645672433053}, 'rouge-2': {'f': 0.199710011641502, 'p': 0.21187374900537204, 'r': 0.20828099794102461}, 'rouge-l': {'f': 0.4141090520180718, 'p': 0.43257857065512195, 'r': 0.42896840192756164}}
2020-10-16 21:58:20 | INFO | train | {"epoch": 12, "train_loss": "3.55", "train_nll_loss": "1.665", "train_ppl": "3.172", "train_wps": "793.6", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "2268", "train_lr": "0", "train_gnorm": "1.939", "train_clip": "100", "train_oom": "0", "train_train_wall": "300", "train_wall": "5738"}
2020-10-16 21:58:26 | INFO | valid | {"epoch": 12, "valid_loss": "4.203", "valid_nll_loss": "2.371", "valid_ppl": "5.172", "valid_wps": "3878.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "2268", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set: Val {'rouge-1': {'f': 0.446642502255209, 'p': 0.4723302484715944, 'r': 0.46238440805767694}, 'rouge-2': {'f': 0.22346453229760987, 'p': 0.2373075042245457, 'r': 0.23220979841378647}, 'rouge-l': {'f': 0.4295766026698361, 'p': 0.4502630944373418, 'r': 0.4416669748596527}} 2020-10-16 21:59:52 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 12 @ 2268 updates, score 4.203) (writing took 4.920202174689621 seconds) Test on testing set: Test {'rouge-1': {'f': 0.42976349488980414, 'p': 0.4532897733258767, 'r': 0.44788645672433053}, 'rouge-2': {'f': 0.199710011641502, 'p': 0.21187374900537204, 'r': 0.20828099794102461}, 'rouge-l': {'f': 0.4141090520180718, 'p': 0.43257857065512195, 'r': 0.42896840192756164}} 2020-10-16 22:01:12 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 5 runs 2020-10-16 22:01:12 | INFO | fairseq_cli.train | done training in 5908.6 seconds

jiaaoc commented 3 years ago

Since I am not able to get access to the P100 machines, I am testing the train_single_view.sh with a max_length = 500. And I will post the training log here later.

negrinho commented 3 years ago

Thanks for looking. I think that you may be right about it being a discrepancy between the tokenizations somehow. I get much lower results. The preprocessed files may no longer be up to date for the versions that colab is pulling. If you get a change to run the colab file, that would be great. I will preprocess the data again and see if I get different results.

jiaaoc commented 3 years ago

Yes, I have tried training from scratch without BART_initialization as well, and the results were better than what you have observed.

jiaaoc commented 3 years ago

This is the log for BART_base encoder + random initialized decoder for single-view training:

2020-09-12 23:48:39 | INFO | fairseq_cli.train | Namespace(T=1, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_encoder_base', attention_dropout=0.2, balance=False, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=5.0, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='data_none', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=4, decoder_embed_dim=768, decoder_embed_path=None, decoder_ffn_embed_dim=3072, decoder_input_dim=768, decoder_layerdrop=0, decoder_layers=2, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=768, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.2, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format='json', log_interval=1000, lr=[3e-05], lr_scheduler='inverse_sqrt', lr_weight=100.0, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=False, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=30, pooler_activation_fn='tanh', pooler_dropout=0.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.base/model.pt', save_dir='checkpoints_scratch_1', save_interval=1, save_interval_updates=0, seed=0, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', temp_file='bart_base_scratch', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, train_subset='train', truncate_source=True, update_freq=[16], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, view_2_path='None', warmup_init_lr=-1, warmup_updates=400, weight_decay=0.1) 2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | [source] dictionary: 51200 types 2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | [target] 
dictionary: 51200 types 2020-09-12 23:48:39 | INFO | fairseq.data.data_utils | loaded 818 examples from: data_none/valid.source-target.source 2020-09-12 23:48:39 | INFO | fairseq.data.data_utils | loaded 818 examples from: data_none/valid.source-target.target 2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | data_none valid source-target 818 examples 2020-09-12 23:48:44 | INFO | fairseq_cli.train | BARTModel( (encoder): TransformerEncoder( (embed_tokens): Embedding(51200, 768, padding_idx=1) (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1) (layers): ModuleList( (0): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (1): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (2): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (3): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (4): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, 
elementwise_affine=True) ) (5): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (decoder): TransformerDecoder( (embed_tokens): Embedding(51200, 768, padding_idx=1) (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1) (layers): ModuleList( (0): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (1): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (classification_heads): ModuleDict() (section): LSTM(768, 768) (w_proj_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (w_proj): Linear(in_features=768, out_features=768, bias=True) (w_context_vector): Linear(in_features=768, out_features=1, bias=False) (softmax): Softmax(dim=1) ) 2020-09-12 23:48:44 | INFO | fairseq_cli.train | model bart_encoder_base, criterion LabelSmoothedCrossEntropyCriterion 2020-09-12 23:48:44 | INFO | fairseq_cli.train | num. model params: 107649024 (num. 
trained: 107649024) 2020-09-12 23:48:48 | INFO | fairseq_cli.train | training on 1 GPUs 2020-09-12 23:48:48 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None bart_encoder_base 2020-09-12 23:48:48 | INFO | fairseq.trainer | loaded checkpoint ./bart.base/model.pt (epoch 14 @ 0 updates) group1: 103 group2: 61 2020-09-12 23:48:48 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16 here schedule! 2020-09-12 23:48:48 | INFO | fairseq.trainer | loading train data for epoch 0 2020-09-12 23:48:48 | INFO | fairseq.data.data_utils | loaded 14731 examples from: data_none/train.source-target.source 2020-09-12 23:48:48 | INFO | fairseq.data.data_utils | loaded 14731 examples from: data_none/train.source-target.target 2020-09-12 23:48:48 | INFO | fairseq.tasks.translation | data_none train source-target 14731 examples 2020-09-12 23:48:49 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269] False 2020-09-12 23:51:50 | INFO | train | {"epoch": 1, "train_loss": "10.421", "train_nll_loss": "9.332", "train_ppl": "644.363", "train_wps": "2140", "train_ups": "1.01", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "182", "train_lr": "1.365e-05", "train_gnorm": "2.41", "train_clip": "8.2", "train_oom": "0", "train_train_wall": "170", "train_wall": "182"} /pytorch/torch/csrc/utils/python_argparser.cpp:756: UserWarning: This overload of add is deprecated: add(Number alpha, Tensor other) Consider using one of the following signatures instead: add(Tensor other, *, Number alpha) 2020-09-12 23:51:54 | INFO | valid | {"epoch": 1, "valid_loss": "7.602", "valid_nll_loss": "6.152", "valid_ppl": "71.102", "valid_wps": "4947.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "182"} here bpe NONE here! Val {'rouge-1': {'f': 0.21663793375370063, 'p': 0.22864870158974218, 'r': 0.2287239463029277}, 'rouge-2': {'f': 0.054124139134273476, 'p': 0.05736806219841362, 'r': 0.057741245203472076}, 'rouge-l': {'f': 0.22703430161156996, 'p': 0.2589247848795017, 'r': 0.21878059623920781}} 2020-09-12 23:53:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 1 @ 182 updates, score 7.602) (writing took 3.440072625875473 seconds) Test on testing set: Test {'rouge-1': {'f': 0.2143088315976068, 'p': 0.22446929769837956, 'r': 0.2289120110339817}, 'rouge-2': {'f': 0.05192138274699011, 'p': 0.05429316369788991, 'r': 0.05615224649979625}, 'rouge-l': {'f': 0.2247126841007863, 'p': 0.25170078332283147, 'r': 0.22005435135775175}} /pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. 
2020-09-12 23:57:52 | INFO | train | {"epoch": 2, "train_loss": "7.32", "train_nll_loss": "5.913", "train_ppl": "60.24", "train_wps": "1069.3", "train_ups": "0.5", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "364", "train_lr": "2.73e-05", "train_gnorm": "3.339", "train_clip": "16.5", "train_oom": "0", "train_train_wall": "170", "train_wall": "545"} 2020-09-12 23:57:57 | INFO | valid | {"epoch": 2, "valid_loss": "7.298", "valid_nll_loss": "5.785", "valid_ppl": "55.125", "valid_wps": "4876", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "364", "valid_best_loss": "7.298"} here bpe NONE here! Val {'rouge-1': {'f': 0.2902339258903693, 'p': 0.35493902025894375, 'r': 0.27679295120544645}, 'rouge-2': {'f': 0.09023858321606967, 'p': 0.1113931624274892, 'r': 0.08757866133645495}, 'rouge-l': {'f': 0.2997845837858199, 'p': 0.401101296990871, 'r': 0.26031260412345514}} 2020-09-12 23:59:21 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 2 @ 364 updates, score 7.298) (writing took 7.891215533949435 seconds) Test on testing set: Test {'rouge-1': {'f': 0.28494613701125965, 'p': 0.3516514035592438, 'r': 0.2699670810086879}, 'rouge-2': {'f': 0.08420177692788793, 'p': 0.10510303003646479, 'r': 0.08140088225186674}, 'rouge-l': {'f': 0.292386051869777, 'p': 0.39432852537102037, 'r': 0.2527454550206724}} 2020-09-13 00:03:40 | INFO | train | {"epoch": 3, "train_loss": "7.642", "train_nll_loss": "6.298", "train_ppl": "78.696", "train_wps": "1114.8", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "546", "train_lr": "2.56776e-05", "train_gnorm": "9.301", "train_clip": "99.5", "train_oom": "0", "train_train_wall": "174", "train_wall": "892"} 2020-09-13 00:03:45 | INFO | valid | {"epoch": 3, "valid_loss": "6.83", "valid_nll_loss": "5.266", "valid_ppl": "38.479", "valid_wps": "4512.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "546", "valid_best_loss": "6.83"} here bpe NONE here! Val {'rouge-1': {'f': 0.28004401636313203, 'p': 0.45120614382755203, 'r': 0.22356730714002326}, 'rouge-2': {'f': 0.1042067031811556, 'p': 0.17100306931561168, 'r': 0.0837493874679366}, 'rouge-l': {'f': 0.2834823452672735, 'p': 0.46510518092133646, 'r': 0.22059007016249965}} 2020-09-13 00:04:39 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 3 @ 546 updates, score 6.83) (writing took 8.794937412254512 seconds) Test on testing set: Test {'rouge-1': {'f': 0.26935009742145355, 'p': 0.43147623041129457, 'r': 0.21399299095643845}, 'rouge-2': {'f': 0.0928812558099079, 'p': 0.15352525158121336, 'r': 0.07361179442736973}, 'rouge-l': {'f': 0.2711337233458924, 'p': 0.44552105877903414, 'r': 0.20951603467624907}} 2020-09-13 00:08:36 | INFO | train | {"epoch": 4, "train_loss": "6.986", "train_nll_loss": "5.561", "train_ppl": "47.22", "train_wps": "1310", "train_ups": "0.62", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "728", "train_lr": "2.22375e-05", "train_gnorm": "7.492", "train_clip": "90.1", "train_oom": "0", "train_train_wall": "179", "train_wall": "1188"} 2020-09-13 00:08:41 | INFO | valid | {"epoch": 4, "valid_loss": "6.512", "valid_nll_loss": "4.873", "valid_ppl": "29.308", "valid_wps": "4090.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "728", "valid_best_loss": "6.512"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3058323314303917, 'p': 0.45101872632920387, 'r': 0.255721901517651}, 'rouge-2': {'f': 0.12803820225900034, 'p': 0.19195720632392738, 'r': 0.10762098110671683}, 'rouge-l': {'f': 0.3125195961416917, 'p': 0.4667913347552745, 'r': 0.25371581048821884}} 2020-09-13 00:09:39 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 4 @ 728 updates, score 6.512) (writing took 8.845161844976246 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3067963357485839, 'p': 0.4412779946655861, 'r': 0.2587007664111758}, 'rouge-2': {'f': 0.12092130534646459, 'p': 0.17638283156638504, 'r': 0.10283473953818749}, 'rouge-l': {'f': 0.31175281522793147, 'p': 0.4556750348102311, 'r': 0.2543185560104487}} 2020-09-13 00:13:33 | INFO | train | {"epoch": 5, "train_loss": "6.593", "train_nll_loss": "5.121", "train_ppl": "34.798", "train_wps": "1301.7", "train_ups": "0.61", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "910", "train_lr": "1.98898e-05", "train_gnorm": "6.119", "train_clip": "61", "train_oom": "0", "train_train_wall": "173", "train_wall": "1485"} 2020-09-13 00:13:38 | INFO | valid | {"epoch": 5, "valid_loss": "6.283", "valid_nll_loss": "4.649", "valid_ppl": "25.088", "valid_wps": "4915", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "910", "valid_best_loss": "6.283"} here bpe NONE here! Val {'rouge-1': {'f': 0.32631664800109805, 'p': 0.4096853241359607, 'r': 0.30773293666069307}, 'rouge-2': {'f': 0.139532263771638, 'p': 0.17597922407733474, 'r': 0.13409987180914382}, 'rouge-l': {'f': 0.3375112445140239, 'p': 0.44056842796025036, 'r': 0.29934067025889066}} 2020-09-13 00:14:56 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 5 @ 910 updates, score 6.283) (writing took 8.922468357719481 seconds) Test on testing set: Test {'rouge-1': {'f': 0.32195108164976866, 'p': 0.3986055423428635, 'r': 0.3072986476227612}, 'rouge-2': {'f': 0.13255688382054837, 'p': 0.16593460499965418, 'r': 0.12889569667431885}, 'rouge-l': {'f': 0.3336251633595829, 'p': 0.4308272105408829, 'r': 0.2980551182792911}} 2020-09-13 00:19:12 | INFO | train | {"epoch": 6, "train_loss": "6.299", "train_nll_loss": "4.79", "train_ppl": "27.674", "train_wps": "1144.9", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1092", "train_lr": "1.81568e-05", "train_gnorm": "4.966", "train_clip": "25.8", "train_oom": "0", "train_train_wall": "173", "train_wall": "1824"} 2020-09-13 00:19:16 | INFO | valid | {"epoch": 6, "valid_loss": "6.073", "valid_nll_loss": "4.4", "valid_ppl": "21.112", "valid_wps": "5230.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1092", "valid_best_loss": "6.073"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3334118320946748, 'p': 0.4410821539215469, 'r': 0.29588426535787143}, 'rouge-2': {'f': 0.13739283912842795, 'p': 0.18316512984611386, 'r': 0.12292501717355879}, 'rouge-l': {'f': 0.3367563539841945, 'p': 0.45088752831270734, 'r': 0.2903563191983348}} 2020-09-13 00:20:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 6 @ 1092 updates, score 6.073) (writing took 8.304749015718699 seconds) Test on testing set: Test {'rouge-1': {'f': 0.32018946591540753, 'p': 0.43157002485898993, 'r': 0.2830485811595604}, 'rouge-2': {'f': 0.127293851871242, 'p': 0.17473306979593142, 'r': 0.11274861530450966}, 'rouge-l': {'f': 0.3211206923058116, 'p': 0.43612260347418846, 'r': 0.2759708589901207}} 2020-09-13 00:24:22 | INFO | train | {"epoch": 7, "train_loss": "6.062", "train_nll_loss": "4.522", "train_ppl": "22.981", "train_wps": "1246.6", "train_ups": "0.59", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1274", "train_lr": "1.681e-05", "train_gnorm": "3.981", "train_clip": "13.2", "train_oom": "0", "train_train_wall": "171", "train_wall": "2134"} 2020-09-13 00:24:27 | INFO | valid | {"epoch": 7, "valid_loss": "5.974", "valid_nll_loss": "4.283", "valid_ppl": "19.47", "valid_wps": "4940.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1274", "valid_best_loss": "5.974"} here bpe NONE here! Val {'rouge-1': {'f': 0.34953445294994195, 'p': 0.45927096966662395, 'r': 0.3099192101544123}, 'rouge-2': {'f': 0.15042073792671246, 'p': 0.1998221241410976, 'r': 0.1338806674885114}, 'rouge-l': {'f': 0.3493160643305676, 'p': 0.4626922715315685, 'r': 0.3021469814220278}} 2020-09-13 00:25:40 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 7 @ 1274 updates, score 5.974) (writing took 9.386484532617033 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3454603555704317, 'p': 0.45500522729028614, 'r': 0.307872752358408}, 'rouge-2': {'f': 0.14921191873819553, 'p': 0.20035740171083352, 'r': 0.13298830103947873}, 'rouge-l': {'f': 0.35050286922913615, 'p': 0.46836836790711445, 'r': 0.30264623667919816}} 2020-09-13 00:29:36 | INFO | train | {"epoch": 8, "train_loss": "5.833", "train_nll_loss": "4.263", "train_ppl": "19.199", "train_wps": "1234.4", "train_ups": "0.58", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1456", "train_lr": "1.57243e-05", "train_gnorm": "3.139", "train_clip": "1.6", "train_oom": "0", "train_train_wall": "170", "train_wall": "2448"} 2020-09-13 00:29:40 | INFO | valid | {"epoch": 8, "valid_loss": "5.84", "valid_nll_loss": "4.149", "valid_ppl": "17.745", "valid_wps": "5219.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1456", "valid_best_loss": "5.84"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37134382327991744, 'p': 0.4483338326050663, 'r': 0.3499853577404138}, 'rouge-2': {'f': 0.16603384362767615, 'p': 0.19866856937856062, 'r': 0.15915049114742058}, 'rouge-l': {'f': 0.37879774030543395, 'p': 0.4653144943165627, 'r': 0.34473968051528203}} 2020-09-13 00:30:53 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 8 @ 1456 updates, score 5.84) (writing took 9.01857946626842 seconds) Test on testing set: Test {'rouge-1': {'f': 0.36901160113448783, 'p': 0.44108292289748047, 'r': 0.3526763811870377}, 'rouge-2': {'f': 0.16666756440683705, 'p': 0.19984167677488363, 'r': 0.1618494645628281}, 'rouge-l': {'f': 0.3778187479842473, 'p': 0.4602486552120849, 'r': 0.3473739981188793}} 2020-09-13 00:35:03 | INFO | train | {"epoch": 9, "train_loss": "5.66", "train_nll_loss": "4.064", "train_ppl": "16.729", "train_wps": "1184", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1638", "train_lr": "1.4825e-05", "train_gnorm": "2.92", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "174", "train_wall": "2776"} 2020-09-13 00:35:08 | INFO | valid | {"epoch": 9, "valid_loss": "5.799", "valid_nll_loss": "4.105", "valid_ppl": "17.213", "valid_wps": "5105", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1638", "valid_best_loss": "5.799"} here bpe NONE here! Val {'rouge-1': {'f': 0.36842622898234595, 'p': 0.44961516033632526, 'r': 0.34284042718663166}, 'rouge-2': {'f': 0.16566291407189612, 'p': 0.20170401529199547, 'r': 0.15595426964593812}, 'rouge-l': {'f': 0.37329684998556206, 'p': 0.46107006212992346, 'r': 0.3381812931818861}} 2020-09-13 00:36:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 9 @ 1638 updates, score 5.799) (writing took 8.725783603265882 seconds) Test on testing set: Test {'rouge-1': {'f': 0.36854854367280065, 'p': 0.446462179037044, 'r': 0.3453575409922443}, 'rouge-2': {'f': 0.15766752097723266, 'p': 0.19205310732297123, 'r': 0.14911261178538193}, 'rouge-l': {'f': 0.3723420714636821, 'p': 0.45483385435551693, 'r': 0.33894738585564493}} 2020-09-13 00:40:36 | INFO | train | {"epoch": 10, "train_loss": "5.499", "train_nll_loss": "3.88", "train_ppl": "14.719", "train_wps": "1164.2", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1820", "train_lr": "1.40642e-05", "train_gnorm": "2.644", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "171", "train_wall": "3108"} 2020-09-13 00:40:41 | INFO | valid | {"epoch": 10, "valid_loss": "5.764", "valid_nll_loss": "4.061", "valid_ppl": "16.685", "valid_wps": "4140", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1820", "valid_best_loss": "5.764"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3732009581077997, 'p': 0.45658435160220856, 'r': 0.34843650153963096}, 'rouge-2': {'f': 0.16955068859082412, 'p': 0.20811159765922504, 'r': 0.16092543242290838}, 'rouge-l': {'f': 0.37735274143117503, 'p': 0.46684437321975286, 'r': 0.34234094727570397}} 2020-09-13 00:41:52 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 10 @ 1820 updates, score 5.764) (writing took 9.631018000654876 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3716137675352023, 'p': 0.45290106682582165, 'r': 0.350055845189764}, 'rouge-2': {'f': 0.16761863723149356, 'p': 0.2054667073198634, 'r': 0.159476663711512}, 'rouge-l': {'f': 0.3766825457695764, 'p': 0.46341365477026236, 'r': 0.3429991475782948}} 2020-09-13 00:45:59 | INFO | train | {"epoch": 11, "train_loss": "5.389", "train_nll_loss": "3.752", "train_ppl": "13.477", "train_wps": "1200", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2002", "train_lr": "1.34097e-05", "train_gnorm": "2.583", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "3431"} 2020-09-13 00:46:02 | INFO | valid | {"epoch": 11, "valid_loss": "5.732", "valid_nll_loss": "4.006", "valid_ppl": "16.062", "valid_wps": "5937", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2002", "valid_best_loss": "5.732"} here bpe NONE here! Val {'rouge-1': {'f': 0.3712171758030596, 'p': 0.46406767790531694, 'r': 0.3399832503572037}, 'rouge-2': {'f': 0.17340701068522252, 'p': 0.218702325075554, 'r': 0.15972205472044512}, 'rouge-l': {'f': 0.37458693081114397, 'p': 0.47097623661393523, 'r': 0.3349146889430698}} 2020-09-13 00:47:13 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 11 @ 2002 updates, score 5.732) (writing took 8.599667175672948 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3759528658322778, 'p': 0.4600031515568631, 'r': 0.34994197804675164}, 'rouge-2': {'f': 0.16784789754607138, 'p': 0.2078567819662484, 'r': 0.15733512603695773}, 'rouge-l': {'f': 0.37885456493676783, 'p': 0.4665253560189782, 'r': 0.34335369752238043}} 2020-09-13 00:51:19 | INFO | train | {"epoch": 12, "train_loss": "5.266", "train_nll_loss": "3.61", "train_ppl": "12.21", "train_wps": "1211.6", "train_ups": "0.57", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2184", "train_lr": "1.28388e-05", "train_gnorm": "2.559", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "3751"} 2020-09-13 00:51:22 | INFO | valid | {"epoch": 12, "valid_loss": "5.695", "valid_nll_loss": "3.958", "valid_ppl": "15.541", "valid_wps": "5973.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2184", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37738179617980694, 'p': 0.45113007243822967, 'r': 0.35812026539253183}, 'rouge-2': {'f': 0.174381935944973, 'p': 0.20928232150430362, 'r': 0.1667457526834203}, 'rouge-l': {'f': 0.381769707403124, 'p': 0.4607884172272312, 'r': 0.3523183895063523}} 2020-09-13 00:52:37 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 12 @ 2184 updates, score 5.695) (writing took 8.865372630767524 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37599974934852276, 'p': 0.44884508733258244, 'r': 0.3585387811436595}, 'rouge-2': {'f': 0.16874446443891233, 'p': 0.20483613250924657, 'r': 0.1618007263923004}, 'rouge-l': {'f': 0.3813476499552326, 'p': 0.4596740958090682, 'r': 0.35183457630713216}} 2020-09-13 00:56:48 | INFO | train | {"epoch": 13, "train_loss": "5.158", "train_nll_loss": "3.483", "train_ppl": "11.182", "train_wps": "1176.3", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2366", "train_lr": "1.23351e-05", "train_gnorm": "2.471", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "4080"} 2020-09-13 00:56:53 | INFO | valid | {"epoch": 13, "valid_loss": "5.712", "valid_nll_loss": "3.969", "valid_ppl": "15.661", "valid_wps": "4424.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2366", "valid_best_loss": "5.695"} here bpe NONE here! Val {'rouge-1': {'f': 0.3884982705551785, 'p': 0.4491673629778516, 'r': 0.37780412740852076}, 'rouge-2': {'f': 0.17685692236323902, 'p': 0.2042786532892362, 'r': 0.17430026356282213}, 'rouge-l': {'f': 0.3935735209373062, 'p': 0.4617984008105712, 'r': 0.37031878737194673}} 2020-09-13 00:58:10 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 13 @ 2366 updates, score 5.712) (writing took 4.134681691415608 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37886357099711665, 'p': 0.43458029252143165, 'r': 0.3729596920501715}, 'rouge-2': {'f': 0.16854719468745627, 'p': 0.19559694299078864, 'r': 0.1669496828829494}, 'rouge-l': {'f': 0.38364553730125733, 'p': 0.44831425303180794, 'r': 0.3636603790462315}} 2020-09-13 01:02:30 | INFO | train | {"epoch": 14, "train_loss": "5.054", "train_nll_loss": "3.361", "train_ppl": "10.275", "train_wps": "1131.2", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2548", "train_lr": "1.18864e-05", "train_gnorm": "2.442", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "4423"} 2020-09-13 01:02:35 | INFO | valid | {"epoch": 14, "valid_loss": "5.707", "valid_nll_loss": "3.958", "valid_ppl": "15.544", "valid_wps": "4832.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2548", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3998796789622521, 'p': 0.44386233353207316, 'r': 0.402440874022752}, 'rouge-2': {'f': 0.18989748323963274, 'p': 0.20863603806938805, 'r': 0.1955903717688269}, 'rouge-l': {'f': 0.4074896840434411, 'p': 0.4639988402234145, 'r': 0.3924353598143554}} 2020-09-13 01:03:55 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 14 @ 2548 updates, score 5.707) (writing took 4.522745947353542 seconds) Test on testing set: Test {'rouge-1': {'f': 0.389306128602423, 'p': 0.43236154677359795, 'r': 0.3928469004687963}, 'rouge-2': {'f': 0.1749124813810089, 'p': 0.19233446115494, 'r': 0.18101805932331297}, 'rouge-l': {'f': 0.3974381446141553, 'p': 0.45394463380960515, 'r': 0.3816860716155822}} 2020-09-13 01:08:15 | INFO | train | {"epoch": 15, "train_loss": "4.955", "train_nll_loss": "3.246", "train_ppl": "9.486", "train_wps": "1124.8", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2730", "train_lr": "1.14834e-05", "train_gnorm": "2.476", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "4767"} 2020-09-13 01:08:19 | INFO | valid | {"epoch": 15, "valid_loss": "5.72", "valid_nll_loss": "3.969", "valid_ppl": "15.664", "valid_wps": "5135.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2730", "valid_best_loss": "5.695"} here bpe NONE here! Val {'rouge-1': {'f': 0.385496212805097, 'p': 0.45344757978858063, 'r': 0.37208548238317946}, 'rouge-2': {'f': 0.17987898260277432, 'p': 0.2117429818111738, 'r': 0.17606453929508845}, 'rouge-l': {'f': 0.3877792425552155, 'p': 0.45854887620658535, 'r': 0.3646117908261955}} 2020-09-13 01:09:32 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 15 @ 2730 updates, score 5.72) (writing took 4.130691207945347 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37790240747430026, 'p': 0.43971254753318373, 'r': 0.367746760947724}, 'rouge-2': {'f': 0.16425074216297936, 'p': 0.19264495690629893, 'r': 0.16212163303873436}, 'rouge-l': {'f': 0.38059486158567596, 'p': 0.4492443951232903, 'r': 0.35745930848360596}} 2020-09-13 01:13:41 | INFO | train | {"epoch": 16, "train_loss": "4.867", "train_nll_loss": "3.141", "train_ppl": "8.82", "train_wps": "1187.1", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2912", "train_lr": "1.11187e-05", "train_gnorm": "2.448", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "5093"} 2020-09-13 01:13:45 | INFO | valid | {"epoch": 16, "valid_loss": "5.701", "valid_nll_loss": "3.929", "valid_ppl": "15.235", "valid_wps": "6218.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2912", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38106680686091043, 'p': 0.4805848063501437, 'r': 0.3447942668199565}, 'rouge-2': {'f': 0.175794098245, 'p': 0.22236168866719144, 'r': 0.16057201572100432}, 'rouge-l': {'f': 0.3806436944039715, 'p': 0.4789247600612308, 'r': 0.3390977899415061}} 2020-09-13 01:14:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 16 @ 2912 updates, score 5.701) (writing took 4.02212131395936 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3718838641380148, 'p': 0.4666792110932706, 'r': 0.3399888800206822}, 'rouge-2': {'f': 0.1671697426502001, 'p': 0.21140715464498486, 'r': 0.1542190531743976}, 'rouge-l': {'f': 0.3728099296900896, 'p': 0.46619239167445015, 'r': 0.3346634641171861}} 2020-09-13 01:18:54 | INFO | train | {"epoch": 17, "train_loss": "4.782", "train_nll_loss": "3.04", "train_ppl": "8.228", "train_wps": "1238.2", "train_ups": "0.58", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3094", "train_lr": "1.07868e-05", "train_gnorm": "2.405", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "5406"} 2020-09-13 01:18:58 | INFO | valid | {"epoch": 17, "valid_loss": "5.733", "valid_nll_loss": "3.974", "valid_ppl": "15.715", "valid_wps": "5150", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3094", "valid_best_loss": "5.695"} here bpe NONE here! Val {'rouge-1': {'f': 0.391276670828434, 'p': 0.43821071505750875, 'r': 0.3908326539960555}, 'rouge-2': {'f': 0.17966333440452606, 'p': 0.2006399715420435, 'r': 0.18217533769836514}, 'rouge-l': {'f': 0.3948289271214795, 'p': 0.4524492153099438, 'r': 0.3798363085924027}} 2020-09-13 01:20:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 17 @ 3094 updates, score 5.733) (writing took 4.8997919876128435 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3845810681201655, 'p': 0.4282195186242171, 'r': 0.3862897923232961}, 'rouge-2': {'f': 0.17073755266846638, 'p': 0.1894686524531282, 'r': 0.17556790042355605}, 'rouge-l': {'f': 0.391043491686661, 'p': 0.4431799101178343, 'r': 0.3779568118927497}} 2020-09-13 01:24:28 | INFO | train | {"epoch": 18, "train_loss": "4.697", "train_nll_loss": "2.94", "train_ppl": "7.672", "train_wps": "1159.4", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3276", "train_lr": "1.04828e-05", "train_gnorm": "2.467", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "5740"} 2020-09-13 01:24:32 | INFO | valid | {"epoch": 18, "valid_loss": "5.724", "valid_nll_loss": "3.944", "valid_ppl": "15.396", "valid_wps": "5295.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3276", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37861704558770404, 'p': 0.4419875194307351, 'r': 0.3667373863426889}, 'rouge-2': {'f': 0.1726066525356969, 'p': 0.20114080833632547, 'r': 0.1692294466791264}, 'rouge-l': {'f': 0.3817555654265253, 'p': 0.4502364007470248, 'r': 0.3588787943829516}} 2020-09-13 01:25:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 18 @ 3276 updates, score 5.724) (writing took 5.381191832944751 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3723413859817262, 'p': 0.4282642263502103, 'r': 0.36388886661332226}, 'rouge-2': {'f': 0.15908020930468886, 'p': 0.18290676480735674, 'r': 0.15773721634104743}, 'rouge-l': {'f': 0.3772182373129486, 'p': 0.4386399747551869, 'r': 0.35658496212946833}} 2020-09-13 01:29:54 | INFO | train | {"epoch": 19, "train_loss": "4.614", "train_nll_loss": "2.841", "train_ppl": "7.163", "train_wps": "1189.8", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3458", "train_lr": "1.02033e-05", "train_gnorm": "2.388", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "6066"} 2020-09-13 01:29:59 | INFO | valid | {"epoch": 19, "valid_loss": "5.742", "valid_nll_loss": "3.966", "valid_ppl": "15.631", "valid_wps": "3865.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3458", "valid_best_loss": "5.695"} here bpe NONE here! Val {'rouge-1': {'f': 0.38044592866113847, 'p': 0.46748203688066003, 'r': 0.355039752203741}, 'rouge-2': {'f': 0.17804838492656566, 'p': 0.21772699643126558, 'r': 0.16901485873684227}, 'rouge-l': {'f': 0.3813148465803212, 'p': 0.46892094755586644, 'r': 0.3488118629587613}} 2020-09-13 01:31:08 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 19 @ 3458 updates, score 5.742) (writing took 4.588620846159756 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3701339419496253, 'p': 0.45463960006384346, 'r': 0.3417336798726597}, 'rouge-2': {'f': 0.16203221978539503, 'p': 0.20101164160722118, 'r': 0.15035608008610543}, 'rouge-l': {'f': 0.3689392742120454, 'p': 0.4531052832313649, 'r': 0.3346368169251904}} 2020-09-13 01:35:14 | INFO | train | {"epoch": 20, "train_loss": "4.54", "train_nll_loss": "2.754", "train_ppl": "6.744", "train_wps": "1207.7", "train_ups": "0.57", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3640", "train_lr": "9.9449e-06", "train_gnorm": "2.43", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "6387"} 2020-09-13 01:35:19 | INFO | valid | {"epoch": 20, "valid_loss": "5.765", "valid_nll_loss": "3.981", "valid_ppl": "15.794", "valid_wps": "5212", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3640", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38917925586628854, 'p': 0.4545888729297281, 'r': 0.37521025150304493}, 'rouge-2': {'f': 0.18128938136060216, 'p': 0.2108173460566329, 'r': 0.1779577175289945}, 'rouge-l': {'f': 0.3912123475237053, 'p': 0.4626659366951105, 'r': 0.36707352170149965}} 2020-09-13 01:36:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 20 @ 3640 updates, score 5.765) (writing took 4.491002192720771 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37944542213327404, 'p': 0.4419097225174281, 'r': 0.36700773898625333}, 'rouge-2': {'f': 0.16679895064998046, 'p': 0.19638910895842057, 'r': 0.16331358354493733}, 'rouge-l': {'f': 0.3822408225204509, 'p': 0.4509045323855888, 'r': 0.3583453351231742}} 2020-09-13 01:40:40 | INFO | train | {"epoch": 21, "train_loss": "4.459", "train_nll_loss": "2.656", "train_ppl": "6.305", "train_wps": "1190.5", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3822", "train_lr": "9.70523e-06", "train_gnorm": "2.442", "train_clip": "0", "train_oom": "0", "train_train_wall": "168", "train_wall": "6712"} 2020-09-13 01:40:44 | INFO | valid | {"epoch": 21, "valid_loss": "5.801", "valid_nll_loss": "4.013", "valid_ppl": "16.145", "valid_wps": "5059.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3822", "valid_best_loss": "5.695"} here bpe NONE here! Val {'rouge-1': {'f': 0.3951976657153381, 'p': 0.4313811856628662, 'r': 0.40212658363768433}, 'rouge-2': {'f': 0.18158599410484844, 'p': 0.19714233842488113, 'r': 0.1880275069846484}, 'rouge-l': {'f': 0.400753788566494, 'p': 0.4453694229577254, 'r': 0.39373391050291107}} 2020-09-13 01:42:04 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 21 @ 3822 updates, score 5.801) (writing took 4.3676733430475 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37992031392670983, 'p': 0.4103137483489205, 'r': 0.39027553986147556}, 'rouge-2': {'f': 0.16588326288147193, 'p': 0.1786441782759031, 'r': 0.17327696142366736}, 'rouge-l': {'f': 0.3866994768252795, 'p': 0.4268482135450003, 'r': 0.38055574575484663}} 2020-09-13 01:46:22 | INFO | train | {"epoch": 22, "train_loss": "4.388", "train_nll_loss": "2.573", "train_ppl": "5.95", "train_wps": "1132.6", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4004", "train_lr": "9.48209e-06", "train_gnorm": "2.438", "train_clip": "0", "train_oom": "0", "train_train_wall": "169", "train_wall": "7054"} 2020-09-13 01:46:27 | INFO | valid | {"epoch": 22, "valid_loss": "5.806", "valid_nll_loss": "4.013", "valid_ppl": "16.15", "valid_wps": "4606.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4004", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38750722254797787, 'p': 0.4277637839583848, 'r': 0.3917518647792673}, 'rouge-2': {'f': 0.17892783275537408, 'p': 0.1954041191558916, 'r': 0.18563060651094296}, 'rouge-l': {'f': 0.3901464960798033, 'p': 0.4382336878278232, 'r': 0.38022299917227464}} 2020-09-13 01:47:44 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 22 @ 4004 updates, score 5.806) (writing took 4.484134818427265 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37365247137031243, 'p': 0.4107872063220731, 'r': 0.37916166737146184}, 'rouge-2': {'f': 0.1645194217180295, 'p': 0.18036856200061924, 'r': 0.17038522739655573}, 'rouge-l': {'f': 0.3788353883034097, 'p': 0.42472162650174466, 'r': 0.36942716034722173}} 2020-09-13 01:51:58 | INFO | train | {"epoch": 23, "train_loss": "4.321", "train_nll_loss": "2.493", "train_ppl": "5.628", "train_wps": "1151.3", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4186", "train_lr": "9.27367e-06", "train_gnorm": "2.455", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "171", "train_wall": "7391"} 2020-09-13 01:52:03 | INFO | valid | {"epoch": 23, "valid_loss": "5.827", "valid_nll_loss": "4.033", "valid_ppl": "16.372", "valid_wps": "4875", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4186", "valid_best_loss": "5.695"} here bpe NONE here! Val {'rouge-1': {'f': 0.38822926461557655, 'p': 0.45348295623984464, 'r': 0.37020477945877106}, 'rouge-2': {'f': 0.1777558814272791, 'p': 0.20734387979453298, 'r': 0.17123390284814333}, 'rouge-l': {'f': 0.3870378946705613, 'p': 0.453376818512756, 'r': 0.3616563152380114}} 2020-09-13 01:53:12 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 23 @ 4186 updates, score 5.827) (writing took 4.091774018481374 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3752148029208675, 'p': 0.43863114918235374, 'r': 0.358486619417161}, 'rouge-2': {'f': 0.16180559507864367, 'p': 0.19086761931101376, 'r': 0.1558113872763702}, 'rouge-l': {'f': 0.3749091393825165, 'p': 0.44034156816542647, 'r': 0.3498163907525991}} 2020-09-13 01:57:21 | INFO | train | {"epoch": 24, "train_loss": "4.255", "train_nll_loss": "2.414", "train_ppl": "5.331", "train_wps": "1200.3", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4368", "train_lr": "9.07841e-06", "train_gnorm": "2.447", "train_clip": "0", "train_oom": "0", "train_train_wall": "170", "train_wall": "7713"} 2020-09-13 01:57:25 | INFO | valid | {"epoch": 24, "valid_loss": "5.827", "valid_nll_loss": "4.02", "valid_ppl": "16.219", "valid_wps": "5117.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4368", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38534117547261393, 'p': 0.445273298289193, 'r': 0.3742189594129202}, 'rouge-2': {'f': 0.17705063825525105, 'p': 0.20385055188358236, 'r': 0.17461070647467383}, 'rouge-l': {'f': 0.38760590954771934, 'p': 0.45168003704103943, 'r': 0.36699590315532576}}
2020-09-13 01:58:41 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 24 @ 4368 updates, score 5.827) (writing took 4.580549734644592 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37376658505892546, 'p': 0.42513689289838574, 'r': 0.36824430078261083}, 'rouge-2': {'f': 0.16449381722326553, 'p': 0.18792195589835636, 'r': 0.16472545042139783}, 'rouge-l': {'f': 0.37640842528791896, 'p': 0.4302305671551292, 'r': 0.36077114610382716}}
2020-09-13 02:02:58 | INFO | train | {"epoch": 25, "train_loss": "4.189", "train_nll_loss": "2.336", "train_ppl": "5.049", "train_wps": "1148.2", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4550", "train_lr": "8.89499e-06", "train_gnorm": "2.448", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "8051"}
2020-09-13 02:03:03 | INFO | valid | {"epoch": 25, "valid_loss": "5.868", "valid_nll_loss": "4.082", "valid_ppl": "16.938", "valid_wps": "5108.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4550", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3871704147563257, 'p': 0.42776014371986826, 'r': 0.39233499649709835}, 'rouge-2': {'f': 0.1735141403257615, 'p': 0.18983819963280862, 'r': 0.1801092131406639}, 'rouge-l': {'f': 0.38881395036855076, 'p': 0.4373233829107774, 'r': 0.37913972883849767}}
2020-09-13 02:04:26 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 25 @ 4550 updates, score 5.868) (writing took 4.335143620148301 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3844215514195955, 'p': 0.42117882229965664, 'r': 0.38836392231731154}, 'rouge-2': {'f': 0.16712838914507203, 'p': 0.184210597325008, 'r': 0.16991914128454474}, 'rouge-l': {'f': 0.38612055818280827, 'p': 0.43107652154520104, 'r': 0.37593315070715955}}
2020-09-13 02:08:44 | INFO | train | {"epoch": 26, "train_loss": "4.123", "train_nll_loss": "2.257", "train_ppl": "4.78", "train_wps": "1122.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4732", "train_lr": "8.72226e-06", "train_gnorm": "2.424", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "8396"}
2020-09-13 02:08:48 | INFO | valid | {"epoch": 26, "valid_loss": "5.882", "valid_nll_loss": "4.086", "valid_ppl": "16.987", "valid_wps": "5032.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4732", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.39059463228636443, 'p': 0.44845589456484236, 'r': 0.37968675708755983}, 'rouge-2': {'f': 0.17511335211715656, 'p': 0.19906234650600044, 'r': 0.1739893469380071}, 'rouge-l': {'f': 0.3879455405786022, 'p': 0.4496592808236638, 'r': 0.3675400057841032}}
2020-09-13 02:10:03 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 26 @ 4732 updates, score 5.882) (writing took 4.394494019448757 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37693971962065465, 'p': 0.43127100071839813, 'r': 0.37002592342998164}, 'rouge-2': {'f': 0.1637296222980477, 'p': 0.187921960463544, 'r': 0.16291519359218756}, 'rouge-l': {'f': 0.37658267706833715, 'p': 0.43598237022943104, 'r': 0.3580529888703775}}
2020-09-13 02:14:15 | INFO | train | {"epoch": 27, "train_loss": "4.067", "train_nll_loss": "2.192", "train_ppl": "4.569", "train_wps": "1170.3", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4914", "train_lr": "8.55921e-06", "train_gnorm": "2.464", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "8727"}
2020-09-13 02:14:19 | INFO | valid | {"epoch": 27, "valid_loss": "5.915", "valid_nll_loss": "4.113", "valid_ppl": "17.299", "valid_wps": "5272.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4914", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38434759891377074, 'p': 0.4249833165393893, 'r': 0.38662224677719503}, 'rouge-2': {'f': 0.17192755651144725, 'p': 0.19044749867481733, 'r': 0.1749212887993873}, 'rouge-l': {'f': 0.38774917991430324, 'p': 0.4365569133041221, 'r': 0.37632971013264493}}
2020-09-13 02:15:35 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 27 @ 4914 updates, score 5.915) (writing took 4.622201276943088 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37903839501209047, 'p': 0.4130527827785208, 'r': 0.38674322772887626}, 'rouge-2': {'f': 0.16408303228928034, 'p': 0.17857210979076796, 'r': 0.169772298030173}, 'rouge-l': {'f': 0.3828725069419752, 'p': 0.42540750664994087, 'r': 0.3750088333333603}}
2020-09-13 02:19:56 | INFO | train | {"epoch": 28, "train_loss": "4.005", "train_nll_loss": "2.117", "train_ppl": "4.339", "train_wps": "1133.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5096", "train_lr": "8.40498e-06", "train_gnorm": "2.417", "train_clip": "0", "train_oom": "0", "train_train_wall": "175", "train_wall": "9069"}
2020-09-13 02:20:01 | INFO | valid | {"epoch": 28, "valid_loss": "5.924", "valid_nll_loss": "4.113", "valid_ppl": "17.301", "valid_wps": "5072.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5096", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3789412021753395, 'p': 0.44009502943218975, 'r': 0.3673566576087134}, 'rouge-2': {'f': 0.172794589968652, 'p': 0.19937717288075993, 'r': 0.17077481830782684}, 'rouge-l': {'f': 0.37829677515222787, 'p': 0.44274850813900934, 'r': 0.3577681743468191}}
2020-09-13 02:21:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 28 @ 5096 updates, score 5.924) (writing took 4.540925501845777 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.36527019423752055, 'p': 0.4208007754021258, 'r': 0.35525130146740025}, 'rouge-2': {'f': 0.1572877265691285, 'p': 0.1817244103207206, 'r': 0.15514638974429046}, 'rouge-l': {'f': 0.36299730631095184, 'p': 0.42155101744468565, 'r': 0.34371323149546557}}
2020-09-13 02:25:31 | INFO | train | {"epoch": 29, "train_loss": "3.946", "train_nll_loss": "2.047", "train_ppl": "4.134", "train_wps": "1157.1", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5278", "train_lr": "8.25879e-06", "train_gnorm": "2.41", "train_clip": "0", "train_oom": "0", "train_train_wall": "176", "train_wall": "9403"}
2020-09-13 02:25:35 | INFO | valid | {"epoch": 29, "valid_loss": "5.948", "valid_nll_loss": "4.141", "valid_ppl": "17.646", "valid_wps": "5013.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5278", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.39392286385154907, 'p': 0.44045543067367227, 'r': 0.39267143683548494}, 'rouge-2': {'f': 0.17773576677957223, 'p': 0.1967579777648653, 'r': 0.18110397051020538}, 'rouge-l': {'f': 0.3941220431573893, 'p': 0.44562952698546576, 'r': 0.3812440137558409}}
2020-09-13 02:26:56 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 29 @ 5278 updates, score 5.948) (writing took 4.303143389523029 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.38080278641895143, 'p': 0.4233969204218693, 'r': 0.3843529679579355}, 'rouge-2': {'f': 0.1626977341974013, 'p': 0.18099048848906363, 'r': 0.16602780449388885}, 'rouge-l': {'f': 0.38036795751367164, 'p': 0.43089260804423674, 'r': 0.369552310419994}}
2020-09-13 02:31:11 | INFO | train | {"epoch": 30, "train_loss": "3.895", "train_nll_loss": "1.988", "train_ppl": "3.966", "train_wps": "1139.3", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5460", "train_lr": "8.11998e-06", "train_gnorm": "2.409", "train_clip": "0", "train_oom": "0", "train_train_wall": "170", "train_wall": "9743"}
2020-09-13 02:31:16 | INFO | valid | {"epoch": 30, "valid_loss": "5.981", "valid_nll_loss": "4.171", "valid_ppl": "18.011", "valid_wps": "4936.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5460", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38020709374000805, 'p': 0.425796857846852, 'r': 0.37872173949599347}, 'rouge-2': {'f': 0.16795702651937106, 'p': 0.18762125339060082, 'r': 0.17083357743421537}, 'rouge-l': {'f': 0.38090418992038316, 'p': 0.4310409715465135, 'r': 0.36867556519723077}}
2020-09-13 02:32:29 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 30 @ 5460 updates, score 5.981) (writing took 4.3459603087976575 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37965358788271986, 'p': 0.4206577074238292, 'r': 0.380269873971165}, 'rouge-2': {'f': 0.16305562269169338, 'p': 0.18144303909345297, 'r': 0.16565712858345996}, 'rouge-l': {'f': 0.3809386293041335, 'p': 0.4267185372492751, 'r': 0.3700918835000441}}
2020-09-13 02:36:41 | INFO | train | {"epoch": 31, "train_loss": "3.83", "train_nll_loss": "1.911", "train_ppl": "3.76", "train_wps": "1174.4", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5642", "train_lr": "7.98794e-06", "train_gnorm": "2.399", "train_clip": "0", "train_oom": "0", "train_train_wall": "169", "train_wall": "10073"}
2020-09-13 02:36:47 | INFO | valid | {"epoch": 31, "valid_loss": "5.991", "valid_nll_loss": "4.188", "valid_ppl": "18.225", "valid_wps": "3896.5", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5642", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38086083577746893, 'p': 0.43856192527978594, 'r': 0.36852514930253355}, 'rouge-2': {'f': 0.1720548694357604, 'p': 0.19719350666715524, 'r': 0.1688531110170121}, 'rouge-l': {'f': 0.37940756909649354, 'p': 0.43991824611795655, 'r': 0.3593013025166184}}
2020-09-13 02:38:05 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 31 @ 5642 updates, score 5.991) (writing took 4.520576075650752 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.370679717039701, 'p': 0.4252353421116953, 'r': 0.362809943313995}, 'rouge-2': {'f': 0.15675368799422942, 'p': 0.18101486674427333, 'r': 0.15407280461847872}, 'rouge-l': {'f': 0.3695017426852557, 'p': 0.4269730776434452, 'r': 0.3521532197453053}}
2020-09-13 02:42:13 | INFO | train | {"epoch": 32, "train_loss": "3.782", "train_nll_loss": "1.855", "train_ppl": "3.617", "train_wps": "1166.1", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5824", "train_lr": "7.86214e-06", "train_gnorm": "2.394", "train_clip": "0", "train_oom": "0", "train_train_wall": "166", "train_wall": "10405"}
2020-09-13 02:42:17 | INFO | valid | {"epoch": 32, "valid_loss": "6", "valid_nll_loss": "4.194", "valid_ppl": "18.306", "valid_wps": "5229.5", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5824", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38137652179916004, 'p': 0.43313790455810236, 'r': 0.3772229819910243}, 'rouge-2': {'f': 0.16691298025795592, 'p': 0.18836874967588021, 'r': 0.16863772276161515}, 'rouge-l': {'f': 0.38013909271578145, 'p': 0.4366292868501368, 'r': 0.3653993312593402}}
2020-09-13 02:43:34 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 32 @ 5824 updates, score 6.0) (writing took 4.351189863868058 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37981067693153064, 'p': 0.42971224310521866, 'r': 0.37687873722717913}, 'rouge-2': {'f': 0.15992774092760015, 'p': 0.18342171362162557, 'r': 0.15837989344554051}, 'rouge-l': {'f': 0.3799422632878725, 'p': 0.43464593193037043, 'r': 0.36586217115314146}}
2020-09-13 02:47:51 | INFO | train | {"epoch": 33, "train_loss": "3.726", "train_nll_loss": "1.789", "train_ppl": "3.455", "train_wps": "1147.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6006", "train_lr": "7.7421e-06", "train_gnorm": "2.331", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "10743"}
2020-09-13 02:47:55 | INFO | valid | {"epoch": 33, "valid_loss": "6.02", "valid_nll_loss": "4.212", "valid_ppl": "18.537", "valid_wps": "5786.6", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6006", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.37915361599396163, 'p': 0.4177151715958892, 'r': 0.38225942974637367}, 'rouge-2': {'f': 0.16823645287535424, 'p': 0.18276700397376724, 'r': 0.17357744357848057}, 'rouge-l': {'f': 0.38136667391475326, 'p': 0.4250451610297481, 'r': 0.3732515439655231}}
2020-09-13 02:49:11 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 33 @ 6006 updates, score 6.02) (writing took 4.523772260174155 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3718742933369427, 'p': 0.4099806920967013, 'r': 0.3767762931178083}, 'rouge-2': {'f': 0.15370868260410603, 'p': 0.17047928337500037, 'r': 0.15684602569324277}, 'rouge-l': {'f': 0.3688537640181489, 'p': 0.41053246892705925, 'r': 0.3627296345795084}}
2020-09-13 02:53:29 | INFO | train | {"epoch": 34, "train_loss": "3.669", "train_nll_loss": "1.723", "train_ppl": "3.3", "train_wps": "1146.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6188", "train_lr": "7.62739e-06", "train_gnorm": "2.323", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "11081"}
2020-09-13 02:53:33 | INFO | valid | {"epoch": 34, "valid_loss": "6.066", "valid_nll_loss": "4.269", "valid_ppl": "19.278", "valid_wps": "5174.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6188", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.387153766867382, 'p': 0.42792687135337976, 'r': 0.3892787185709572}, 'rouge-2': {'f': 0.17028365573313206, 'p': 0.1865320310783622, 'r': 0.17456691285290213}, 'rouge-l': {'f': 0.3850822596116743, 'p': 0.43329137654837147, 'r': 0.3746488459567868}}
2020-09-13 02:54:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 34 @ 6188 updates, score 6.066) (writing took 4.374015459790826 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37794691927940804, 'p': 0.4146940259079912, 'r': 0.3847413121249158}, 'rouge-2': {'f': 0.1627020618517039, 'p': 0.17883773270864975, 'r': 0.1677699635770461}, 'rouge-l': {'f': 0.38046893390803, 'p': 0.4244051828628842, 'r': 0.37318182299953945}}
2020-09-13 02:59:05 | INFO | train | {"epoch": 35, "train_loss": "3.621", "train_nll_loss": "1.667", "train_ppl": "3.175", "train_wps": "1152.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6370", "train_lr": "7.51764e-06", "train_gnorm": "2.332", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "11417"}
2020-09-13 02:59:10 | INFO | valid | {"epoch": 35, "valid_loss": "6.081", "valid_nll_loss": "4.28", "valid_ppl": "19.431", "valid_wps": "4026.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6370", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3863770618665606, 'p': 0.41960423765025195, 'r': 0.39353742674033426}, 'rouge-2': {'f': 0.16849963347107963, 'p': 0.18116140757543353, 'r': 0.1752595358690957}, 'rouge-l': {'f': 0.3868700708130419, 'p': 0.4284176759227474, 'r': 0.38066459517190543}}
2020-09-13 03:00:35 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 35 @ 6370 updates, score 6.081) (writing took 4.590174483135343 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3785464402442006, 'p': 0.4101153566021953, 'r': 0.3892325582359251}, 'rouge-2': {'f': 0.1635688031477682, 'p': 0.17727291271497997, 'r': 0.17079954619481266}, 'rouge-l': {'f': 0.38081945358634506, 'p': 0.4199370914132296, 'r': 0.37711290928707314}}
2020-09-13 03:04:57 | INFO | train | {"epoch": 36, "train_loss": "3.576", "train_nll_loss": "1.613", "train_ppl": "3.059", "train_wps": "1101.4", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6552", "train_lr": "7.41249e-06", "train_gnorm": "2.297", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "11769"}
2020-09-13 03:05:01 | INFO | valid | {"epoch": 36, "valid_loss": "6.083", "valid_nll_loss": "4.281", "valid_ppl": "19.438", "valid_wps": "4978.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6552", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38480255056640067, 'p': 0.41896340222175316, 'r': 0.3925201975546814}, 'rouge-2': {'f': 0.16801251321048902, 'p': 0.18062486712904227, 'r': 0.17507848305413856}, 'rouge-l': {'f': 0.382260686554629, 'p': 0.42213632057966827, 'r': 0.37822787809887537}}
2020-09-13 03:06:23 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 36 @ 6552 updates, score 6.083) (writing took 4.2699774550274014 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37602470396509274, 'p': 0.411255046807531, 'r': 0.38215708009749216}, 'rouge-2': {'f': 0.1597641638757478, 'p': 0.17617491833186072, 'r': 0.16325763230195292}, 'rouge-l': {'f': 0.37238202458966757, 'p': 0.41385589088931496, 'r': 0.36541661129126923}}
2020-09-13 03:10:43 | INFO | train | {"epoch": 37, "train_loss": "3.52", "train_nll_loss": "1.549", "train_ppl": "2.925", "train_wps": "1118.5", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6734", "train_lr": "7.31164e-06", "train_gnorm": "2.273", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "12115"}
2020-09-13 03:10:46 | INFO | valid | {"epoch": 37, "valid_loss": "6.094", "valid_nll_loss": "4.296", "valid_ppl": "19.645", "valid_wps": "6031.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6734", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38520799197303024, 'p': 0.4115952466989357, 'r': 0.401465322621547}, 'rouge-2': {'f': 0.16788612832087196, 'p': 0.17883063755349102, 'r': 0.1782279060988303}, 'rouge-l': {'f': 0.3861095337034907, 'p': 0.4201549510855608, 'r': 0.3877350945267522}}
2020-09-13 03:12:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 37 @ 6734 updates, score 6.094) (writing took 4.625352236442268 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3743512324371333, 'p': 0.3969204019868071, 'r': 0.3928996754216558}, 'rouge-2': {'f': 0.15513973954352014, 'p': 0.1640198600019053, 'r': 0.1656530221490409}, 'rouge-l': {'f': 0.3754077971732901, 'p': 0.4063686794601194, 'r': 0.3776842747295159}}
2020-09-13 03:16:43 | INFO | train | {"epoch": 38, "train_loss": "3.475", "train_nll_loss": "1.497", "train_ppl": "2.822", "train_wps": "1074.9", "train_ups": "0.51", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6916", "train_lr": "7.21479e-06", "train_gnorm": "2.282", "train_clip": "0", "train_oom": "0", "train_train_wall": "176", "train_wall": "12476"}
2020-09-13 03:16:48 | INFO | valid | {"epoch": 38, "valid_loss": "6.091", "valid_nll_loss": "4.292", "valid_ppl": "19.588", "valid_wps": "4328.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6916", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3906977486823824, 'p': 0.4322244472662039, 'r': 0.3904387899271124}, 'rouge-2': {'f': 0.17065932983497842, 'p': 0.18826351149371626, 'r': 0.17272125641072303}, 'rouge-l': {'f': 0.3853463228641199, 'p': 0.43086111541067484, 'r': 0.37498756649126375}}
2020-09-13 03:18:01 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 38 @ 6916 updates, score 6.091) (writing took 4.299341707490385 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3778648195003795, 'p': 0.4124395622043999, 'r': 0.38638326843135784}, 'rouge-2': {'f': 0.15725456356423087, 'p': 0.17166198095409285, 'r': 0.16285370981177014}, 'rouge-l': {'f': 0.3730423527751766, 'p': 0.4130845975685863, 'r': 0.36802060468026837}}
2020-09-13 03:22:20 | INFO | train | {"epoch": 39, "train_loss": "3.431", "train_nll_loss": "1.444", "train_ppl": "2.721", "train_wps": "1149.8", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7098", "train_lr": "7.12169e-06", "train_gnorm": "2.279", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "12812"}
2020-09-13 03:22:24 | INFO | valid | {"epoch": 39, "valid_loss": "6.127", "valid_nll_loss": "4.333", "valid_ppl": "20.159", "valid_wps": "6119.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7098", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3850664930124747, 'p': 0.41624993500454843, 'r': 0.39500751417890606}, 'rouge-2': {'f': 0.16650493821705545, 'p': 0.17903338552596454, 'r': 0.1734290759497076}, 'rouge-l': {'f': 0.38333622366926445, 'p': 0.4211383929772211, 'r': 0.38013266010559055}}
2020-09-13 03:23:47 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 39 @ 7098 updates, score 6.127) (writing took 4.8811688451096416 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3791954685119963, 'p': 0.4095449415666187, 'r': 0.38998344500957005}, 'rouge-2': {'f': 0.1587988932718357, 'p': 0.1721871158793693, 'r': 0.16477044960993284}, 'rouge-l': {'f': 0.3762471094410884, 'p': 0.4124276670798772, 'r': 0.3733499950158345}}
2020-09-13 03:28:07 | INFO | train | {"epoch": 40, "train_loss": "3.382", "train_nll_loss": "1.389", "train_ppl": "2.618", "train_wps": "1116.7", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7280", "train_lr": "7.03211e-06", "train_gnorm": "2.242", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "13159"}
2020-09-13 03:28:13 | INFO | valid | {"epoch": 40, "valid_loss": "6.151", "valid_nll_loss": "4.357", "valid_ppl": "20.485", "valid_wps": "3936.6", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7280", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3890387077350804, 'p': 0.42607329008636396, 'r': 0.39286505207167427}, 'rouge-2': {'f': 0.1711647526477222, 'p': 0.18699617326343265, 'r': 0.17517819545443322}, 'rouge-l': {'f': 0.3875280498801758, 'p': 0.42916514940329664, 'r': 0.3812118893956973}}
2020-09-13 03:29:30 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 40 @ 7280 updates, score 6.151) (writing took 4.4904003804549575 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3726439693825235, 'p': 0.40527439922433883, 'r': 0.38066306967102415}, 'rouge-2': {'f': 0.1534322184436473, 'p': 0.16804306661481672, 'r': 0.15854620618257684}, 'rouge-l': {'f': 0.3707951982072947, 'p': 0.40862711990312467, 'r': 0.36647558059887014}}
2020-09-13 03:33:52 | INFO | train | {"epoch": 41, "train_loss": "3.35", "train_nll_loss": "1.351", "train_ppl": "2.551", "train_wps": "1122.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7462", "train_lr": "6.94582e-06", "train_gnorm": "2.213", "train_clip": "0", "train_oom": "0", "train_train_wall": "177", "train_wall": "13504"}
2020-09-13 03:33:58 | INFO | valid | {"epoch": 41, "valid_loss": "6.181", "valid_nll_loss": "4.391", "valid_ppl": "20.984", "valid_wps": "4025.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7462", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3843756468861572, 'p': 0.41117690198532353, 'r': 0.39729475731101416}, 'rouge-2': {'f': 0.16308450642519431, 'p': 0.17295082127690298, 'r': 0.17102846873431943}, 'rouge-l': {'f': 0.37910076864649406, 'p': 0.4103915731589071, 'r': 0.3803033642081009}}
2020-09-13 03:35:23 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 41 @ 7462 updates, score 6.181) (writing took 4.650649065151811 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3769917556036458, 'p': 0.3944497389851229, 'r': 0.40075455946786587}, 'rouge-2': {'f': 0.1521325700581021, 'p': 0.1589907345850074, 'r': 0.16469397096833938}, 'rouge-l': {'f': 0.3697896732897443, 'p': 0.39433115895849924, 'r': 0.37772640423431814}}
2020-09-13 03:39:50 | INFO | train | {"epoch": 42, "train_loss": "3.311", "train_nll_loss": "1.307", "train_ppl": "2.473", "train_wps": "1084", "train_ups": "0.51", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7644", "train_lr": "6.86264e-06", "train_gnorm": "2.195", "train_clip": "0", "train_oom": "0", "train_train_wall": "179", "train_wall": "13862"}
2020-09-13 03:39:54 | INFO | valid | {"epoch": 42, "valid_loss": "6.192", "valid_nll_loss": "4.401", "valid_ppl": "21.12", "valid_wps": "4981.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7644", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3839548844216928, 'p': 0.40919695990657146, 'r': 0.39678524748658583}, 'rouge-2': {'f': 0.16309939507844493, 'p': 0.1717916669360824, 'r': 0.17226327704521602}, 'rouge-l': {'f': 0.38036248312741655, 'p': 0.4108766577386286, 'r': 0.381103948466934}}
2020-09-13 03:41:25 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 42 @ 7644 updates, score 6.192) (writing took 4.394819853827357 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.37688485075539657, 'p': 0.39846735624071966, 'r': 0.3945277424880961}, 'rouge-2': {'f': 0.1555117580701479, 'p': 0.1640307380388293, 'r': 0.16614039633603137}, 'rouge-l': {'f': 0.37278658003532866, 'p': 0.4006159776983299, 'r': 0.37631725030144797}}
2020-09-13 03:45:51 | INFO | train | {"epoch": 43, "train_loss": "3.28", "train_nll_loss": "1.27", "train_ppl": "2.412", "train_wps": "1070.6", "train_ups": "0.5", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7826", "train_lr": "6.78237e-06", "train_gnorm": "2.213", "train_clip": "0", "train_oom": "0", "train_train_wall": "177", "train_wall": "14224"}
2020-09-13 03:45:55 | INFO | valid | {"epoch": 43, "valid_loss": "6.194", "valid_nll_loss": "4.405", "valid_ppl": "21.181", "valid_wps": "5927.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7826", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38388910329361303, 'p': 0.4258477894016644, 'r': 0.3848852055702036}, 'rouge-2': {'f': 0.16835026873515999, 'p': 0.18584556947743405, 'r': 0.17154795199389153}, 'rouge-l': {'f': 0.3794284964164359, 'p': 0.42386850952855576, 'r': 0.37060369000872206}}
2020-09-13 03:47:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 43 @ 7826 updates, score 6.194) (writing took 4.312541832216084 seconds)
Test on testing set: 
Test {'rouge-1': {'f': 0.3649928682374293, 'p': 0.40100515797443215, 'r': 0.37178177746836855}, 'rouge-2': {'f': 0.14818549602209022, 'p': 0.16269601833548097, 'r': 0.15371445974590012}, 'rouge-l': {'f': 0.3619876909686621, 'p': 0.40180170267059645, 'r': 0.35681259215856076}}
2020-09-13 03:48:41 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 30 runs
2020-09-13 03:48:41 | INFO | fairseq_cli.train | done training in 14391.9 seconds

jiaaoc commented 3 years ago

I think I've figured out the reason: if you did not download the pre-trained model into the folder, the model is not initialized with pre-trained BART; instead, the weights are randomly initialized.

Please download the pre-trained BART from here (https://github.com/pytorch/fairseq/tree/master/examples/bart) before launching training.
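As a minimal sketch of that setup step, assuming the run is launched from the directory that will contain bart.large/ (the archive URL is the one published on the fairseq BART page linked above, and the path matters because training passes --restore-file ./bart.large/model.pt):

wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz
tar -xzvf bart.large.tar.gz          # unpacks into ./bart.large/
ls ./bart.large/model.pt             # should exist before starting training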

jiaaoc commented 3 years ago

as shown in your log:

2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt

jiaaoc commented 3 years ago

That's probably the reason why your results are so low.
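A quick way to check any run for this problem, assuming the console output was saved to a file (train.log below is just a placeholder name):

grep -n "no existing checkpoint found" train.log && echo "pre-trained BART was NOT loaded"

If the grep matches, fairseq fell back to random initialization and the ROUGE scores will end up far below the numbers in the paper.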

negrinho commented 3 years ago

Good catch. You are right. I'm training the single-view model and the results seem to match those reported in the paper. Thanks for the help.

Test on val set: 
100% 817/817 [03:08<00:00,  4.33it/s]
Val {'rouge-1': {'f': 0.47053820487934117, 'p': 0.481068078158503, 'r': 0.5012747517270539}, 'rouge-2': {'f': 0.23280899121248622, 'p': 0.23762821988807867, 'r': 0.2502566730166665}, 'rouge-l': {'f': 0.45843104678080715, 'p': 0.4705976576858032, 'r': 0.47959589277465375}}
2020-11-06 21:23:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_stage/checkpoint_best.pt (epoch 1 @ 93 updates, score 4.057) (writing took 234.77840801099956 seconds)
Test on testing set: 
100% 818/818 [03:15<00:00,  4.19it/s]
Test {'rouge-1': {'f': 0.46512253633774703, 'p': 0.4772971625389979, 'r': 0.4974225331330478}, 'rouge-2': {'f': 0.2247942720566339, 'p': 0.23095709935043798, 'r': 0.24239651268780865}, 'rouge-l': {'f': 0.452616026351333, 'p': 0.46413084533332033, 'r': 0.47522827237678494}}
epoch 002:  73% 68/93 [14:01<05:11, 12.46s/it, loss=4.098, nll_loss=2.26, ppl=4.791, wps=191.4, ups=0.05, wpb=4184.4, bsz=160.2, num_updates=161, lr=2.415e-05, gnorm=2.919, clip=100, oom=0, train_wall=833, wall=2710]
chostyouwang commented 3 years ago

as shown in your log:

2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt

I have the same problem, but I have downloaded the 'model.pt' file to /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt. Why can it not be found?
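One possible explanation, a guess based on the flags above rather than on this exact setup: --restore-file ./bart.large/model.pt is a relative path, so it is resolved against the working directory the training command is launched from, not against the repo's location. A quick sanity check:

cd /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh   # directory that actually contains bart.large/
ls -lh ./bart.large/model.pt                            # should list the downloaded checkpoint

If training is started from a different working directory, passing an absolute path to --restore-file should also work.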