Unable to replicate results reported in the paper?

I tried running the code from this repo but couldn't replicate the results reported in the paper. For example, I don't reach the best model at around 7 epochs as you describe, and the best model I got performs significantly worse than your reported results: I only get to around 0.27 ROUGE-1 F1. Do you have any idea why this might be? Which version of PyTorch did you use? I'm using PyTorch 1.4 and the preprocessed data included in the repo. The log for the single-view model is below.

epoch 016 | loss 6.261 | nll_loss 4.916 | ppl 30.193 | wps 234.3 | ups 0.06 | wpb 4165.4 | bsz 158.3 | num_updates 1488 | lr 2.195e-05 | gnorm 2.165 | clip 100 | oom 0 | train_wall 1064 | wall 25366
epoch 016 | valid on 'valid' subset | loss 7.379 | nll_loss 6.115 | ppl 69.293 | wps 1017.4 | wpb 132.8 | bsz 5 | num_updates 1488 | best_loss 7.379
here bpe NONE
here!
Test on val set: 
100% 817/817 [02:35<00:00,  5.27it/s]
Val {'rouge-1': {'f': 0.26769580254553177, 'p': 0.30399684645069164, 'r': 0.26228723498609796}, 'rouge-2': {'f': 0.07173007955995553, 'p': 0.08290470011345255, 'r': 0.07046128497657979}, 'rouge-l': {'f': 0.264904383518601, 'p': 0.3149244870641518, 'r': 0.24536414711376517}}
2020-10-30 05:17:30 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_stage/checkpoint_best.pt (epoch 16 @ 1488 updates, score 7.379) (writing took 236.98618674099998 seconds)
Test on testing set: 
100% 818/818 [02:42<00:00,  5.03it/s]
Test {'rouge-1': {'f': 0.2707510254925983, 'p': 0.30304375457878013, 'r': 0.27045976455175946}, 'rouge-2': {'f': 0.07069378120884638, 'p': 0.08043789892742863, 'r': 0.07085366466696506}, 'rouge-l': {'f': 0.26921047464426007, 'p': 0.3131869557940146, 'r': 0.25498452981146014}}

Are you using BART-large? One possible reason is that Facebook has updated the BPE vocabulary, which could cause a mismatch between the initialized embedding matrix and the ID of our special separator token.
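A mismatch of this kind can be checked before training. The sketch below is only an illustration and does not use the repo's code: it rebuilds token IDs from `dict.txt`-style lines (`<token> <count>`), assuming the four reserved symbols `<s>`, `<pad>`, `</s>`, `<unk>` occupy IDs 0 to 3, and compares the result with the number of embedding rows. The function names and the separator string passed in are hypothetical; the real separator token should be taken from the preprocessing scripts.

```python
from typing import Dict, Iterable, List

# Assumption: these four symbols are reserved before dict.txt entries are read.
SPECIAL_SYMBOLS = ["<s>", "<pad>", "</s>", "<unk>"]


def build_vocab(dict_lines: Iterable[str]) -> Dict[str, int]:
    """Map each token to an ID: reserved symbols first, then dict.txt order."""
    vocab = {sym: idx for idx, sym in enumerate(SPECIAL_SYMBOLS)}
    for line in dict_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Each line is "<token> <count>"; the count is not needed here.
        symbol, _count = line.rsplit(" ", 1)
        # Simplification: a repeated token keeps its first ID.
        vocab.setdefault(symbol, len(vocab))
    return vocab


def check_vocab(vocab: Dict[str, int], embedding_rows: int, separator: str) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(vocab) != embedding_rows:
        problems.append(
            f"dictionary has {len(vocab)} types but the embedding has {embedding_rows} rows"
        )
    sep_id = vocab.get(separator)
    if sep_id is None:
        problems.append(f"separator {separator!r} is not in the dictionary")
    elif sep_id >= embedding_rows:
        problems.append(f"separator id {sep_id} is outside the embedding matrix")
    return problems
```

In practice, `dict_lines` would be the lines of the `dict.txt` used for binarization, and `embedding_rows` the first dimension printed for `embed_tokens` in the training log.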

It is also unusual that your best checkpoint comes after 16 epochs. In my experience, the best single-view or multi-view model is reached after 6 or 7 epochs.

I think the code in this repo should be fine, as I have received emails from other people saying that they could replicate similar results.

I haven't changed anything; I just cloned the repo and ran your experiments on Colab. Here is the Colab notebook if you want to take a look: https://colab.research.google.com/drive/1tzmWGhSlnXBuBkYE2Llvzl0cS7k1KW-m?usp=sharing. You only need to upload your compressed data to your Google Drive folder, and the notebook should then run right away. I have cleaned up the notebook a bit since then, but I definitely got those results the last time I ran the code on Colab.

The learning rate seems very low. Can you post the output logs that result from running this code on your setup?
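To judge whether the logged learning rate is expected, it can help to recompute the schedule by hand. The sketch below implements linear warmup followed by polynomial decay, written from the argument names (`lr`, `warmup_updates`, `total_num_update`, `power`, `end_learning_rate`) rather than copied from fairseq, so treat the formula as an assumption. The default values are the ones this run's training arguments report: a peak of 3e-05, 200 warmup updates, and 5000 total updates.

```python
def polynomial_decay_lr(
    num_updates: int,
    peak_lr: float = 3e-05,
    warmup_updates: int = 200,
    total_num_update: int = 5000,
    end_learning_rate: float = 0.0,
    power: float = 1.0,
) -> float:
    """Learning rate after `num_updates` optimizer steps."""
    # Linear warmup from 0 up to the peak learning rate.
    if warmup_updates > 0 and num_updates <= warmup_updates:
        return peak_lr * num_updates / warmup_updates
    # After the schedule ends, stay at the final learning rate.
    if num_updates >= total_num_update:
        return end_learning_rate
    # Polynomial decay over the updates remaining after warmup.
    pct_remaining = 1 - (num_updates - warmup_updates) / (total_num_update - warmup_updates)
    return (peak_lr - end_learning_rate) * pct_remaining ** power + end_learning_rate


print(polynomial_decay_lr(1488))  # about 2.195e-05
print(polynomial_decay_lr(37))    # about 5.55e-06
```

Under these assumptions, the value at 1488 updates agrees with the `lr 2.195e-05` logged at epoch 16, which would suggest the small learning rate comes from the configured schedule rather than from a scheduler bug.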

2020-11-06 17:46:21 | INFO | fairseq_cli.train | Namespace(T=1, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.1, balance=False, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format=None, log_interval=1000, lr=[3e-05], 
lr_scheduler='polynomial_decay', lr_weight=1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=False, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.large/model.pt', save_dir='checkpoints_stage', save_interval=1, save_interval_updates=0, seed=14632, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=5000, train_subset='train', truncate_source=True, update_freq=[32], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=200, weight_decay=0.01)
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | [source] dictionary: 50264 types
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | [target] dictionary: 50264 types
2020-11-06 17:46:21 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.source
2020-11-06 17:46:21 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.target
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | cnn_dm-bin valid source-target 818 examples
2020-11-06 17:46:31 | INFO | fairseq_cli.train | BARTModel(
  (encoder): TransformerEncoder(
    (embed_tokens): Embedding(50264, 1024, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (6): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (7): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (8): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (9): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (10): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (11): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
    (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(50264, 1024, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (6): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (7): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (8): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (9): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (10): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (11): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
    (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (classification_heads): ModuleDict()
  (section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0)
  (section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (section): LSTM(1024, 1024)
  (w_proj_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (w_proj): Linear(in_features=1024, out_features=1024, bias=True)
  (w_context_vector): Linear(in_features=1024, out_features=1, bias=False)
  (softmax): Softmax(dim=1)
)
2020-11-06 17:46:31 | INFO | fairseq_cli.train | model bart_large, criterion LabelSmoothedCrossEntropyCriterion
2020-11-06 17:46:31 | INFO | fairseq_cli.train | num. model params: 416791552 (num. trained: 416791552)
2020-11-06 17:46:38 | INFO | fairseq_cli.train | training on 1 GPUs
2020-11-06 17:46:38 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt
2020-11-06 17:46:38 | INFO | fairseq.trainer | loading train data for epoch 0
2020-11-06 17:46:38 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source
2020-11-06 17:46:38 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.target
2020-11-06 17:46:38 | INFO | fairseq.tasks.translation | cnn_dm-bin train source-target 14731 examples
2020-11-06 17:46:38 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269]
group1: 
511
group2: 
12
2020-11-06 17:46:38 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
False
epoch 001:  40% 37/93 [06:54<10:44, 11.52s/it, loss=14.612, nll_loss=14.464, ppl=22602, wps=377.8, ups=0.09, wpb=4223.9, bsz=156.7, num_updates=37, lr=5.55e-06, gnorm=4.996, clip=100, oom=0, train_wall=410, wall=415]
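
One line in the log above, `no existing checkpoint found ./bart.large/model.pt`, may be worth ruling out first: if the pretrained weights are not loaded, fine-tuning would start from random initialization, which would be consistent with the very high initial loss. A minimal pre-flight check, with a hypothetical helper name and the paths taken from the `restore_file` and `data` arguments in the log, might look like this:

```python
from pathlib import Path
from typing import List


def missing_inputs(restore_file: str, data_dir: str) -> List[str]:
    """Return the training inputs that cannot be found on disk."""
    missing = []
    # The pretrained checkpoint must be a regular file.
    if not Path(restore_file).is_file():
        missing.append(restore_file)
    # The binarized data must be a directory.
    if not Path(data_dir).is_dir():
        missing.append(data_dir)
    return missing


if __name__ == "__main__":
    # Paths as they appear in the training arguments of the log above.
    problems = missing_inputs("./bart.large/model.pt", "cnn_dm-bin")
    if problems:
        print("Missing before training:", ", ".join(problems))
```

Running this from the same working directory as the training command shows whether the relative paths resolve there.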

Hi, this is an example log from training the multi-view BART-base model:

2020-10-16 20:22:37 | INFO | fairseq_cli.train | Namespace(T=0.2, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_base', attention_dropout=0.1, balance=True, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin_2', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=12, decoder_embed_dim=768, decoder_embed_path=None, decoder_ffn_embed_dim=3072, decoder_input_dim=768, decoder_layerdrop=0, decoder_layers=6, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=768, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format='json', log_interval=1000, lr=[3e-05], lr_scheduler='polynomial_decay', 
lr_weight=500.0, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=True, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=5, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.base/model.pt', save_dir='checkpoints_multi_base_1', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=2000, train_subset='train', truncate_source=True, update_freq=[16], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=120, weight_decay=0.01) 2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | [source] dictionary: 51200 types 2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | [target] dictionary: 51200 types 2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin_2/valid.source-target.source 2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.source 2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin_2/valid.source-target.target 2020-10-16 20:22:37 | INFO | 
fairseq.tasks.translation | cnn_dm-bin_2 valid source-target 818 examples !!! 818 818 2020-10-16 20:22:40 | INFO | fairseq_cli.train | BARTModel( (encoder): TransformerEncoder( (embed_tokens): Embedding(51200, 768, padding_idx=1) (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1) (layers): ModuleList( (0): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (1): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (2): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): 
Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (3): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (4): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (5): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) 
(decoder): TransformerDecoder( (embed_tokens): Embedding(51200, 768, padding_idx=1) (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1) (layers): ModuleList( (0): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (1): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): 
Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (2): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (3): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) 
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (4): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (5): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, 
elementwise_affine=True) ) ) (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (classification_heads): ModuleDict() (section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0) (section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (section): LSTM(768, 768) (w_proj_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (w_proj): Linear(in_features=768, out_features=768, bias=True) (w_context_vector): Linear(in_features=768, out_features=1, bias=False) (softmax): Softmax(dim=1) ) 2020-10-16 20:22:40 | INFO | fairseq_cli.train | model bart_base, criterion LabelSmoothedCrossEntropyCriterion 2020-10-16 20:22:40 | INFO | fairseq_cli.train | num. model params: 146507776 (num. trained: 146507776) 2020-10-16 20:22:43 | INFO | fairseq_cli.train | training on 1 GPUs 2020-10-16 20:22:43 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None 2020-10-16 20:22:43 | INFO | fairseq.trainer | loaded checkpoint ./bart.base/model.pt (epoch 14 @ 0 updates) group1: 259 group2: 12 2020-10-16 20:22:43 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16 here schedule! 2020-10-16 20:22:43 | INFO | fairseq.trainer | loading train data for epoch 0 2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.source 2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source 2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.target 2020-10-16 20:22:43 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 train source-target 14731 examples !!! 
14731 14731 2020-10-16 20:22:43 | WARNING | fairseq.data.data_utils | 6 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269, 8197] True 2020-10-16 20:28:05 | INFO | train | {"epoch": 1, "train_loss": "5.334", "train_nll_loss": "3.491", "train_ppl": "11.247", "train_wps": "1206.4", "train_ups": "0.59", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "189", "train_lr": "2.88989e-05", "train_gnorm": "6.384", "train_clip": "100", "train_oom": "0", "train_train_wall": "303", "train_wall": "323"} /pytorch/torch/csrc/utils/python_argparser.cpp:756: UserWarning: This overload of add is deprecated: add(Number alpha, Tensor other) Consider using one of the following signatures instead: add(Tensor other, *, Number alpha) 2020-10-16 20:28:11 | INFO | valid | {"epoch": 1, "valid_loss": "4.494", "valid_nll_loss": "2.632", "valid_ppl": "6.201", "valid_wps": "3638.8", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "189"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.39893327494573744, 'p': 0.48739021531416354, 'r': 0.3672381425752768}, 'rouge-2': {'f': 0.19168286247403196, 'p': 0.23579704030498724, 'r': 0.1772675131514576}, 'rouge-l': {'f': 0.38773004473056544, 'p': 0.4650643030400437, 'r': 0.3571665562085555}} 2020-10-16 20:29:16 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 1 @ 189 updates, score 4.494) (writing took 2.720979069825262 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3881301009025109, 'p': 0.47039422544482545, 'r': 0.3606226446800223}, 'rouge-2': {'f': 0.17881792205695904, 'p': 0.21852663998652969, 'r': 0.16731151894505894}, 'rouge-l': {'f': 0.3800338863639725, 'p': 0.4518477819676159, 'r': 0.35300024402391994}} /pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. 2020-10-16 20:35:49 | INFO | train | {"epoch": 2, "train_loss": "4.432", "train_nll_loss": "2.625", "train_ppl": "6.168", "train_wps": "835.9", "train_ups": "0.41", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "378", "train_lr": "2.5883e-05", "train_gnorm": "2.332", "train_clip": "100", "train_oom": "0", "train_train_wall": "313", "train_wall": "786"} 2020-10-16 20:35:54 | INFO | valid | {"epoch": 2, "valid_loss": "4.322", "valid_nll_loss": "2.492", "valid_ppl": "5.627", "valid_wps": "3825.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "378", "valid_best_loss": "4.322"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4279867966292195, 'p': 0.4571306118661938, 'r': 0.44022592801096416}, 'rouge-2': {'f': 0.21371075541532447, 'p': 0.2285121015700478, 'r': 0.22154388398488878}, 'rouge-l': {'f': 0.4196784994354386, 'p': 0.4456033541138049, 'r': 0.4278439895292393}} 2020-10-16 20:37:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 2 @ 378 updates, score 4.322) (writing took 11.695231701247394 seconds) Test on testing set: Test {'rouge-1': {'f': 0.4135366281231374, 'p': 0.444507361098039, 'r': 0.423101539955033}, 'rouge-2': {'f': 0.19432386047889444, 'p': 0.21068450242099074, 'r': 0.19930810921791703}, 'rouge-l': {'f': 0.4059372042257342, 'p': 0.43402206622848205, 'r': 0.41056085492657124}} 2020-10-16 20:44:23 | INFO | train | {"epoch": 3, "train_loss": "4.19", "train_nll_loss": "2.368", "train_ppl": "5.162", "train_wps": "753.5", "train_ups": "0.37", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "567", "train_lr": "2.2867e-05", "train_gnorm": "2.312", "train_clip": "100", "train_oom": "0", "train_train_wall": "323", "train_wall": "1300"} 2020-10-16 20:44:28 | INFO | valid | {"epoch": 3, "valid_loss": "4.242", "valid_nll_loss": "2.397", "valid_ppl": "5.266", "valid_wps": "3877.8", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "567", "valid_best_loss": "4.242"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4315874956151277, 'p': 0.48791636226347873, 'r': 0.41966804049154866}, 'rouge-2': {'f': 0.2188827333313949, 'p': 0.2480313270750059, 'r': 0.21386282199141377}, 'rouge-l': {'f': 0.41806919758048416, 'p': 0.4660477028142457, 'r': 0.40645590435600293}} 2020-10-16 20:45:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 3 @ 567 updates, score 4.242) (writing took 8.149096994195133 seconds) Test on testing set: Test {'rouge-1': {'f': 0.4138701252407696, 'p': 0.4678226724582506, 'r': 0.4065260752587133}, 'rouge-2': {'f': 0.19377017502951563, 'p': 0.22115075342729995, 'r': 0.1904202394946015}, 'rouge-l': {'f': 0.4010310496097956, 'p': 0.44619094434249496, 'r': 0.3929682749938597}} 2020-10-16 20:52:16 | INFO | train | {"epoch": 4, "train_loss": "4.009", "train_nll_loss": "2.171", "train_ppl": "4.505", "train_wps": "818.8", "train_ups": "0.4", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "756", "train_lr": "1.98511e-05", "train_gnorm": "2.142", "train_clip": "100", "train_oom": "0", "train_train_wall": "298", "train_wall": "1773"} 2020-10-16 20:52:21 | INFO | valid | {"epoch": 4, "valid_loss": "4.192", "valid_nll_loss": "2.352", "valid_ppl": "5.104", "valid_wps": "3870.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "756", "valid_best_loss": "4.192"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.43669851905346935, 'p': 0.47079025587829726, 'r': 0.4442491635654981}, 'rouge-2': {'f': 0.21691059357860815, 'p': 0.23458404727614512, 'r': 0.220700188219085}, 'rouge-l': {'f': 0.424007418421588, 'p': 0.4522198002073135, 'r': 0.4289625411274865}} 2020-10-16 20:53:51 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 4 @ 756 updates, score 4.192) (writing took 9.192486869171262 seconds) Test on testing set: Test {'rouge-1': {'f': 0.4225243192276242, 'p': 0.45223754352590134, 'r': 0.43401393897009366}, 'rouge-2': {'f': 0.19391878994057452, 'p': 0.20822517840877766, 'r': 0.1994167266777721}, 'rouge-l': {'f': 0.4080676812912881, 'p': 0.43246768831954485, 'r': 0.4165620181146185}} 2020-10-16 21:00:31 | INFO | train | {"epoch": 5, "train_loss": "3.887", "train_nll_loss": "2.039", "train_ppl": "4.11", "train_wps": "781.9", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "945", "train_lr": "1.68351e-05", "train_gnorm": "2.074", "train_clip": "100", "train_oom": "0", "train_train_wall": "303", "train_wall": "2269"} 2020-10-16 21:00:37 | INFO | valid | {"epoch": 5, "valid_loss": "4.186", "valid_nll_loss": "2.342", "valid_ppl": "5.071", "valid_wps": "3877.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "945", "valid_best_loss": "4.186"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4461844243274079, 'p': 0.45666747672161484, 'r': 0.4767132459495706}, 'rouge-2': {'f': 0.22200930553762793, 'p': 0.22706208545278755, 'r': 0.23842818519947412}, 'rouge-l': {'f': 0.43306061447923366, 'p': 0.44238726551550944, 'r': 0.4563430992482793}} 2020-10-16 21:02:12 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 5 @ 945 updates, score 4.186) (writing took 11.725180207751691 seconds) Test on testing set: Test {'rouge-1': {'f': 0.4333094974371008, 'p': 0.4428674781714452, 'r': 0.46658344288666276}, 'rouge-2': {'f': 0.20111995145025907, 'p': 0.20625643433873106, 'r': 0.217710295965629}, 'rouge-l': {'f': 0.42099773909932536, 'p': 0.42971386878603307, 'r': 0.44643271326223904}} 2020-10-16 21:08:55 | INFO | train | {"epoch": 6, "train_loss": "3.787", "train_nll_loss": "1.93", "train_ppl": "3.81", "train_wps": "769.1", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1134", "train_lr": "1.38191e-05", "train_gnorm": "2.034", "train_clip": "100", "train_oom": "0", "train_train_wall": "302", "train_wall": "2772"} 2020-10-16 21:09:00 | INFO | valid | {"epoch": 6, "valid_loss": "4.18", "valid_nll_loss": "2.343", "valid_ppl": "5.075", "valid_wps": "3875.3", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1134", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.44834014142761747, 'p': 0.47490514105110054, 'r': 0.46418072489161993}, 'rouge-2': {'f': 0.22448318205207965, 'p': 0.23786780200771554, 'r': 0.23428752100684014}, 'rouge-l': {'f': 0.431713338823393, 'p': 0.4549432489377001, 'r': 0.44269410749902055}} 2020-10-16 21:10:27 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 6 @ 1134 updates, score 4.18) (writing took 7.535578944720328 seconds) Test on testing set: Test {'rouge-1': {'f': 0.4353075633115118, 'p': 0.45960422226275544, 'r': 0.4524936710724383}, 'rouge-2': {'f': 0.20453167333689543, 'p': 0.21761783799140255, 'r': 0.21275645602953855}, 'rouge-l': {'f': 0.419203755880583, 'p': 0.43906384085100353, 'r': 0.43266115281125556}} 2020-10-16 21:17:11 | INFO | train | {"epoch": 7, "train_loss": "3.715", "train_nll_loss": "1.85", "train_ppl": "3.604", "train_wps": "781.3", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1323", "train_lr": "1.08032e-05", "train_gnorm": "2.042", "train_clip": "100", "train_oom": "0", "train_train_wall": "306", "train_wall": "3268"} 2020-10-16 21:17:16 | INFO | valid | {"epoch": 7, "valid_loss": "4.181", "valid_nll_loss": "2.345", "valid_ppl": "5.081", "valid_wps": "3853.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1323", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4481947505100136, 'p': 0.46780376177094585, 'r': 0.4699430608678166}, 'rouge-2': {'f': 0.22568330262401542, 'p': 0.23667032669984672, 'r': 0.2375391979501824}, 'rouge-l': {'f': 0.43290810524697976, 'p': 0.4484189183310228, 'r': 0.45029655273945113}} 2020-10-16 21:18:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 7 @ 1323 updates, score 4.181) (writing took 3.958364794962108 seconds) Test on testing set: Test {'rouge-1': {'f': 0.42504939222262134, 'p': 0.4437278759292623, 'r': 0.44713921113126553}, 'rouge-2': {'f': 0.19796039505355403, 'p': 0.2078876050553927, 'r': 0.20811213382174953}, 'rouge-l': {'f': 0.41333885103722146, 'p': 0.42888192585788004, 'r': 0.4307527462400245}} 2020-10-16 21:25:20 | INFO | train | {"epoch": 8, "train_loss": "3.655", "train_nll_loss": "1.783", "train_ppl": "3.442", "train_wps": "791.5", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1512", "train_lr": "7.78723e-06", "train_gnorm": "2.018", "train_clip": "100", "train_oom": "0", "train_train_wall": "296", "train_wall": "3757"} 2020-10-16 21:25:25 | INFO | valid | {"epoch": 8, "valid_loss": "4.188", "valid_nll_loss": "2.358", "valid_ppl": "5.126", "valid_wps": "3883.7", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1512", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4497831590533799, 'p': 0.4743879319512072, 'r': 0.46696211017029543}, 'rouge-2': {'f': 0.22560126332234826, 'p': 0.23866609130364316, 'r': 0.23526072930189967}, 'rouge-l': {'f': 0.4331148209032409, 'p': 0.4523118415115414, 'r': 0.4466045581005789}} 2020-10-16 21:26:54 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 8 @ 1512 updates, score 4.188) (writing took 6.4392871032468975 seconds) Test on testing set: Test {'rouge-1': {'f': 0.42990064221283536, 'p': 0.4527699793144171, 'r': 0.44901333807162896}, 'rouge-2': {'f': 0.20288890059810166, 'p': 0.21534024009359243, 'r': 0.21117838919893522}, 'rouge-l': {'f': 0.41598493880199483, 'p': 0.4349848431222833, 'r': 0.4314877150138075}} 2020-10-16 21:33:30 | INFO | train | {"epoch": 9, "train_loss": "3.608", "train_nll_loss": "1.731", "train_ppl": "3.32", "train_wps": "789.6", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1701", "train_lr": "4.77128e-06", "train_gnorm": "1.996", "train_clip": "100", "train_oom": "0", "train_train_wall": "298", "train_wall": "4248"} 2020-10-16 21:33:36 | INFO | valid | {"epoch": 9, "valid_loss": "4.19", "valid_nll_loss": "2.359", "valid_ppl": "5.129", "valid_wps": "3839.3", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1701", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4469569361973397, 'p': 0.4798210784938584, 'r': 0.45528237283388123}, 'rouge-2': {'f': 0.2263709493623402, 'p': 0.2433974733721229, 'r': 0.23132458588703095}, 'rouge-l': {'f': 0.4316170975348626, 'p': 0.45903993291697, 'r': 0.43719317336834507}} 2020-10-16 21:34:59 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 9 @ 1701 updates, score 4.19) (writing took 3.861534607131034 seconds) Test on testing set: Test {'rouge-1': {'f': 0.42781119671981177, 'p': 0.46124989848989056, 'r': 0.436261581761792}, 'rouge-2': {'f': 0.20098620993180524, 'p': 0.21921218562638198, 'r': 0.20423255976966606}, 'rouge-l': {'f': 0.41332683231640294, 'p': 0.44168964220540274, 'r': 0.4191730895606922}} 2020-10-16 21:41:41 | INFO | train | {"epoch": 10, "train_loss": "3.575", "train_nll_loss": "1.693", "train_ppl": "3.233", "train_wps": "789.3", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1890", "train_lr": "1.75532e-06", "train_gnorm": "1.981", "train_clip": "100", "train_oom": "0", "train_train_wall": "307", "train_wall": "4739"} 2020-10-16 21:41:47 | INFO | valid | {"epoch": 10, "valid_loss": "4.203", "valid_nll_loss": "2.369", "valid_ppl": "5.167", "valid_wps": "3792", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1890", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.4489107167227575, 'p': 0.472013059581364, 'r': 0.46714979898744746}, 'rouge-2': {'f': 0.22538691538484998, 'p': 0.23704805111264826, 'r': 0.23613808765906907}, 'rouge-l': {'f': 0.4316388762633301, 'p': 0.45038233458541693, 'r': 0.44596016494663443}} 2020-10-16 21:43:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 10 @ 1890 updates, score 4.203) (writing took 3.90104684792459 seconds) Test on testing set: Test {'rouge-1': {'f': 0.4306987172598409, 'p': 0.4531877008856705, 'r': 0.4487099330991109}, 'rouge-2': {'f': 0.2012845838629506, 'p': 0.21321799010752718, 'r': 0.20959143334202265}, 'rouge-l': {'f': 0.41571496654100026, 'p': 0.43409465797013363, 'r': 0.4300684739298519}} 2020-10-16 21:50:12 | INFO | train | {"epoch": 11, "train_loss": "3.556", "train_nll_loss": "1.671", "train_ppl": "3.185", "train_wps": "758.1", "train_ups": "0.37", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "2079", "train_lr": "0", "train_gnorm": "1.957", "train_clip": "100", "train_oom": "0", "train_train_wall": "316", "train_wall": "5249"} 2020-10-16 21:50:18 | INFO | valid | {"epoch": 11, "valid_loss": "4.203", "valid_nll_loss": "2.371", "valid_ppl": "5.172", "valid_wps": "3874.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "2079", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.446642502255209, 'p': 0.4723302484715944, 'r': 0.46238440805767694}, 'rouge-2': {'f': 0.22346453229760987, 'p': 0.2373075042245457, 'r': 0.23220979841378647}, 'rouge-l': {'f': 0.4295766026698361, 'p': 0.4502630944373418, 'r': 0.4416669748596527}} 2020-10-16 21:51:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 11 @ 2079 updates, score 4.203) (writing took 4.998790017794818 seconds) Test on testing set: Test {'rouge-1': {'f': 0.42976349488980414, 'p': 0.4532897733258767, 'r': 0.44788645672433053}, 'rouge-2': {'f': 0.199710011641502, 'p': 0.21187374900537204, 'r': 0.20828099794102461}, 'rouge-l': {'f': 0.4141090520180718, 'p': 0.43257857065512195, 'r': 0.42896840192756164}} 2020-10-16 21:58:20 | INFO | train | {"epoch": 12, "train_loss": "3.55", "train_nll_loss": "1.665", "train_ppl": "3.172", "train_wps": "793.6", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "2268", "train_lr": "0", "train_gnorm": "1.939", "train_clip": "100", "train_oom": "0", "train_train_wall": "300", "train_wall": "5738"} 2020-10-16 21:58:26 | INFO | valid | {"epoch": 12, "valid_loss": "4.203", "valid_nll_loss": "2.371", "valid_ppl": "5.172", "valid_wps": "3878.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "2268", "valid_best_loss": "4.18"} here bpe NONE here! 
Test on val set: Val {'rouge-1': {'f': 0.446642502255209, 'p': 0.4723302484715944, 'r': 0.46238440805767694}, 'rouge-2': {'f': 0.22346453229760987, 'p': 0.2373075042245457, 'r': 0.23220979841378647}, 'rouge-l': {'f': 0.4295766026698361, 'p': 0.4502630944373418, 'r': 0.4416669748596527}} 2020-10-16 21:59:52 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 12 @ 2268 updates, score 4.203) (writing took 4.920202174689621 seconds) Test on testing set: Test {'rouge-1': {'f': 0.42976349488980414, 'p': 0.4532897733258767, 'r': 0.44788645672433053}, 'rouge-2': {'f': 0.199710011641502, 'p': 0.21187374900537204, 'r': 0.20828099794102461}, 'rouge-l': {'f': 0.4141090520180718, 'p': 0.43257857065512195, 'r': 0.42896840192756164}} 2020-10-16 22:01:12 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 5 runs 2020-10-16 22:01:12 | INFO | fairseq_cli.train | done training in 5908.6 seconds
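To make it easier to compare runs, here is a minimal sketch for reading the best epoch out of a log like the one above. It assumes the log was produced with `log_format='json'` (as in the Namespace above), so each validation entry carries a flat JSON payload after `| INFO | valid |`. The function name `best_epoch` and the abridged sample entries are only for illustration.

```python
import json
import re

# Matches the JSON payload of the per-epoch validation entries.
# The payload is a flat object, so a non-greedy match up to the first "}" is enough.
VALID_ENTRY = re.compile(r"\| INFO \| valid \| (\{.*?\})")


def best_epoch(log_text):
    """Return (epoch, valid_loss) for the entry with the lowest validation loss."""
    entries = [json.loads(payload) for payload in VALID_ENTRY.findall(log_text)]
    if not entries:
        raise ValueError("no validation entries found")
    # The metric values are logged as strings, so convert before comparing.
    best = min(entries, key=lambda e: float(e["valid_loss"]))
    return int(best["epoch"]), float(best["valid_loss"])


# Abridged entries from the log above (only the fields used here are kept).
sample = (
    '2020-10-16 21:00:37 | INFO | valid | {"epoch": 5, "valid_loss": "4.186"} '
    '2020-10-16 21:09:00 | INFO | valid | {"epoch": 6, "valid_loss": "4.18"} '
    '2020-10-16 21:17:16 | INFO | valid | {"epoch": 7, "valid_loss": "4.181"}'
)
print(best_epoch(sample))  # (6, 4.18)
```

On the full log this picks epoch 6 (valid_loss 4.18), which matches the last `checkpoint_best.pt` save before early stopping.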

Since I can't get access to the P100 machines, I am testing train_single_view.sh with max_length = 500, and I will post the training log here later.

Thanks for looking. I think you may be right that there is a discrepancy between the tokenizations somehow, since I get much lower results. The preprocessed files may no longer be up to date for the versions that Colab is pulling. If you get a chance to run the Colab file, that would be great. I will preprocess the data again and see if I get different results.
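One way to check for this kind of mismatch might be to compare the dictionary that ships with the pretrained checkpoint against the one next to the preprocessed data. The sketch below assumes both are plain-text files with one `<symbol> <count>` pair per line; the helper names and the two file paths in the usage comment are hypothetical and would need to be adjusted to the actual layout.

```python
def parse_symbols(lines):
    """Extract the symbol from each '<symbol> <count>' line, skipping blank lines."""
    symbols = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            # Split on the last space so the count is dropped and the symbol is kept whole.
            symbols.append(line.rsplit(" ", 1)[0])
    return symbols


def first_mismatch(a, b):
    """Return the first index at which the two symbol lists differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One list is a strict prefix of the other.
        return min(len(a), len(b))
    return None


# Usage (paths are assumptions, not taken from the repo):
# with open("bart.base/dict.txt", encoding="utf-8") as f:
#     pretrained = parse_symbols(f)
# with open("cnn_dm-bin_2/dict.source.txt", encoding="utf-8") as f:
#     preprocessed = parse_symbols(f)
# print(len(pretrained), len(preprocessed), first_mismatch(pretrained, preprocessed))
```

If the two lists diverge at some index, every token id from that point on would select a different embedding row than the pretrained model expects, which could explain a large drop in ROUGE.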

Yes, I have tried training from scratch without BART_initialization as well, and the results were better than what you have observed.
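To compare runs without reading the raw logs by eye, something like the following might help. It is a minimal sketch that assumes the log was produced with `--log-format json`, so that each validation record appears as `| INFO | valid | {...}`; the helper names `valid_losses` and `best_epoch` are made up for illustration and are not part of fairseq.

```python
import json
import re

# Matches the JSON payload of a validation record in a fairseq json-format log.
VALID_RE = re.compile(r"\| INFO \| valid \| (\{[^{}]*\})")


def valid_losses(log_text):
    """Return {epoch: valid_loss} for every validation record in log_text."""
    losses = {}
    for match in VALID_RE.finditer(log_text):
        record = json.loads(match.group(1))
        # fairseq writes metric values as strings, so convert explicitly.
        losses[int(record["epoch"])] = float(record["valid_loss"])
    return losses


def best_epoch(losses):
    """Return the epoch with the lowest validation loss."""
    return min(losses, key=losses.get)
```

Calling `best_epoch(valid_losses(open("train.log").read()))` on each run (where `train.log` is a placeholder path) gives the epoch to compare ROUGE scores at.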

This is the log for single-view training with a BART_base encoder and a randomly initialized decoder:

2020-09-12 23:48:39 | INFO | fairseq_cli.train | Namespace(T=1, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_encoder_base', attention_dropout=0.2, balance=False, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=5.0, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='data_none', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=4, decoder_embed_dim=768, decoder_embed_path=None, decoder_ffn_embed_dim=3072, decoder_input_dim=768, decoder_layerdrop=0, decoder_layers=2, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=768, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.2, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format='json', log_interval=1000, lr=[3e-05], lr_scheduler='inverse_sqrt', lr_weight=100.0, max_epoch=0, 
max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=False, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=30, pooler_activation_fn='tanh', pooler_dropout=0.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.base/model.pt', save_dir='checkpoints_scratch_1', save_interval=1, save_interval_updates=0, seed=0, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', temp_file='bart_base_scratch', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, train_subset='train', truncate_source=True, update_freq=[16], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, view_2_path='None', warmup_init_lr=-1, warmup_updates=400, weight_decay=0.1) 2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | [source] dictionary: 51200 types 2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | [target] dictionary: 51200 types 2020-09-12 23:48:39 | INFO | fairseq.data.data_utils | loaded 818 examples from: data_none/valid.source-target.source 2020-09-12 23:48:39 | INFO | fairseq.data.data_utils | loaded 818 examples from: data_none/valid.source-target.target 2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | data_none valid source-target 818 examples 2020-09-12 23:48:44 | INFO | fairseq_cli.train | BARTModel( (encoder): 
TransformerEncoder( (embed_tokens): Embedding(51200, 768, padding_idx=1) (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1) (layers): ModuleList( (0): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (1): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (2): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (3): TransformerEncoderLayer( (self_attn): 
MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (4): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (5): TransformerEncoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (decoder): TransformerDecoder( (embed_tokens): Embedding(51200, 768, padding_idx=1) (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1) (layers): 
ModuleList( (0): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (1): TransformerDecoderLayer( (self_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder_attn): MultiheadAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (layernorm_embedding): LayerNorm((768,), 
eps=1e-05, elementwise_affine=True) ) (classification_heads): ModuleDict() (section): LSTM(768, 768) (w_proj_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (w_proj): Linear(in_features=768, out_features=768, bias=True) (w_context_vector): Linear(in_features=768, out_features=1, bias=False) (softmax): Softmax(dim=1) ) 2020-09-12 23:48:44 | INFO | fairseq_cli.train | model bart_encoder_base, criterion LabelSmoothedCrossEntropyCriterion 2020-09-12 23:48:44 | INFO | fairseq_cli.train | num. model params: 107649024 (num. trained: 107649024) 2020-09-12 23:48:48 | INFO | fairseq_cli.train | training on 1 GPUs 2020-09-12 23:48:48 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None bart_encoder_base 2020-09-12 23:48:48 | INFO | fairseq.trainer | loaded checkpoint ./bart.base/model.pt (epoch 14 @ 0 updates) group1: 103 group2: 61 2020-09-12 23:48:48 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16 here schedule! 
2020-09-12 23:48:48 | INFO | fairseq.trainer | loading train data for epoch 0 2020-09-12 23:48:48 | INFO | fairseq.data.data_utils | loaded 14731 examples from: data_none/train.source-target.source 2020-09-12 23:48:48 | INFO | fairseq.data.data_utils | loaded 14731 examples from: data_none/train.source-target.target 2020-09-12 23:48:48 | INFO | fairseq.tasks.translation | data_none train source-target 14731 examples 2020-09-12 23:48:49 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269] False 2020-09-12 23:51:50 | INFO | train | {"epoch": 1, "train_loss": "10.421", "train_nll_loss": "9.332", "train_ppl": "644.363", "train_wps": "2140", "train_ups": "1.01", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "182", "train_lr": "1.365e-05", "train_gnorm": "2.41", "train_clip": "8.2", "train_oom": "0", "train_train_wall": "170", "train_wall": "182"} /pytorch/torch/csrc/utils/python_argparser.cpp:756: UserWarning: This overload of add is deprecated: add(Number alpha, Tensor other) Consider using one of the following signatures instead: add(Tensor other, *, Number alpha) 2020-09-12 23:51:54 | INFO | valid | {"epoch": 1, "valid_loss": "7.602", "valid_nll_loss": "6.152", "valid_ppl": "71.102", "valid_wps": "4947.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "182"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.21663793375370063, 'p': 0.22864870158974218, 'r': 0.2287239463029277}, 'rouge-2': {'f': 0.054124139134273476, 'p': 0.05736806219841362, 'r': 0.057741245203472076}, 'rouge-l': {'f': 0.22703430161156996, 'p': 0.2589247848795017, 'r': 0.21878059623920781}} 2020-09-12 23:53:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 1 @ 182 updates, score 7.602) (writing took 3.440072625875473 seconds) Test on testing set: Test {'rouge-1': {'f': 0.2143088315976068, 'p': 0.22446929769837956, 'r': 0.2289120110339817}, 'rouge-2': {'f': 0.05192138274699011, 'p': 0.05429316369788991, 'r': 0.05615224649979625}, 'rouge-l': {'f': 0.2247126841007863, 'p': 0.25170078332283147, 'r': 0.22005435135775175}} /pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. 2020-09-12 23:57:52 | INFO | train | {"epoch": 2, "train_loss": "7.32", "train_nll_loss": "5.913", "train_ppl": "60.24", "train_wps": "1069.3", "train_ups": "0.5", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "364", "train_lr": "2.73e-05", "train_gnorm": "3.339", "train_clip": "16.5", "train_oom": "0", "train_train_wall": "170", "train_wall": "545"} 2020-09-12 23:57:57 | INFO | valid | {"epoch": 2, "valid_loss": "7.298", "valid_nll_loss": "5.785", "valid_ppl": "55.125", "valid_wps": "4876", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "364", "valid_best_loss": "7.298"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.2902339258903693, 'p': 0.35493902025894375, 'r': 0.27679295120544645}, 'rouge-2': {'f': 0.09023858321606967, 'p': 0.1113931624274892, 'r': 0.08757866133645495}, 'rouge-l': {'f': 0.2997845837858199, 'p': 0.401101296990871, 'r': 0.26031260412345514}} 2020-09-12 23:59:21 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 2 @ 364 updates, score 7.298) (writing took 7.891215533949435 seconds) Test on testing set: Test {'rouge-1': {'f': 0.28494613701125965, 'p': 0.3516514035592438, 'r': 0.2699670810086879}, 'rouge-2': {'f': 0.08420177692788793, 'p': 0.10510303003646479, 'r': 0.08140088225186674}, 'rouge-l': {'f': 0.292386051869777, 'p': 0.39432852537102037, 'r': 0.2527454550206724}} 2020-09-13 00:03:40 | INFO | train | {"epoch": 3, "train_loss": "7.642", "train_nll_loss": "6.298", "train_ppl": "78.696", "train_wps": "1114.8", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "546", "train_lr": "2.56776e-05", "train_gnorm": "9.301", "train_clip": "99.5", "train_oom": "0", "train_train_wall": "174", "train_wall": "892"} 2020-09-13 00:03:45 | INFO | valid | {"epoch": 3, "valid_loss": "6.83", "valid_nll_loss": "5.266", "valid_ppl": "38.479", "valid_wps": "4512.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "546", "valid_best_loss": "6.83"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.28004401636313203, 'p': 0.45120614382755203, 'r': 0.22356730714002326}, 'rouge-2': {'f': 0.1042067031811556, 'p': 0.17100306931561168, 'r': 0.0837493874679366}, 'rouge-l': {'f': 0.2834823452672735, 'p': 0.46510518092133646, 'r': 0.22059007016249965}} 2020-09-13 00:04:39 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 3 @ 546 updates, score 6.83) (writing took 8.794937412254512 seconds) Test on testing set: Test {'rouge-1': {'f': 0.26935009742145355, 'p': 0.43147623041129457, 'r': 0.21399299095643845}, 'rouge-2': {'f': 0.0928812558099079, 'p': 0.15352525158121336, 'r': 0.07361179442736973}, 'rouge-l': {'f': 0.2711337233458924, 'p': 0.44552105877903414, 'r': 0.20951603467624907}} 2020-09-13 00:08:36 | INFO | train | {"epoch": 4, "train_loss": "6.986", "train_nll_loss": "5.561", "train_ppl": "47.22", "train_wps": "1310", "train_ups": "0.62", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "728", "train_lr": "2.22375e-05", "train_gnorm": "7.492", "train_clip": "90.1", "train_oom": "0", "train_train_wall": "179", "train_wall": "1188"} 2020-09-13 00:08:41 | INFO | valid | {"epoch": 4, "valid_loss": "6.512", "valid_nll_loss": "4.873", "valid_ppl": "29.308", "valid_wps": "4090.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "728", "valid_best_loss": "6.512"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3058323314303917, 'p': 0.45101872632920387, 'r': 0.255721901517651}, 'rouge-2': {'f': 0.12803820225900034, 'p': 0.19195720632392738, 'r': 0.10762098110671683}, 'rouge-l': {'f': 0.3125195961416917, 'p': 0.4667913347552745, 'r': 0.25371581048821884}} 2020-09-13 00:09:39 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 4 @ 728 updates, score 6.512) (writing took 8.845161844976246 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3067963357485839, 'p': 0.4412779946655861, 'r': 0.2587007664111758}, 'rouge-2': {'f': 0.12092130534646459, 'p': 0.17638283156638504, 'r': 0.10283473953818749}, 'rouge-l': {'f': 0.31175281522793147, 'p': 0.4556750348102311, 'r': 0.2543185560104487}} 2020-09-13 00:13:33 | INFO | train | {"epoch": 5, "train_loss": "6.593", "train_nll_loss": "5.121", "train_ppl": "34.798", "train_wps": "1301.7", "train_ups": "0.61", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "910", "train_lr": "1.98898e-05", "train_gnorm": "6.119", "train_clip": "61", "train_oom": "0", "train_train_wall": "173", "train_wall": "1485"} 2020-09-13 00:13:38 | INFO | valid | {"epoch": 5, "valid_loss": "6.283", "valid_nll_loss": "4.649", "valid_ppl": "25.088", "valid_wps": "4915", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "910", "valid_best_loss": "6.283"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.32631664800109805, 'p': 0.4096853241359607, 'r': 0.30773293666069307}, 'rouge-2': {'f': 0.139532263771638, 'p': 0.17597922407733474, 'r': 0.13409987180914382}, 'rouge-l': {'f': 0.3375112445140239, 'p': 0.44056842796025036, 'r': 0.29934067025889066}} 2020-09-13 00:14:56 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 5 @ 910 updates, score 6.283) (writing took 8.922468357719481 seconds) Test on testing set: Test {'rouge-1': {'f': 0.32195108164976866, 'p': 0.3986055423428635, 'r': 0.3072986476227612}, 'rouge-2': {'f': 0.13255688382054837, 'p': 0.16593460499965418, 'r': 0.12889569667431885}, 'rouge-l': {'f': 0.3336251633595829, 'p': 0.4308272105408829, 'r': 0.2980551182792911}} 2020-09-13 00:19:12 | INFO | train | {"epoch": 6, "train_loss": "6.299", "train_nll_loss": "4.79", "train_ppl": "27.674", "train_wps": "1144.9", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1092", "train_lr": "1.81568e-05", "train_gnorm": "4.966", "train_clip": "25.8", "train_oom": "0", "train_train_wall": "173", "train_wall": "1824"} 2020-09-13 00:19:16 | INFO | valid | {"epoch": 6, "valid_loss": "6.073", "valid_nll_loss": "4.4", "valid_ppl": "21.112", "valid_wps": "5230.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1092", "valid_best_loss": "6.073"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3334118320946748, 'p': 0.4410821539215469, 'r': 0.29588426535787143}, 'rouge-2': {'f': 0.13739283912842795, 'p': 0.18316512984611386, 'r': 0.12292501717355879}, 'rouge-l': {'f': 0.3367563539841945, 'p': 0.45088752831270734, 'r': 0.2903563191983348}} 2020-09-13 00:20:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 6 @ 1092 updates, score 6.073) (writing took 8.304749015718699 seconds) Test on testing set: Test {'rouge-1': {'f': 0.32018946591540753, 'p': 0.43157002485898993, 'r': 0.2830485811595604}, 'rouge-2': {'f': 0.127293851871242, 'p': 0.17473306979593142, 'r': 0.11274861530450966}, 'rouge-l': {'f': 0.3211206923058116, 'p': 0.43612260347418846, 'r': 0.2759708589901207}} 2020-09-13 00:24:22 | INFO | train | {"epoch": 7, "train_loss": "6.062", "train_nll_loss": "4.522", "train_ppl": "22.981", "train_wps": "1246.6", "train_ups": "0.59", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1274", "train_lr": "1.681e-05", "train_gnorm": "3.981", "train_clip": "13.2", "train_oom": "0", "train_train_wall": "171", "train_wall": "2134"} 2020-09-13 00:24:27 | INFO | valid | {"epoch": 7, "valid_loss": "5.974", "valid_nll_loss": "4.283", "valid_ppl": "19.47", "valid_wps": "4940.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1274", "valid_best_loss": "5.974"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.34953445294994195, 'p': 0.45927096966662395, 'r': 0.3099192101544123}, 'rouge-2': {'f': 0.15042073792671246, 'p': 0.1998221241410976, 'r': 0.1338806674885114}, 'rouge-l': {'f': 0.3493160643305676, 'p': 0.4626922715315685, 'r': 0.3021469814220278}} 2020-09-13 00:25:40 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 7 @ 1274 updates, score 5.974) (writing took 9.386484532617033 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3454603555704317, 'p': 0.45500522729028614, 'r': 0.307872752358408}, 'rouge-2': {'f': 0.14921191873819553, 'p': 0.20035740171083352, 'r': 0.13298830103947873}, 'rouge-l': {'f': 0.35050286922913615, 'p': 0.46836836790711445, 'r': 0.30264623667919816}} 2020-09-13 00:29:36 | INFO | train | {"epoch": 8, "train_loss": "5.833", "train_nll_loss": "4.263", "train_ppl": "19.199", "train_wps": "1234.4", "train_ups": "0.58", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1456", "train_lr": "1.57243e-05", "train_gnorm": "3.139", "train_clip": "1.6", "train_oom": "0", "train_train_wall": "170", "train_wall": "2448"} 2020-09-13 00:29:40 | INFO | valid | {"epoch": 8, "valid_loss": "5.84", "valid_nll_loss": "4.149", "valid_ppl": "17.745", "valid_wps": "5219.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1456", "valid_best_loss": "5.84"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37134382327991744, 'p': 0.4483338326050663, 'r': 0.3499853577404138}, 'rouge-2': {'f': 0.16603384362767615, 'p': 0.19866856937856062, 'r': 0.15915049114742058}, 'rouge-l': {'f': 0.37879774030543395, 'p': 0.4653144943165627, 'r': 0.34473968051528203}} 2020-09-13 00:30:53 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 8 @ 1456 updates, score 5.84) (writing took 9.01857946626842 seconds) Test on testing set: Test {'rouge-1': {'f': 0.36901160113448783, 'p': 0.44108292289748047, 'r': 0.3526763811870377}, 'rouge-2': {'f': 0.16666756440683705, 'p': 0.19984167677488363, 'r': 0.1618494645628281}, 'rouge-l': {'f': 0.3778187479842473, 'p': 0.4602486552120849, 'r': 0.3473739981188793}} 2020-09-13 00:35:03 | INFO | train | {"epoch": 9, "train_loss": "5.66", "train_nll_loss": "4.064", "train_ppl": "16.729", "train_wps": "1184", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1638", "train_lr": "1.4825e-05", "train_gnorm": "2.92", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "174", "train_wall": "2776"} 2020-09-13 00:35:08 | INFO | valid | {"epoch": 9, "valid_loss": "5.799", "valid_nll_loss": "4.105", "valid_ppl": "17.213", "valid_wps": "5105", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1638", "valid_best_loss": "5.799"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.36842622898234595, 'p': 0.44961516033632526, 'r': 0.34284042718663166}, 'rouge-2': {'f': 0.16566291407189612, 'p': 0.20170401529199547, 'r': 0.15595426964593812}, 'rouge-l': {'f': 0.37329684998556206, 'p': 0.46107006212992346, 'r': 0.3381812931818861}} 2020-09-13 00:36:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 9 @ 1638 updates, score 5.799) (writing took 8.725783603265882 seconds) Test on testing set: Test {'rouge-1': {'f': 0.36854854367280065, 'p': 0.446462179037044, 'r': 0.3453575409922443}, 'rouge-2': {'f': 0.15766752097723266, 'p': 0.19205310732297123, 'r': 0.14911261178538193}, 'rouge-l': {'f': 0.3723420714636821, 'p': 0.45483385435551693, 'r': 0.33894738585564493}} 2020-09-13 00:40:36 | INFO | train | {"epoch": 10, "train_loss": "5.499", "train_nll_loss": "3.88", "train_ppl": "14.719", "train_wps": "1164.2", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1820", "train_lr": "1.40642e-05", "train_gnorm": "2.644", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "171", "train_wall": "3108"} 2020-09-13 00:40:41 | INFO | valid | {"epoch": 10, "valid_loss": "5.764", "valid_nll_loss": "4.061", "valid_ppl": "16.685", "valid_wps": "4140", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1820", "valid_best_loss": "5.764"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3732009581077997, 'p': 0.45658435160220856, 'r': 0.34843650153963096}, 'rouge-2': {'f': 0.16955068859082412, 'p': 0.20811159765922504, 'r': 0.16092543242290838}, 'rouge-l': {'f': 0.37735274143117503, 'p': 0.46684437321975286, 'r': 0.34234094727570397}} 2020-09-13 00:41:52 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 10 @ 1820 updates, score 5.764) (writing took 9.631018000654876 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3716137675352023, 'p': 0.45290106682582165, 'r': 0.350055845189764}, 'rouge-2': {'f': 0.16761863723149356, 'p': 0.2054667073198634, 'r': 0.159476663711512}, 'rouge-l': {'f': 0.3766825457695764, 'p': 0.46341365477026236, 'r': 0.3429991475782948}} 2020-09-13 00:45:59 | INFO | train | {"epoch": 11, "train_loss": "5.389", "train_nll_loss": "3.752", "train_ppl": "13.477", "train_wps": "1200", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2002", "train_lr": "1.34097e-05", "train_gnorm": "2.583", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "3431"} 2020-09-13 00:46:02 | INFO | valid | {"epoch": 11, "valid_loss": "5.732", "valid_nll_loss": "4.006", "valid_ppl": "16.062", "valid_wps": "5937", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2002", "valid_best_loss": "5.732"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3712171758030596, 'p': 0.46406767790531694, 'r': 0.3399832503572037}, 'rouge-2': {'f': 0.17340701068522252, 'p': 0.218702325075554, 'r': 0.15972205472044512}, 'rouge-l': {'f': 0.37458693081114397, 'p': 0.47097623661393523, 'r': 0.3349146889430698}} 2020-09-13 00:47:13 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 11 @ 2002 updates, score 5.732) (writing took 8.599667175672948 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3759528658322778, 'p': 0.4600031515568631, 'r': 0.34994197804675164}, 'rouge-2': {'f': 0.16784789754607138, 'p': 0.2078567819662484, 'r': 0.15733512603695773}, 'rouge-l': {'f': 0.37885456493676783, 'p': 0.4665253560189782, 'r': 0.34335369752238043}} 2020-09-13 00:51:19 | INFO | train | {"epoch": 12, "train_loss": "5.266", "train_nll_loss": "3.61", "train_ppl": "12.21", "train_wps": "1211.6", "train_ups": "0.57", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2184", "train_lr": "1.28388e-05", "train_gnorm": "2.559", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "3751"} 2020-09-13 00:51:22 | INFO | valid | {"epoch": 12, "valid_loss": "5.695", "valid_nll_loss": "3.958", "valid_ppl": "15.541", "valid_wps": "5973.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2184", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37738179617980694, 'p': 0.45113007243822967, 'r': 0.35812026539253183}, 'rouge-2': {'f': 0.174381935944973, 'p': 0.20928232150430362, 'r': 0.1667457526834203}, 'rouge-l': {'f': 0.381769707403124, 'p': 0.4607884172272312, 'r': 0.3523183895063523}} 2020-09-13 00:52:37 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 12 @ 2184 updates, score 5.695) (writing took 8.865372630767524 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37599974934852276, 'p': 0.44884508733258244, 'r': 0.3585387811436595}, 'rouge-2': {'f': 0.16874446443891233, 'p': 0.20483613250924657, 'r': 0.1618007263923004}, 'rouge-l': {'f': 0.3813476499552326, 'p': 0.4596740958090682, 'r': 0.35183457630713216}} 2020-09-13 00:56:48 | INFO | train | {"epoch": 13, "train_loss": "5.158", "train_nll_loss": "3.483", "train_ppl": "11.182", "train_wps": "1176.3", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2366", "train_lr": "1.23351e-05", "train_gnorm": "2.471", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "4080"} 2020-09-13 00:56:53 | INFO | valid | {"epoch": 13, "valid_loss": "5.712", "valid_nll_loss": "3.969", "valid_ppl": "15.661", "valid_wps": "4424.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2366", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3884982705551785, 'p': 0.4491673629778516, 'r': 0.37780412740852076}, 'rouge-2': {'f': 0.17685692236323902, 'p': 0.2042786532892362, 'r': 0.17430026356282213}, 'rouge-l': {'f': 0.3935735209373062, 'p': 0.4617984008105712, 'r': 0.37031878737194673}} 2020-09-13 00:58:10 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 13 @ 2366 updates, score 5.712) (writing took 4.134681691415608 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37886357099711665, 'p': 0.43458029252143165, 'r': 0.3729596920501715}, 'rouge-2': {'f': 0.16854719468745627, 'p': 0.19559694299078864, 'r': 0.1669496828829494}, 'rouge-l': {'f': 0.38364553730125733, 'p': 0.44831425303180794, 'r': 0.3636603790462315}} 2020-09-13 01:02:30 | INFO | train | {"epoch": 14, "train_loss": "5.054", "train_nll_loss": "3.361", "train_ppl": "10.275", "train_wps": "1131.2", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2548", "train_lr": "1.18864e-05", "train_gnorm": "2.442", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "4423"} 2020-09-13 01:02:35 | INFO | valid | {"epoch": 14, "valid_loss": "5.707", "valid_nll_loss": "3.958", "valid_ppl": "15.544", "valid_wps": "4832.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2548", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3998796789622521, 'p': 0.44386233353207316, 'r': 0.402440874022752}, 'rouge-2': {'f': 0.18989748323963274, 'p': 0.20863603806938805, 'r': 0.1955903717688269}, 'rouge-l': {'f': 0.4074896840434411, 'p': 0.4639988402234145, 'r': 0.3924353598143554}} 2020-09-13 01:03:55 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 14 @ 2548 updates, score 5.707) (writing took 4.522745947353542 seconds) Test on testing set: Test {'rouge-1': {'f': 0.389306128602423, 'p': 0.43236154677359795, 'r': 0.3928469004687963}, 'rouge-2': {'f': 0.1749124813810089, 'p': 0.19233446115494, 'r': 0.18101805932331297}, 'rouge-l': {'f': 0.3974381446141553, 'p': 0.45394463380960515, 'r': 0.3816860716155822}} 2020-09-13 01:08:15 | INFO | train | {"epoch": 15, "train_loss": "4.955", "train_nll_loss": "3.246", "train_ppl": "9.486", "train_wps": "1124.8", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2730", "train_lr": "1.14834e-05", "train_gnorm": "2.476", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "4767"} 2020-09-13 01:08:19 | INFO | valid | {"epoch": 15, "valid_loss": "5.72", "valid_nll_loss": "3.969", "valid_ppl": "15.664", "valid_wps": "5135.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2730", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.385496212805097, 'p': 0.45344757978858063, 'r': 0.37208548238317946}, 'rouge-2': {'f': 0.17987898260277432, 'p': 0.2117429818111738, 'r': 0.17606453929508845}, 'rouge-l': {'f': 0.3877792425552155, 'p': 0.45854887620658535, 'r': 0.3646117908261955}} 2020-09-13 01:09:32 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 15 @ 2730 updates, score 5.72) (writing took 4.130691207945347 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37790240747430026, 'p': 0.43971254753318373, 'r': 0.367746760947724}, 'rouge-2': {'f': 0.16425074216297936, 'p': 0.19264495690629893, 'r': 0.16212163303873436}, 'rouge-l': {'f': 0.38059486158567596, 'p': 0.4492443951232903, 'r': 0.35745930848360596}} 2020-09-13 01:13:41 | INFO | train | {"epoch": 16, "train_loss": "4.867", "train_nll_loss": "3.141", "train_ppl": "8.82", "train_wps": "1187.1", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2912", "train_lr": "1.11187e-05", "train_gnorm": "2.448", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "5093"} 2020-09-13 01:13:45 | INFO | valid | {"epoch": 16, "valid_loss": "5.701", "valid_nll_loss": "3.929", "valid_ppl": "15.235", "valid_wps": "6218.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2912", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38106680686091043, 'p': 0.4805848063501437, 'r': 0.3447942668199565}, 'rouge-2': {'f': 0.175794098245, 'p': 0.22236168866719144, 'r': 0.16057201572100432}, 'rouge-l': {'f': 0.3806436944039715, 'p': 0.4789247600612308, 'r': 0.3390977899415061}} 2020-09-13 01:14:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 16 @ 2912 updates, score 5.701) (writing took 4.02212131395936 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3718838641380148, 'p': 0.4666792110932706, 'r': 0.3399888800206822}, 'rouge-2': {'f': 0.1671697426502001, 'p': 0.21140715464498486, 'r': 0.1542190531743976}, 'rouge-l': {'f': 0.3728099296900896, 'p': 0.46619239167445015, 'r': 0.3346634641171861}} 2020-09-13 01:18:54 | INFO | train | {"epoch": 17, "train_loss": "4.782", "train_nll_loss": "3.04", "train_ppl": "8.228", "train_wps": "1238.2", "train_ups": "0.58", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3094", "train_lr": "1.07868e-05", "train_gnorm": "2.405", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "5406"} 2020-09-13 01:18:58 | INFO | valid | {"epoch": 17, "valid_loss": "5.733", "valid_nll_loss": "3.974", "valid_ppl": "15.715", "valid_wps": "5150", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3094", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.391276670828434, 'p': 0.43821071505750875, 'r': 0.3908326539960555}, 'rouge-2': {'f': 0.17966333440452606, 'p': 0.2006399715420435, 'r': 0.18217533769836514}, 'rouge-l': {'f': 0.3948289271214795, 'p': 0.4524492153099438, 'r': 0.3798363085924027}} 2020-09-13 01:20:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 17 @ 3094 updates, score 5.733) (writing took 4.8997919876128435 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3845810681201655, 'p': 0.4282195186242171, 'r': 0.3862897923232961}, 'rouge-2': {'f': 0.17073755266846638, 'p': 0.1894686524531282, 'r': 0.17556790042355605}, 'rouge-l': {'f': 0.391043491686661, 'p': 0.4431799101178343, 'r': 0.3779568118927497}} 2020-09-13 01:24:28 | INFO | train | {"epoch": 18, "train_loss": "4.697", "train_nll_loss": "2.94", "train_ppl": "7.672", "train_wps": "1159.4", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3276", "train_lr": "1.04828e-05", "train_gnorm": "2.467", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "5740"} 2020-09-13 01:24:32 | INFO | valid | {"epoch": 18, "valid_loss": "5.724", "valid_nll_loss": "3.944", "valid_ppl": "15.396", "valid_wps": "5295.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3276", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37861704558770404, 'p': 0.4419875194307351, 'r': 0.3667373863426889}, 'rouge-2': {'f': 0.1726066525356969, 'p': 0.20114080833632547, 'r': 0.1692294466791264}, 'rouge-l': {'f': 0.3817555654265253, 'p': 0.4502364007470248, 'r': 0.3588787943829516}} 2020-09-13 01:25:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 18 @ 3276 updates, score 5.724) (writing took 5.381191832944751 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3723413859817262, 'p': 0.4282642263502103, 'r': 0.36388886661332226}, 'rouge-2': {'f': 0.15908020930468886, 'p': 0.18290676480735674, 'r': 0.15773721634104743}, 'rouge-l': {'f': 0.3772182373129486, 'p': 0.4386399747551869, 'r': 0.35658496212946833}} 2020-09-13 01:29:54 | INFO | train | {"epoch": 19, "train_loss": "4.614", "train_nll_loss": "2.841", "train_ppl": "7.163", "train_wps": "1189.8", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3458", "train_lr": "1.02033e-05", "train_gnorm": "2.388", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "6066"} 2020-09-13 01:29:59 | INFO | valid | {"epoch": 19, "valid_loss": "5.742", "valid_nll_loss": "3.966", "valid_ppl": "15.631", "valid_wps": "3865.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3458", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38044592866113847, 'p': 0.46748203688066003, 'r': 0.355039752203741}, 'rouge-2': {'f': 0.17804838492656566, 'p': 0.21772699643126558, 'r': 0.16901485873684227}, 'rouge-l': {'f': 0.3813148465803212, 'p': 0.46892094755586644, 'r': 0.3488118629587613}} 2020-09-13 01:31:08 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 19 @ 3458 updates, score 5.742) (writing took 4.588620846159756 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3701339419496253, 'p': 0.45463960006384346, 'r': 0.3417336798726597}, 'rouge-2': {'f': 0.16203221978539503, 'p': 0.20101164160722118, 'r': 0.15035608008610543}, 'rouge-l': {'f': 0.3689392742120454, 'p': 0.4531052832313649, 'r': 0.3346368169251904}} 2020-09-13 01:35:14 | INFO | train | {"epoch": 20, "train_loss": "4.54", "train_nll_loss": "2.754", "train_ppl": "6.744", "train_wps": "1207.7", "train_ups": "0.57", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3640", "train_lr": "9.9449e-06", "train_gnorm": "2.43", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "6387"} 2020-09-13 01:35:19 | INFO | valid | {"epoch": 20, "valid_loss": "5.765", "valid_nll_loss": "3.981", "valid_ppl": "15.794", "valid_wps": "5212", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3640", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38917925586628854, 'p': 0.4545888729297281, 'r': 0.37521025150304493}, 'rouge-2': {'f': 0.18128938136060216, 'p': 0.2108173460566329, 'r': 0.1779577175289945}, 'rouge-l': {'f': 0.3912123475237053, 'p': 0.4626659366951105, 'r': 0.36707352170149965}} 2020-09-13 01:36:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 20 @ 3640 updates, score 5.765) (writing took 4.491002192720771 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37944542213327404, 'p': 0.4419097225174281, 'r': 0.36700773898625333}, 'rouge-2': {'f': 0.16679895064998046, 'p': 0.19638910895842057, 'r': 0.16331358354493733}, 'rouge-l': {'f': 0.3822408225204509, 'p': 0.4509045323855888, 'r': 0.3583453351231742}} 2020-09-13 01:40:40 | INFO | train | {"epoch": 21, "train_loss": "4.459", "train_nll_loss": "2.656", "train_ppl": "6.305", "train_wps": "1190.5", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3822", "train_lr": "9.70523e-06", "train_gnorm": "2.442", "train_clip": "0", "train_oom": "0", "train_train_wall": "168", "train_wall": "6712"} 2020-09-13 01:40:44 | INFO | valid | {"epoch": 21, "valid_loss": "5.801", "valid_nll_loss": "4.013", "valid_ppl": "16.145", "valid_wps": "5059.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3822", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3951976657153381, 'p': 0.4313811856628662, 'r': 0.40212658363768433}, 'rouge-2': {'f': 0.18158599410484844, 'p': 0.19714233842488113, 'r': 0.1880275069846484}, 'rouge-l': {'f': 0.400753788566494, 'p': 0.4453694229577254, 'r': 0.39373391050291107}} 2020-09-13 01:42:04 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 21 @ 3822 updates, score 5.801) (writing took 4.3676733430475 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37992031392670983, 'p': 0.4103137483489205, 'r': 0.39027553986147556}, 'rouge-2': {'f': 0.16588326288147193, 'p': 0.1786441782759031, 'r': 0.17327696142366736}, 'rouge-l': {'f': 0.3866994768252795, 'p': 0.4268482135450003, 'r': 0.38055574575484663}} 2020-09-13 01:46:22 | INFO | train | {"epoch": 22, "train_loss": "4.388", "train_nll_loss": "2.573", "train_ppl": "5.95", "train_wps": "1132.6", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4004", "train_lr": "9.48209e-06", "train_gnorm": "2.438", "train_clip": "0", "train_oom": "0", "train_train_wall": "169", "train_wall": "7054"} 2020-09-13 01:46:27 | INFO | valid | {"epoch": 22, "valid_loss": "5.806", "valid_nll_loss": "4.013", "valid_ppl": "16.15", "valid_wps": "4606.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4004", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38750722254797787, 'p': 0.4277637839583848, 'r': 0.3917518647792673}, 'rouge-2': {'f': 0.17892783275537408, 'p': 0.1954041191558916, 'r': 0.18563060651094296}, 'rouge-l': {'f': 0.3901464960798033, 'p': 0.4382336878278232, 'r': 0.38022299917227464}} 2020-09-13 01:47:44 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 22 @ 4004 updates, score 5.806) (writing took 4.484134818427265 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37365247137031243, 'p': 0.4107872063220731, 'r': 0.37916166737146184}, 'rouge-2': {'f': 0.1645194217180295, 'p': 0.18036856200061924, 'r': 0.17038522739655573}, 'rouge-l': {'f': 0.3788353883034097, 'p': 0.42472162650174466, 'r': 0.36942716034722173}} 2020-09-13 01:51:58 | INFO | train | {"epoch": 23, "train_loss": "4.321", "train_nll_loss": "2.493", "train_ppl": "5.628", "train_wps": "1151.3", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4186", "train_lr": "9.27367e-06", "train_gnorm": "2.455", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "171", "train_wall": "7391"} 2020-09-13 01:52:03 | INFO | valid | {"epoch": 23, "valid_loss": "5.827", "valid_nll_loss": "4.033", "valid_ppl": "16.372", "valid_wps": "4875", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4186", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38822926461557655, 'p': 0.45348295623984464, 'r': 0.37020477945877106}, 'rouge-2': {'f': 0.1777558814272791, 'p': 0.20734387979453298, 'r': 0.17123390284814333}, 'rouge-l': {'f': 0.3870378946705613, 'p': 0.453376818512756, 'r': 0.3616563152380114}} 2020-09-13 01:53:12 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 23 @ 4186 updates, score 5.827) (writing took 4.091774018481374 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3752148029208675, 'p': 0.43863114918235374, 'r': 0.358486619417161}, 'rouge-2': {'f': 0.16180559507864367, 'p': 0.19086761931101376, 'r': 0.1558113872763702}, 'rouge-l': {'f': 0.3749091393825165, 'p': 0.44034156816542647, 'r': 0.3498163907525991}} 2020-09-13 01:57:21 | INFO | train | {"epoch": 24, "train_loss": "4.255", "train_nll_loss": "2.414", "train_ppl": "5.331", "train_wps": "1200.3", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4368", "train_lr": "9.07841e-06", "train_gnorm": "2.447", "train_clip": "0", "train_oom": "0", "train_train_wall": "170", "train_wall": "7713"} 2020-09-13 01:57:25 | INFO | valid | {"epoch": 24, "valid_loss": "5.827", "valid_nll_loss": "4.02", "valid_ppl": "16.219", "valid_wps": "5117.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4368", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38534117547261393, 'p': 0.445273298289193, 'r': 0.3742189594129202}, 'rouge-2': {'f': 0.17705063825525105, 'p': 0.20385055188358236, 'r': 0.17461070647467383}, 'rouge-l': {'f': 0.38760590954771934, 'p': 0.45168003704103943, 'r': 0.36699590315532576}} 2020-09-13 01:58:41 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 24 @ 4368 updates, score 5.827) (writing took 4.580549734644592 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37376658505892546, 'p': 0.42513689289838574, 'r': 0.36824430078261083}, 'rouge-2': {'f': 0.16449381722326553, 'p': 0.18792195589835636, 'r': 0.16472545042139783}, 'rouge-l': {'f': 0.37640842528791896, 'p': 0.4302305671551292, 'r': 0.36077114610382716}} 2020-09-13 02:02:58 | INFO | train | {"epoch": 25, "train_loss": "4.189", "train_nll_loss": "2.336", "train_ppl": "5.049", "train_wps": "1148.2", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4550", "train_lr": "8.89499e-06", "train_gnorm": "2.448", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "8051"} 2020-09-13 02:03:03 | INFO | valid | {"epoch": 25, "valid_loss": "5.868", "valid_nll_loss": "4.082", "valid_ppl": "16.938", "valid_wps": "5108.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4550", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3871704147563257, 'p': 0.42776014371986826, 'r': 0.39233499649709835}, 'rouge-2': {'f': 0.1735141403257615, 'p': 0.18983819963280862, 'r': 0.1801092131406639}, 'rouge-l': {'f': 0.38881395036855076, 'p': 0.4373233829107774, 'r': 0.37913972883849767}} 2020-09-13 02:04:26 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 25 @ 4550 updates, score 5.868) (writing took 4.335143620148301 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3844215514195955, 'p': 0.42117882229965664, 'r': 0.38836392231731154}, 'rouge-2': {'f': 0.16712838914507203, 'p': 0.184210597325008, 'r': 0.16991914128454474}, 'rouge-l': {'f': 0.38612055818280827, 'p': 0.43107652154520104, 'r': 0.37593315070715955}} 2020-09-13 02:08:44 | INFO | train | {"epoch": 26, "train_loss": "4.123", "train_nll_loss": "2.257", "train_ppl": "4.78", "train_wps": "1122.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4732", "train_lr": "8.72226e-06", "train_gnorm": "2.424", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "8396"} 2020-09-13 02:08:48 | INFO | valid | {"epoch": 26, "valid_loss": "5.882", "valid_nll_loss": "4.086", "valid_ppl": "16.987", "valid_wps": "5032.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4732", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.39059463228636443, 'p': 0.44845589456484236, 'r': 0.37968675708755983}, 'rouge-2': {'f': 0.17511335211715656, 'p': 0.19906234650600044, 'r': 0.1739893469380071}, 'rouge-l': {'f': 0.3879455405786022, 'p': 0.4496592808236638, 'r': 0.3675400057841032}} 2020-09-13 02:10:03 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 26 @ 4732 updates, score 5.882) (writing took 4.394494019448757 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37693971962065465, 'p': 0.43127100071839813, 'r': 0.37002592342998164}, 'rouge-2': {'f': 0.1637296222980477, 'p': 0.187921960463544, 'r': 0.16291519359218756}, 'rouge-l': {'f': 0.37658267706833715, 'p': 0.43598237022943104, 'r': 0.3580529888703775}} 2020-09-13 02:14:15 | INFO | train | {"epoch": 27, "train_loss": "4.067", "train_nll_loss": "2.192", "train_ppl": "4.569", "train_wps": "1170.3", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4914", "train_lr": "8.55921e-06", "train_gnorm": "2.464", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "8727"} 2020-09-13 02:14:19 | INFO | valid | {"epoch": 27, "valid_loss": "5.915", "valid_nll_loss": "4.113", "valid_ppl": "17.299", "valid_wps": "5272.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4914", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38434759891377074, 'p': 0.4249833165393893, 'r': 0.38662224677719503}, 'rouge-2': {'f': 0.17192755651144725, 'p': 0.19044749867481733, 'r': 0.1749212887993873}, 'rouge-l': {'f': 0.38774917991430324, 'p': 0.4365569133041221, 'r': 0.37632971013264493}} 2020-09-13 02:15:35 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 27 @ 4914 updates, score 5.915) (writing took 4.622201276943088 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37903839501209047, 'p': 0.4130527827785208, 'r': 0.38674322772887626}, 'rouge-2': {'f': 0.16408303228928034, 'p': 0.17857210979076796, 'r': 0.169772298030173}, 'rouge-l': {'f': 0.3828725069419752, 'p': 0.42540750664994087, 'r': 0.3750088333333603}} 2020-09-13 02:19:56 | INFO | train | {"epoch": 28, "train_loss": "4.005", "train_nll_loss": "2.117", "train_ppl": "4.339", "train_wps": "1133.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5096", "train_lr": "8.40498e-06", "train_gnorm": "2.417", "train_clip": "0", "train_oom": "0", "train_train_wall": "175", "train_wall": "9069"} 2020-09-13 02:20:01 | INFO | valid | {"epoch": 28, "valid_loss": "5.924", "valid_nll_loss": "4.113", "valid_ppl": "17.301", "valid_wps": "5072.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5096", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3789412021753395, 'p': 0.44009502943218975, 'r': 0.3673566576087134}, 'rouge-2': {'f': 0.172794589968652, 'p': 0.19937717288075993, 'r': 0.17077481830782684}, 'rouge-l': {'f': 0.37829677515222787, 'p': 0.44274850813900934, 'r': 0.3577681743468191}} 2020-09-13 02:21:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 28 @ 5096 updates, score 5.924) (writing took 4.540925501845777 seconds) Test on testing set: Test {'rouge-1': {'f': 0.36527019423752055, 'p': 0.4208007754021258, 'r': 0.35525130146740025}, 'rouge-2': {'f': 0.1572877265691285, 'p': 0.1817244103207206, 'r': 0.15514638974429046}, 'rouge-l': {'f': 0.36299730631095184, 'p': 0.42155101744468565, 'r': 0.34371323149546557}} 2020-09-13 02:25:31 | INFO | train | {"epoch": 29, "train_loss": "3.946", "train_nll_loss": "2.047", "train_ppl": "4.134", "train_wps": "1157.1", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5278", "train_lr": "8.25879e-06", "train_gnorm": "2.41", "train_clip": "0", "train_oom": "0", "train_train_wall": "176", "train_wall": "9403"} 2020-09-13 02:25:35 | INFO | valid | {"epoch": 29, "valid_loss": "5.948", "valid_nll_loss": "4.141", "valid_ppl": "17.646", "valid_wps": "5013.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5278", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.39392286385154907, 'p': 0.44045543067367227, 'r': 0.39267143683548494}, 'rouge-2': {'f': 0.17773576677957223, 'p': 0.1967579777648653, 'r': 0.18110397051020538}, 'rouge-l': {'f': 0.3941220431573893, 'p': 0.44562952698546576, 'r': 0.3812440137558409}} 2020-09-13 02:26:56 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 29 @ 5278 updates, score 5.948) (writing took 4.303143389523029 seconds) Test on testing set: Test {'rouge-1': {'f': 0.38080278641895143, 'p': 0.4233969204218693, 'r': 0.3843529679579355}, 'rouge-2': {'f': 0.1626977341974013, 'p': 0.18099048848906363, 'r': 0.16602780449388885}, 'rouge-l': {'f': 0.38036795751367164, 'p': 0.43089260804423674, 'r': 0.369552310419994}} 2020-09-13 02:31:11 | INFO | train | {"epoch": 30, "train_loss": "3.895", "train_nll_loss": "1.988", "train_ppl": "3.966", "train_wps": "1139.3", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5460", "train_lr": "8.11998e-06", "train_gnorm": "2.409", "train_clip": "0", "train_oom": "0", "train_train_wall": "170", "train_wall": "9743"} 2020-09-13 02:31:16 | INFO | valid | {"epoch": 30, "valid_loss": "5.981", "valid_nll_loss": "4.171", "valid_ppl": "18.011", "valid_wps": "4936.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5460", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38020709374000805, 'p': 0.425796857846852, 'r': 0.37872173949599347}, 'rouge-2': {'f': 0.16795702651937106, 'p': 0.18762125339060082, 'r': 0.17083357743421537}, 'rouge-l': {'f': 0.38090418992038316, 'p': 0.4310409715465135, 'r': 0.36867556519723077}} 2020-09-13 02:32:29 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 30 @ 5460 updates, score 5.981) (writing took 4.3459603087976575 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37965358788271986, 'p': 0.4206577074238292, 'r': 0.380269873971165}, 'rouge-2': {'f': 0.16305562269169338, 'p': 0.18144303909345297, 'r': 0.16565712858345996}, 'rouge-l': {'f': 0.3809386293041335, 'p': 0.4267185372492751, 'r': 0.3700918835000441}} 2020-09-13 02:36:41 | INFO | train | {"epoch": 31, "train_loss": "3.83", "train_nll_loss": "1.911", "train_ppl": "3.76", "train_wps": "1174.4", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5642", "train_lr": "7.98794e-06", "train_gnorm": "2.399", "train_clip": "0", "train_oom": "0", "train_train_wall": "169", "train_wall": "10073"} 2020-09-13 02:36:47 | INFO | valid | {"epoch": 31, "valid_loss": "5.991", "valid_nll_loss": "4.188", "valid_ppl": "18.225", "valid_wps": "3896.5", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5642", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38086083577746893, 'p': 0.43856192527978594, 'r': 0.36852514930253355}, 'rouge-2': {'f': 0.1720548694357604, 'p': 0.19719350666715524, 'r': 0.1688531110170121}, 'rouge-l': {'f': 0.37940756909649354, 'p': 0.43991824611795655, 'r': 0.3593013025166184}} 2020-09-13 02:38:05 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 31 @ 5642 updates, score 5.991) (writing took 4.520576075650752 seconds) Test on testing set: Test {'rouge-1': {'f': 0.370679717039701, 'p': 0.4252353421116953, 'r': 0.362809943313995}, 'rouge-2': {'f': 0.15675368799422942, 'p': 0.18101486674427333, 'r': 0.15407280461847872}, 'rouge-l': {'f': 0.3695017426852557, 'p': 0.4269730776434452, 'r': 0.3521532197453053}} 2020-09-13 02:42:13 | INFO | train | {"epoch": 32, "train_loss": "3.782", "train_nll_loss": "1.855", "train_ppl": "3.617", "train_wps": "1166.1", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5824", "train_lr": "7.86214e-06", "train_gnorm": "2.394", "train_clip": "0", "train_oom": "0", "train_train_wall": "166", "train_wall": "10405"} 2020-09-13 02:42:17 | INFO | valid | {"epoch": 32, "valid_loss": "6", "valid_nll_loss": "4.194", "valid_ppl": "18.306", "valid_wps": "5229.5", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5824", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38137652179916004, 'p': 0.43313790455810236, 'r': 0.3772229819910243}, 'rouge-2': {'f': 0.16691298025795592, 'p': 0.18836874967588021, 'r': 0.16863772276161515}, 'rouge-l': {'f': 0.38013909271578145, 'p': 0.4366292868501368, 'r': 0.3653993312593402}} 2020-09-13 02:43:34 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 32 @ 5824 updates, score 6.0) (writing took 4.351189863868058 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37981067693153064, 'p': 0.42971224310521866, 'r': 0.37687873722717913}, 'rouge-2': {'f': 0.15992774092760015, 'p': 0.18342171362162557, 'r': 0.15837989344554051}, 'rouge-l': {'f': 0.3799422632878725, 'p': 0.43464593193037043, 'r': 0.36586217115314146}} 2020-09-13 02:47:51 | INFO | train | {"epoch": 33, "train_loss": "3.726", "train_nll_loss": "1.789", "train_ppl": "3.455", "train_wps": "1147.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6006", "train_lr": "7.7421e-06", "train_gnorm": "2.331", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "10743"} 2020-09-13 02:47:55 | INFO | valid | {"epoch": 33, "valid_loss": "6.02", "valid_nll_loss": "4.212", "valid_ppl": "18.537", "valid_wps": "5786.6", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6006", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.37915361599396163, 'p': 0.4177151715958892, 'r': 0.38225942974637367}, 'rouge-2': {'f': 0.16823645287535424, 'p': 0.18276700397376724, 'r': 0.17357744357848057}, 'rouge-l': {'f': 0.38136667391475326, 'p': 0.4250451610297481, 'r': 0.3732515439655231}} 2020-09-13 02:49:11 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 33 @ 6006 updates, score 6.02) (writing took 4.523772260174155 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3718742933369427, 'p': 0.4099806920967013, 'r': 0.3767762931178083}, 'rouge-2': {'f': 0.15370868260410603, 'p': 0.17047928337500037, 'r': 0.15684602569324277}, 'rouge-l': {'f': 0.3688537640181489, 'p': 0.41053246892705925, 'r': 0.3627296345795084}} 2020-09-13 02:53:29 | INFO | train | {"epoch": 34, "train_loss": "3.669", "train_nll_loss": "1.723", "train_ppl": "3.3", "train_wps": "1146.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6188", "train_lr": "7.62739e-06", "train_gnorm": "2.323", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "11081"} 2020-09-13 02:53:33 | INFO | valid | {"epoch": 34, "valid_loss": "6.066", "valid_nll_loss": "4.269", "valid_ppl": "19.278", "valid_wps": "5174.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6188", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.387153766867382, 'p': 0.42792687135337976, 'r': 0.3892787185709572}, 'rouge-2': {'f': 0.17028365573313206, 'p': 0.1865320310783622, 'r': 0.17456691285290213}, 'rouge-l': {'f': 0.3850822596116743, 'p': 0.43329137654837147, 'r': 0.3746488459567868}} 2020-09-13 02:54:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 34 @ 6188 updates, score 6.066) (writing took 4.374015459790826 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37794691927940804, 'p': 0.4146940259079912, 'r': 0.3847413121249158}, 'rouge-2': {'f': 0.1627020618517039, 'p': 0.17883773270864975, 'r': 0.1677699635770461}, 'rouge-l': {'f': 0.38046893390803, 'p': 0.4244051828628842, 'r': 0.37318182299953945}} 2020-09-13 02:59:05 | INFO | train | {"epoch": 35, "train_loss": "3.621", "train_nll_loss": "1.667", "train_ppl": "3.175", "train_wps": "1152.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6370", "train_lr": "7.51764e-06", "train_gnorm": "2.332", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "11417"} 2020-09-13 02:59:10 | INFO | valid | {"epoch": 35, "valid_loss": "6.081", "valid_nll_loss": "4.28", "valid_ppl": "19.431", "valid_wps": "4026.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6370", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3863770618665606, 'p': 0.41960423765025195, 'r': 0.39353742674033426}, 'rouge-2': {'f': 0.16849963347107963, 'p': 0.18116140757543353, 'r': 0.1752595358690957}, 'rouge-l': {'f': 0.3868700708130419, 'p': 0.4284176759227474, 'r': 0.38066459517190543}} 2020-09-13 03:00:35 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 35 @ 6370 updates, score 6.081) (writing took 4.590174483135343 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3785464402442006, 'p': 0.4101153566021953, 'r': 0.3892325582359251}, 'rouge-2': {'f': 0.1635688031477682, 'p': 0.17727291271497997, 'r': 0.17079954619481266}, 'rouge-l': {'f': 0.38081945358634506, 'p': 0.4199370914132296, 'r': 0.37711290928707314}} 2020-09-13 03:04:57 | INFO | train | {"epoch": 36, "train_loss": "3.576", "train_nll_loss": "1.613", "train_ppl": "3.059", "train_wps": "1101.4", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6552", "train_lr": "7.41249e-06", "train_gnorm": "2.297", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "11769"} 2020-09-13 03:05:01 | INFO | valid | {"epoch": 36, "valid_loss": "6.083", "valid_nll_loss": "4.281", "valid_ppl": "19.438", "valid_wps": "4978.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6552", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38480255056640067, 'p': 0.41896340222175316, 'r': 0.3925201975546814}, 'rouge-2': {'f': 0.16801251321048902, 'p': 0.18062486712904227, 'r': 0.17507848305413856}, 'rouge-l': {'f': 0.382260686554629, 'p': 0.42213632057966827, 'r': 0.37822787809887537}} 2020-09-13 03:06:23 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 36 @ 6552 updates, score 6.083) (writing took 4.2699774550274014 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37602470396509274, 'p': 0.411255046807531, 'r': 0.38215708009749216}, 'rouge-2': {'f': 0.1597641638757478, 'p': 0.17617491833186072, 'r': 0.16325763230195292}, 'rouge-l': {'f': 0.37238202458966757, 'p': 0.41385589088931496, 'r': 0.36541661129126923}} 2020-09-13 03:10:43 | INFO | train | {"epoch": 37, "train_loss": "3.52", "train_nll_loss": "1.549", "train_ppl": "2.925", "train_wps": "1118.5", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6734", "train_lr": "7.31164e-06", "train_gnorm": "2.273", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "12115"} 2020-09-13 03:10:46 | INFO | valid | {"epoch": 37, "valid_loss": "6.094", "valid_nll_loss": "4.296", "valid_ppl": "19.645", "valid_wps": "6031.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6734", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38520799197303024, 'p': 0.4115952466989357, 'r': 0.401465322621547}, 'rouge-2': {'f': 0.16788612832087196, 'p': 0.17883063755349102, 'r': 0.1782279060988303}, 'rouge-l': {'f': 0.3861095337034907, 'p': 0.4201549510855608, 'r': 0.3877350945267522}} 2020-09-13 03:12:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 37 @ 6734 updates, score 6.094) (writing took 4.625352236442268 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3743512324371333, 'p': 0.3969204019868071, 'r': 0.3928996754216558}, 'rouge-2': {'f': 0.15513973954352014, 'p': 0.1640198600019053, 'r': 0.1656530221490409}, 'rouge-l': {'f': 0.3754077971732901, 'p': 0.4063686794601194, 'r': 0.3776842747295159}} 2020-09-13 03:16:43 | INFO | train | {"epoch": 38, "train_loss": "3.475", "train_nll_loss": "1.497", "train_ppl": "2.822", "train_wps": "1074.9", "train_ups": "0.51", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6916", "train_lr": "7.21479e-06", "train_gnorm": "2.282", "train_clip": "0", "train_oom": "0", "train_train_wall": "176", "train_wall": "12476"} 2020-09-13 03:16:48 | INFO | valid | {"epoch": 38, "valid_loss": "6.091", "valid_nll_loss": "4.292", "valid_ppl": "19.588", "valid_wps": "4328.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6916", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3906977486823824, 'p': 0.4322244472662039, 'r': 0.3904387899271124}, 'rouge-2': {'f': 0.17065932983497842, 'p': 0.18826351149371626, 'r': 0.17272125641072303}, 'rouge-l': {'f': 0.3853463228641199, 'p': 0.43086111541067484, 'r': 0.37498756649126375}} 2020-09-13 03:18:01 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 38 @ 6916 updates, score 6.091) (writing took 4.299341707490385 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3778648195003795, 'p': 0.4124395622043999, 'r': 0.38638326843135784}, 'rouge-2': {'f': 0.15725456356423087, 'p': 0.17166198095409285, 'r': 0.16285370981177014}, 'rouge-l': {'f': 0.3730423527751766, 'p': 0.4130845975685863, 'r': 0.36802060468026837}} 2020-09-13 03:22:20 | INFO | train | {"epoch": 39, "train_loss": "3.431", "train_nll_loss": "1.444", "train_ppl": "2.721", "train_wps": "1149.8", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7098", "train_lr": "7.12169e-06", "train_gnorm": "2.279", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "12812"} 2020-09-13 03:22:24 | INFO | valid | {"epoch": 39, "valid_loss": "6.127", "valid_nll_loss": "4.333", "valid_ppl": "20.159", "valid_wps": "6119.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7098", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3850664930124747, 'p': 0.41624993500454843, 'r': 0.39500751417890606}, 'rouge-2': {'f': 0.16650493821705545, 'p': 0.17903338552596454, 'r': 0.1734290759497076}, 'rouge-l': {'f': 0.38333622366926445, 'p': 0.4211383929772211, 'r': 0.38013266010559055}} 2020-09-13 03:23:47 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 39 @ 7098 updates, score 6.127) (writing took 4.8811688451096416 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3791954685119963, 'p': 0.4095449415666187, 'r': 0.38998344500957005}, 'rouge-2': {'f': 0.1587988932718357, 'p': 0.1721871158793693, 'r': 0.16477044960993284}, 'rouge-l': {'f': 0.3762471094410884, 'p': 0.4124276670798772, 'r': 0.3733499950158345}} 2020-09-13 03:28:07 | INFO | train | {"epoch": 40, "train_loss": "3.382", "train_nll_loss": "1.389", "train_ppl": "2.618", "train_wps": "1116.7", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7280", "train_lr": "7.03211e-06", "train_gnorm": "2.242", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "13159"} 2020-09-13 03:28:13 | INFO | valid | {"epoch": 40, "valid_loss": "6.151", "valid_nll_loss": "4.357", "valid_ppl": "20.485", "valid_wps": "3936.6", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7280", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3890387077350804, 'p': 0.42607329008636396, 'r': 0.39286505207167427}, 'rouge-2': {'f': 0.1711647526477222, 'p': 0.18699617326343265, 'r': 0.17517819545443322}, 'rouge-l': {'f': 0.3875280498801758, 'p': 0.42916514940329664, 'r': 0.3812118893956973}} 2020-09-13 03:29:30 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 40 @ 7280 updates, score 6.151) (writing took 4.4904003804549575 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3726439693825235, 'p': 0.40527439922433883, 'r': 0.38066306967102415}, 'rouge-2': {'f': 0.1534322184436473, 'p': 0.16804306661481672, 'r': 0.15854620618257684}, 'rouge-l': {'f': 0.3707951982072947, 'p': 0.40862711990312467, 'r': 0.36647558059887014}} 2020-09-13 03:33:52 | INFO | train | {"epoch": 41, "train_loss": "3.35", "train_nll_loss": "1.351", "train_ppl": "2.551", "train_wps": "1122.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7462", "train_lr": "6.94582e-06", "train_gnorm": "2.213", "train_clip": "0", "train_oom": "0", "train_train_wall": "177", "train_wall": "13504"} 2020-09-13 03:33:58 | INFO | valid | {"epoch": 41, "valid_loss": "6.181", "valid_nll_loss": "4.391", "valid_ppl": "20.984", "valid_wps": "4025.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7462", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3843756468861572, 'p': 0.41117690198532353, 'r': 0.39729475731101416}, 'rouge-2': {'f': 0.16308450642519431, 'p': 0.17295082127690298, 'r': 0.17102846873431943}, 'rouge-l': {'f': 0.37910076864649406, 'p': 0.4103915731589071, 'r': 0.3803033642081009}} 2020-09-13 03:35:23 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 41 @ 7462 updates, score 6.181) (writing took 4.650649065151811 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3769917556036458, 'p': 0.3944497389851229, 'r': 0.40075455946786587}, 'rouge-2': {'f': 0.1521325700581021, 'p': 0.1589907345850074, 'r': 0.16469397096833938}, 'rouge-l': {'f': 0.3697896732897443, 'p': 0.39433115895849924, 'r': 0.37772640423431814}} 2020-09-13 03:39:50 | INFO | train | {"epoch": 42, "train_loss": "3.311", "train_nll_loss": "1.307", "train_ppl": "2.473", "train_wps": "1084", "train_ups": "0.51", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7644", "train_lr": "6.86264e-06", "train_gnorm": "2.195", "train_clip": "0", "train_oom": "0", "train_train_wall": "179", "train_wall": "13862"} 2020-09-13 03:39:54 | INFO | valid | {"epoch": 42, "valid_loss": "6.192", "valid_nll_loss": "4.401", "valid_ppl": "21.12", "valid_wps": "4981.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7644", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.3839548844216928, 'p': 0.40919695990657146, 'r': 0.39678524748658583}, 'rouge-2': {'f': 0.16309939507844493, 'p': 0.1717916669360824, 'r': 0.17226327704521602}, 'rouge-l': {'f': 0.38036248312741655, 'p': 0.4108766577386286, 'r': 0.381103948466934}} 2020-09-13 03:41:25 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 42 @ 7644 updates, score 6.192) (writing took 4.394819853827357 seconds) Test on testing set: Test {'rouge-1': {'f': 0.37688485075539657, 'p': 0.39846735624071966, 'r': 0.3945277424880961}, 'rouge-2': {'f': 0.1555117580701479, 'p': 0.1640307380388293, 'r': 0.16614039633603137}, 'rouge-l': {'f': 0.37278658003532866, 'p': 0.4006159776983299, 'r': 0.37631725030144797}} 2020-09-13 03:45:51 | INFO | train | {"epoch": 43, "train_loss": "3.28", "train_nll_loss": "1.27", "train_ppl": "2.412", "train_wps": "1070.6", "train_ups": "0.5", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7826", "train_lr": "6.78237e-06", "train_gnorm": "2.213", "train_clip": "0", "train_oom": "0", "train_train_wall": "177", "train_wall": "14224"} 2020-09-13 03:45:55 | INFO | valid | {"epoch": 43, "valid_loss": "6.194", "valid_nll_loss": "4.405", "valid_ppl": "21.181", "valid_wps": "5927.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7826", "valid_best_loss": "5.695"} here bpe NONE here! 
Val {'rouge-1': {'f': 0.38388910329361303, 'p': 0.4258477894016644, 'r': 0.3848852055702036}, 'rouge-2': {'f': 0.16835026873515999, 'p': 0.18584556947743405, 'r': 0.17154795199389153}, 'rouge-l': {'f': 0.3794284964164359, 'p': 0.42386850952855576, 'r': 0.37060369000872206}} 2020-09-13 03:47:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 43 @ 7826 updates, score 6.194) (writing took 4.312541832216084 seconds) Test on testing set: Test {'rouge-1': {'f': 0.3649928682374293, 'p': 0.40100515797443215, 'r': 0.37178177746836855}, 'rouge-2': {'f': 0.14818549602209022, 'p': 0.16269601833548097, 'r': 0.15371445974590012}, 'rouge-l': {'f': 0.3619876909686621, 'p': 0.40180170267059645, 'r': 0.35681259215856076}} 2020-09-13 03:48:41 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 30 runs 2020-09-13 03:48:41 | INFO | fairseq_cli.train | done training in 14391.9 seconds

I think I figured out the reason. If you did not download the pre-trained model into the folder, the model is not initialized with pre-trained BART; its weights are randomly initialized instead.

Please download the pre-trained BART here (https://github.com/pytorch/fairseq/tree/master/examples/bart)

as shown in your log:

2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt

That's probably why your results are so low.
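A quick way to catch this before waiting on a full training run is to scan the training log for that message. Below is a minimal sketch; the function name `find_missing_checkpoints` is made up for illustration, and the pattern only assumes the message wording shown in the log line above.

```python
import re

# Matches the trainer message quoted above and captures the path it reports.
MISSING_CKPT = re.compile(r"no existing checkpoint found (\S+)")


def find_missing_checkpoints(log_lines):
    """Return every checkpoint path that the log reports as not found."""
    missing = []
    for line in log_lines:
        match = MISSING_CKPT.search(line)
        if match:
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    sample = [
        "2020-11-06 17:46:38 | INFO | fairseq.trainer | "
        "no existing checkpoint found ./bart.large/model.pt",
    ]
    for path in find_missing_checkpoints(sample):
        print("pre-trained weights were not loaded from:", path)
```

If the function returns a non-empty list for your log, the run most likely started from random weights.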

Good catch. You are right. I'm training the single view model and the results seem to match those reported in the paper. Thanks for the help.

Test on val set: 
100% 817/817 [03:08<00:00,  4.33it/s]
Val {'rouge-1': {'f': 0.47053820487934117, 'p': 0.481068078158503, 'r': 0.5012747517270539}, 'rouge-2': {'f': 0.23280899121248622, 'p': 0.23762821988807867, 'r': 0.2502566730166665}, 'rouge-l': {'f': 0.45843104678080715, 'p': 0.4705976576858032, 'r': 0.47959589277465375}}
2020-11-06 21:23:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_stage/checkpoint_best.pt (epoch 1 @ 93 updates, score 4.057) (writing took 234.77840801099956 seconds)
Test on testing set: 
100% 818/818 [03:15<00:00,  4.19it/s]
Test {'rouge-1': {'f': 0.46512253633774703, 'p': 0.4772971625389979, 'r': 0.4974225331330478}, 'rouge-2': {'f': 0.2247942720566339, 'p': 0.23095709935043798, 'r': 0.24239651268780865}, 'rouge-l': {'f': 0.452616026351333, 'p': 0.46413084533332033, 'r': 0.47522827237678494}}
epoch 002:  73% 68/93 [14:01<05:11, 12.46s/it, loss=4.098, nll_loss=2.26, ppl=4.791, wps=191.4, ups=0.05, wpb=4184.4, bsz=160.2, num_updates=161, lr=2.415e-05, gnorm=2.919, clip=100, oom=0, train_wall=833, wall=2710]

as shown in your log:

2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt

I have the same problem, but I have downloaded the 'model.pt' file to /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt. Why can't it be found?
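One thing worth checking, as a guess rather than a confirmed diagnosis: the path in the log, `./bart.large/model.pt`, is relative, and a relative path is resolved against the directory the process is started from, not against the location of the script. The sketch below shows where such a path ends up for a given working directory. The helper name `resolve_checkpoint` is hypothetical, and the demo uses a temporary directory as a stand-in for `train_sh/`.

```python
import os
import tempfile


def resolve_checkpoint(path, working_dir=None):
    """Return (absolute_path, exists) for a path resolved against working_dir.

    When working_dir is None, the current working directory is used, which is
    how the operating system resolves a relative path.
    """
    base = working_dir if working_dir is not None else os.getcwd()
    absolute = os.path.abspath(os.path.join(base, path))
    return absolute, os.path.isfile(absolute)


if __name__ == "__main__":
    # Throwaway directory standing in for train_sh/ (illustration only).
    with tempfile.TemporaryDirectory() as train_dir:
        os.makedirs(os.path.join(train_dir, "bart.large"))
        open(os.path.join(train_dir, "bart.large", "model.pt"), "w").close()

        # Started from train_dir: the relative path points at the file.
        print(resolve_checkpoint("./bart.large/model.pt", working_dir=train_dir))

        # Started from elsewhere: the same relative path points somewhere else.
        other = os.path.join(train_dir, "bart.large")
        print(resolve_checkpoint("./bart.large/model.pt", working_dir=other))
```

Calling `resolve_checkpoint("./bart.large/model.pt")` from the notebook cell or shell that launches training shows the absolute path that is actually being looked up, which can then be compared with the location of the downloaded file.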
