Hannibal046 / NMT_with_contrastive_memories

[EMNLP2022] Source code for Neural Machine Translation with Contrastive Translation Memories

The BLEU score on the test set En->De #1

Closed: vhientran closed this issue 1 year ago.

vhientran commented 1 year ago

Hi @Hannibal046 ,

Thank you very much for your interesting work and for releasing the source code! I followed your instructions to reproduce the reported results, using the ready-to-go data and ready-to-go memory you provided. Unfortunately, when evaluating the model on the En->De test set (the default in your code), I got a BLEU score of 54.07, which is significantly lower than the 58.69 reported in Table 2 of your paper.

Could you please help me resolve this problem? I have not changed any of the source code. Thank you so much for your time and help!

Some results printed during reproduction:

train metrics
- epoch = 4.97
- train_loss = 2.6572
- train_runtime = 5:54:07.95
- train_samples = 663486
- train_samples_per_second = 23531.676
- train_steps_per_second = 2.353

predict metrics
- predict_bleu = 54.0698
- predict_loss = 2.407
- predict_runtime = 0:03:10.38
- predict_samples = 2483
- predict_samples_per_second = 13.042
- predict_steps_per_second = 0.657
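Here is a quick consistency check on these train metrics, as a minimal sketch. It assumes the Hugging Face Trainer (transformers 4.9.0, per the config in the log below) computes train_samples_per_second as max_steps × total batch size ÷ runtime when max_steps is set. If so, the logged throughput should match 50000 × 10000 divided by the runtime. That would mean the "samples" in these metrics count units of the 10000-sized batch budget, not sentences. The helper `runtime_to_seconds` is illustrative and is not part of the repository.

```python
# Consistency check on the train metrics above (values copied from the log).
# Assumption: the Trainer reports train_samples_per_second as
# max_steps * total_train_batch_size / train_runtime when max_steps > 0.

def runtime_to_seconds(runtime: str) -> float:
    """Convert an 'H:MM:SS.ff' runtime string into seconds (illustrative helper)."""
    hours, minutes, seconds = runtime.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

max_steps = 50_000    # TrainingArgs.max_steps
batch_size = 10_000   # per_device_train_batch_size (1 GPU, no gradient accumulation)
runtime_s = runtime_to_seconds("5:54:07.95")

steps_per_second = max_steps / runtime_s
samples_per_second = max_steps * batch_size / runtime_s

print(f"runtime   = {runtime_s:.2f} s")
print(f"steps/s   = {steps_per_second:.3f}")    # compare with logged 2.353
print(f"samples/s = {samples_per_second:.1f}")  # compare with logged 23531.676 (runtime string is rounded)
```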

Hannibal046 commented 1 year ago

This result is unexpected; it is even lower than the vanilla Transformer baseline. Why is the training epoch here 4.97? Did the model converge?
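The 4.97 figure can be reproduced from values in the log posted later in this thread. This is a minimal sketch under two assumptions. First, the "Average Senteces in One Batch" line is the number of training samples divided by the number of batches. Second, there is one optimizer step per batch (single GPU, gradient_accumulation_steps=1, as in the logged TrainingArgs). Under those assumptions an epoch is about 10,056 steps, so max_steps=50000 stops training after roughly 4.97 epochs. In the Hugging Face Trainer a positive max_steps overrides num_train_epochs, so the configured num_train_epochs=20 is never reached.

```python
# Values copied from the training log in this thread.
train_samples = 663_486                       # "Dataset Samples After filtering"
avg_sentences_per_batch = 65.97911694510739   # "Average Senteces in One Batch"
max_steps = 50_000                            # TrainingArgs.max_steps
num_train_epochs = 20                         # TrainingArgs.num_train_epochs

# Assumption: one optimizer step per (token-budget) batch.
batches_per_epoch = round(train_samples / avg_sentences_per_batch)

# max_steps caps training before num_train_epochs is reached.
epochs_reached = max_steps / batches_per_epoch
steps_for_configured_epochs = num_train_epochs * batches_per_epoch

print(batches_per_epoch)             # 10056
print(f"{epochs_reached:.2f}")       # 4.97
print(steps_for_configured_epochs)   # 201120
```

Under the same batching, reaching all 20 configured epochs would take about 201,120 steps, roughly four times the logged run.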

vhientran commented 1 year ago

Thank you very much for your quick reply! My understanding is that the checkpoint with the best dev-set performance is the one evaluated on the test set.
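With load_best_model_at_end=True and metric_for_best_model=bleu, you can see which dev-set evaluation gets selected by pulling the eval records out of the log and ranking them. The helper below (`best_eval`) is hypothetical and is not part of the repository. It assumes each record is printed as a flat Python dict literal, as in the log below, and skips anything else, such as the JSON-style MarianConfig with `true`/`false`. The sample holds only a few records copied from the thread. Because the posted log is cut off, you need the full log to find the actual best checkpoint.

```python
import ast
import re

def best_eval(log_text: str, metric: str = "eval_bleu"):
    """Return the logged record with the highest `metric`, or None if there is none.

    Assumes records are flat Python dict literals such as
    {'eval_loss': ..., 'eval_bleu': ..., 'epoch': ...}; line wraps inside a record are fine.
    """
    records = []
    for match in re.finditer(r"\{[^{}]*\}", log_text):
        try:
            record = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # not a Python dict literal (e.g. JSON with true/false)
        if isinstance(record, dict) and metric in record:
            records.append(record)
    if not records:
        return None
    return max(records, key=lambda r: r[metric])

# A few eval records copied from the log in this thread.
sample_log = """
{'eval_loss': 2.597923755645752, 'eval_bleu': 49.80716382320734, 'epoch': 1.79}
{'eval_loss': 2.5653457641601562, 'eval_bleu': 51.134713477013314, 'epoch': 1.99}
{'eval_loss': 2.535236120223999, 'eval_bleu': 50.85850127149657, 'epoch': 2.19}
{'eval_loss': 2.46993350982666, 'eval_bleu': 52.00898377771929, 'epoch': 2.78}
"""

best = best_eval(sample_log)
print(best["epoch"], round(best["eval_bleu"], 2))  # 2.78 52.01
```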

Here is the full output file. Could you please take a look?

`DataArgs(dataset_dir_prefix='data/', dataset_path='jrc_joint_bpe/ende', train_file='data/jrc_joint_bpe/ende/train.json', dev_file='data/jrc_joint_bpe/ende/dev.json', test_file='data/jrc_joint_bpe/ende/test.json', use_cache=False, max_src_len=250, max_trg_len=118, min_trg_len=3, src_vocab_file='data/jrc_joint_bpe/ende/src.vocab', trg_vocab_file='data/jrc_joint_bpe/ende/tgt.vocab', src='en', trg='de') MarianConfig { "activation_dropout": 0.0, "activation_function": "relu", "attention_dropout": 0.0, "bos_token_id": 3, "classifier_dropout": 0.0, "contrastive_lambda": 1, "contrastive_loss_balance": false, "contrastive_temperature": 0.15, "d_model": 512, "decoder_attention_heads": 8, "decoder_ffn_dim": 2048, "decoder_layerdrop": 0.0, "decoder_layers": 6, "decoder_start_token_id": 3, "decoder_type": "dual_cross_attention", "dropout": 0.1, "encoder_attention_heads": 8, "encoder_ffn_dim": 2048, "encoder_layerdrop": 0.0, "encoder_layers": 6, "eos_token_id": 2, "forced_eos_token_id": 2, "gradient_checkpointing": false, "init_std": 0.02, "is_encoder_decoder": true, "max_length": 118, "max_position_embeddings": 1024, "max_src_len": 250, "max_tm_len": 500, "max_trg_len": 118, "min_trg_len": 3, "model_arch": "retrieval_augmented", "model_type": "marian", "num_beams": 5, "num_hidden_layers": 6, "output_attentions": true, "output_hidden_states": true, "pad_token_id": 1, "pooler_type": "cls_mlp", "scale_embedding": true, "src_vocab_size": 0, "tm_encoder_attention_heads": 8, "tm_encoder_dropout": 0.0, "tm_encoder_ffn_dim": 2048, "tm_encoder_layers": 6, "tm_encoder_type": "group_attention", "tm_size": 5, "transformers_version": "4.9.0", "trg_vocab_size": 0, "use_cache": true, "use_contrastive": true, "use_copy": true, "use_joint_bpe": true, "use_shared_encoder": true, "vocab_size": 50265 }

TrainingArgs( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_find_unused_parameters=None, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_steps=2000, evaluation_strategy=IntervalStrategy.STEPS, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=01, gradient_accumulation_steps=1, greater_is_better=True, group_by_length=False, ignore_data_skip=False, label_names=None, label_smoothing_factor=0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=True, local_rank=-1, log_level=-1, log_level_replica=-1, log_on_each_node=True, logging_dir=results/jrc/ende/dual/runs/Jul03_23-01-27_sccdlb032, logging_first_step=True, logging_steps=100, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_type=SchedulerType.LINEAR, max_grad_norm=1.0, max_steps=50000, metric_for_best_model=bleu, mp_parameters=, multiple_loss=True, no_cuda=False, num_train_epochs=20, output_dir=results/jrc/ende/6657, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=20, per_device_train_batch_size=10000, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=dual, push_to_hub_organization=None, push_to_hub_token=None, remove_unused_columns=False, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=results/jrc/ende/dual, save_on_each_node=False, save_steps=2000, save_strategy=IntervalStrategy.STEPS, save_total_limit=2, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tpu_metrics_debug=False, tpu_num_cores=None, use_legacy_prediction_loop=False, warmup_ratio=0.0, warmup_steps=8000, weight_decay=0.0, ) Initializing Model... 
data/jrc_joint_bpe/ende/train.json Dataset Samples: 663487 data/retrieval/ende/src_editdis_alpha_0.7.pkl 663487 Dataset Samples After filtering: 663486 Average Senteces in One Batch: 65.97911694510739 data/jrc_joint_bpe/ende/dev.json Dataset Samples: 2454 data/retrieval/ende/src_editdis_alpha_0.7.pkl 2454 Dataset Samples After filtering: 2454 data/jrc_joint_bpe/ende/test.json Dataset Samples: 2483 data/retrieval/ende/src_editdis_alpha_0.7.pkl 2483 Dataset Samples After filtering: 2483 Loading Complete {'Total_loss': 8.5871, 'Contrastive_loss': 1.9092, 'CrossEntropy_loss': 6.678, 'epoch': 0.0} {'Total_loss': 7.9695, 'Contrastive_loss': 1.2864, 'CrossEntropy_loss': 6.6831, 'epoch': 0.01} {'Total_loss': 6.9269, 'Contrastive_loss': 0.5813, 'CrossEntropy_loss': 6.3456, 'epoch': 0.02} {'Total_loss': 6.6431, 'Contrastive_loss': 0.4108, 'CrossEntropy_loss': 6.2323, 'epoch': 0.03} {'Total_loss': 6.1458, 'Contrastive_loss': 0.3267, 'CrossEntropy_loss': 5.8191, 'epoch': 0.04} {'Total_loss': 5.9742, 'Contrastive_loss': 0.2718, 'CrossEntropy_loss': 5.7024, 'epoch': 0.05} {'Total_loss': 5.7876, 'Contrastive_loss': 0.2481, 'CrossEntropy_loss': 5.5395, 'epoch': 0.06} {'Total_loss': 5.5869, 'Contrastive_loss': 0.2276, 'CrossEntropy_loss': 5.3592, 'epoch': 0.07} {'Total_loss': 5.5471, 'Contrastive_loss': 0.2232, 'CrossEntropy_loss': 5.3239, 'epoch': 0.08} {'Total_loss': 5.3336, 'Contrastive_loss': 0.2098, 'CrossEntropy_loss': 5.1238, 'epoch': 0.09} {'Total_loss': 5.2709, 'Contrastive_loss': 0.2043, 'CrossEntropy_loss': 5.0666, 'epoch': 0.1} {'Total_loss': 5.3056, 'Contrastive_loss': 0.2016, 'CrossEntropy_loss': 5.104, 'epoch': 0.11} {'Total_loss': 5.1, 'Contrastive_loss': 0.1957, 'CrossEntropy_loss': 4.9043, 'epoch': 0.12} {'Total_loss': 5.0497, 'Contrastive_loss': 0.1977, 'CrossEntropy_loss': 4.852, 'epoch': 0.13} {'Total_loss': 4.9868, 'Contrastive_loss': 0.1957, 'CrossEntropy_loss': 4.7911, 'epoch': 0.14} {'Total_loss': 4.9448, 'Contrastive_loss': 0.1934, 'CrossEntropy_loss': 
4.7514, 'epoch': 0.15} {'Total_loss': 4.799, 'Contrastive_loss': 0.1931, 'CrossEntropy_loss': 4.606, 'epoch': 0.16} {'Total_loss': 4.845, 'Contrastive_loss': 0.1903, 'CrossEntropy_loss': 4.6547, 'epoch': 0.17} {'Total_loss': 4.7578, 'Contrastive_loss': 0.19, 'CrossEntropy_loss': 4.5678, 'epoch': 0.18} {'Total_loss': 4.6073, 'Contrastive_loss': 0.1926, 'CrossEntropy_loss': 4.4147, 'epoch': 0.19} {'Total_loss': 4.6527, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 4.4628, 'epoch': 0.2} {'eval_loss': 4.475997447967529, 'eval_bleu': 0.0, 'eval_runtime': 47.0383, 'eval_samples_per_second': 52.17, 'eval_steps_per_second': 2.615, 'epoch': 0.2} {'Total_loss': 4.5412, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 4.3512, 'epoch': 0.21} {'Total_loss': 4.4725, 'Contrastive_loss': 0.19, 'CrossEntropy_loss': 4.2825, 'epoch': 0.22} {'Total_loss': 4.4605, 'Contrastive_loss': 0.1875, 'CrossEntropy_loss': 4.2731, 'epoch': 0.23} {'Total_loss': 4.3126, 'Contrastive_loss': 0.1904, 'CrossEntropy_loss': 4.1222, 'epoch': 0.24} {'Total_loss': 4.3089, 'Contrastive_loss': 0.188, 'CrossEntropy_loss': 4.1209, 'epoch': 0.25} {'Total_loss': 4.2178, 'Contrastive_loss': 0.1883, 'CrossEntropy_loss': 4.0296, 'epoch': 0.26} {'Total_loss': 4.1015, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 3.9155, 'epoch': 0.27} {'Total_loss': 4.1292, 'Contrastive_loss': 0.1886, 'CrossEntropy_loss': 3.9406, 'epoch': 0.28} {'Total_loss': 4.118, 'Contrastive_loss': 0.1885, 'CrossEntropy_loss': 3.9295, 'epoch': 0.29} {'Total_loss': 4.1581, 'Contrastive_loss': 0.1857, 'CrossEntropy_loss': 3.9724, 'epoch': 0.3} {'Total_loss': 3.9412, 'Contrastive_loss': 0.1883, 'CrossEntropy_loss': 3.7529, 'epoch': 0.31} {'Total_loss': 3.9274, 'Contrastive_loss': 0.1864, 'CrossEntropy_loss': 3.7409, 'epoch': 0.32} {'Total_loss': 3.92, 'Contrastive_loss': 0.1853, 'CrossEntropy_loss': 3.7348, 'epoch': 0.33} {'Total_loss': 3.9338, 'Contrastive_loss': 0.1869, 'CrossEntropy_loss': 3.7469, 'epoch': 0.34} {'Total_loss': 3.8306, 
'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 3.6459, 'epoch': 0.35} {'Total_loss': 3.8668, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 3.6809, 'epoch': 0.36} {'Total_loss': 3.7904, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 3.6056, 'epoch': 0.37} {'Total_loss': 3.7859, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 3.5996, 'epoch': 0.38} {'Total_loss': 3.6532, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 3.4689, 'epoch': 0.39} {'Total_loss': 3.7444, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 3.5586, 'epoch': 0.4} {'eval_loss': 3.615442991256714, 'eval_bleu': 0.0, 'eval_runtime': 46.0847, 'eval_samples_per_second': 53.25, 'eval_steps_per_second': 2.669, 'epoch': 0.4} {'Total_loss': 3.6788, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 3.494, 'epoch': 0.41} {'Total_loss': 3.6822, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 3.4974, 'epoch': 0.42} {'Total_loss': 3.7083, 'Contrastive_loss': 0.1916, 'CrossEntropy_loss': 3.5166, 'epoch': 0.43} {'Total_loss': 3.5641, 'Contrastive_loss': 0.1901, 'CrossEntropy_loss': 3.374, 'epoch': 0.44} {'Total_loss': 3.6535, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 3.4693, 'epoch': 0.45} {'Total_loss': 3.6156, 'Contrastive_loss': 0.1945, 'CrossEntropy_loss': 3.4211, 'epoch': 0.46} {'Total_loss': 3.5391, 'Contrastive_loss': 0.1865, 'CrossEntropy_loss': 3.3526, 'epoch': 0.47} {'Total_loss': 3.4827, 'Contrastive_loss': 0.1868, 'CrossEntropy_loss': 3.2959, 'epoch': 0.48} {'Total_loss': 3.5448, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 3.3589, 'epoch': 0.49} {'Total_loss': 3.4508, 'Contrastive_loss': 0.1853, 'CrossEntropy_loss': 3.2656, 'epoch': 0.5} {'Total_loss': 3.5725, 'Contrastive_loss': 0.1882, 'CrossEntropy_loss': 3.3844, 'epoch': 0.51} {'Total_loss': 3.3968, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 3.2114, 'epoch': 0.52} {'Total_loss': 3.3959, 'Contrastive_loss': 0.1895, 'CrossEntropy_loss': 3.2064, 'epoch': 0.53} {'Total_loss': 3.3771, 'Contrastive_loss': 0.1849, 
'CrossEntropy_loss': 3.1921, 'epoch': 0.54} {'Total_loss': 3.3779, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 3.192, 'epoch': 0.55} {'Total_loss': 3.396, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 3.2096, 'epoch': 0.56} {'Total_loss': 3.3529, 'Contrastive_loss': 0.1914, 'CrossEntropy_loss': 3.1615, 'epoch': 0.57} {'Total_loss': 3.3732, 'Contrastive_loss': 0.1893, 'CrossEntropy_loss': 3.1839, 'epoch': 0.58} {'Total_loss': 3.318, 'Contrastive_loss': 0.1887, 'CrossEntropy_loss': 3.1293, 'epoch': 0.59} {'Total_loss': 3.3289, 'Contrastive_loss': 0.1882, 'CrossEntropy_loss': 3.1407, 'epoch': 0.6} {'eval_loss': 3.2343878746032715, 'eval_bleu': 0.0, 'eval_runtime': 47.9308, 'eval_samples_per_second': 51.199, 'eval_steps_per_second': 2.566, 'epoch': 0.6} {'Total_loss': 3.322, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 3.1371, 'epoch': 0.61} {'Total_loss': 3.3367, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 3.1529, 'epoch': 0.62} {'Total_loss': 3.3023, 'Contrastive_loss': 0.1886, 'CrossEntropy_loss': 3.1137, 'epoch': 0.63} {'Total_loss': 3.2682, 'Contrastive_loss': 0.1983, 'CrossEntropy_loss': 3.0699, 'epoch': 0.64} {'Total_loss': 3.297, 'Contrastive_loss': 0.1959, 'CrossEntropy_loss': 3.1011, 'epoch': 0.65} {'Total_loss': 3.2783, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 3.0918, 'epoch': 0.66} {'Total_loss': 3.2495, 'Contrastive_loss': 0.1933, 'CrossEntropy_loss': 3.0562, 'epoch': 0.67} {'Total_loss': 3.302, 'Contrastive_loss': 0.1954, 'CrossEntropy_loss': 3.1067, 'epoch': 0.68} {'Total_loss': 3.2392, 'Contrastive_loss': 0.1911, 'CrossEntropy_loss': 3.0481, 'epoch': 0.69} {'Total_loss': 3.2368, 'Contrastive_loss': 0.1925, 'CrossEntropy_loss': 3.0444, 'epoch': 0.7} {'Total_loss': 3.292, 'Contrastive_loss': 0.1938, 'CrossEntropy_loss': 3.0983, 'epoch': 0.71} {'Total_loss': 3.2906, 'Contrastive_loss': 0.1937, 'CrossEntropy_loss': 3.0969, 'epoch': 0.72} {'Total_loss': 3.2616, 'Contrastive_loss': 0.2115, 'CrossEntropy_loss': 3.0501, 'epoch': 0.73} 
{'Total_loss': 3.2079, 'Contrastive_loss': 0.1905, 'CrossEntropy_loss': 3.0174, 'epoch': 0.74} {'Total_loss': 3.2165, 'Contrastive_loss': 0.1879, 'CrossEntropy_loss': 3.0286, 'epoch': 0.75} {'Total_loss': 3.2289, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 3.0423, 'epoch': 0.76} {'Total_loss': 3.1948, 'Contrastive_loss': 0.1926, 'CrossEntropy_loss': 3.0022, 'epoch': 0.77} {'Total_loss': 3.2068, 'Contrastive_loss': 0.1945, 'CrossEntropy_loss': 3.0123, 'epoch': 0.78} {'Total_loss': 3.1585, 'Contrastive_loss': 0.1938, 'CrossEntropy_loss': 2.9647, 'epoch': 0.79} {'Total_loss': 3.1925, 'Contrastive_loss': 0.2016, 'CrossEntropy_loss': 2.9909, 'epoch': 0.8} {'eval_loss': 3.068711757659912, 'eval_bleu': 0.0, 'eval_runtime': 47.8086, 'eval_samples_per_second': 51.33, 'eval_steps_per_second': 2.573, 'epoch': 0.8} {'Total_loss': 3.261, 'Contrastive_loss': 0.2016, 'CrossEntropy_loss': 3.0594, 'epoch': 0.81} {'Total_loss': 3.2025, 'Contrastive_loss': 0.228, 'CrossEntropy_loss': 2.9745, 'epoch': 0.82} {'Total_loss': 3.2189, 'Contrastive_loss': 0.2059, 'CrossEntropy_loss': 3.0129, 'epoch': 0.83} {'Total_loss': 3.2093, 'Contrastive_loss': 0.1929, 'CrossEntropy_loss': 3.0164, 'epoch': 0.84} {'Total_loss': 3.1422, 'Contrastive_loss': 0.2061, 'CrossEntropy_loss': 2.9362, 'epoch': 0.85} {'Total_loss': 3.108, 'Contrastive_loss': 0.2098, 'CrossEntropy_loss': 2.8982, 'epoch': 0.86} {'Total_loss': 3.1664, 'Contrastive_loss': 0.2013, 'CrossEntropy_loss': 2.9651, 'epoch': 0.87} {'Total_loss': 3.1319, 'Contrastive_loss': 0.1975, 'CrossEntropy_loss': 2.9343, 'epoch': 0.88} {'Total_loss': 3.1262, 'Contrastive_loss': 0.236, 'CrossEntropy_loss': 2.8901, 'epoch': 0.89} {'Total_loss': 3.1095, 'Contrastive_loss': 0.1889, 'CrossEntropy_loss': 2.9206, 'epoch': 0.89} {'Total_loss': 3.0944, 'Contrastive_loss': 0.1951, 'CrossEntropy_loss': 2.8994, 'epoch': 0.9} {'Total_loss': 3.1704, 'Contrastive_loss': 0.2005, 'CrossEntropy_loss': 2.9699, 'epoch': 0.91} {'Total_loss': 3.0742, 'Contrastive_loss': 
0.1952, 'CrossEntropy_loss': 2.879, 'epoch': 0.92} {'Total_loss': 3.0851, 'Contrastive_loss': 0.1874, 'CrossEntropy_loss': 2.8977, 'epoch': 0.93} {'Total_loss': 3.0776, 'Contrastive_loss': 0.2212, 'CrossEntropy_loss': 2.8564, 'epoch': 0.94} {'Total_loss': 3.1153, 'Contrastive_loss': 0.2119, 'CrossEntropy_loss': 2.9034, 'epoch': 0.95} {'Total_loss': 3.0434, 'Contrastive_loss': 0.2024, 'CrossEntropy_loss': 2.841, 'epoch': 0.96} {'Total_loss': 3.0355, 'Contrastive_loss': 0.1901, 'CrossEntropy_loss': 2.8454, 'epoch': 0.97} {'Total_loss': 3.0321, 'Contrastive_loss': 0.2079, 'CrossEntropy_loss': 2.8242, 'epoch': 0.98} {'Total_loss': 3.0528, 'Contrastive_loss': 0.1902, 'CrossEntropy_loss': 2.8626, 'epoch': 0.99} {'eval_loss': 2.9136242866516113, 'eval_bleu': 0.0, 'eval_runtime': 47.6533, 'eval_samples_per_second': 51.497, 'eval_steps_per_second': 2.581, 'epoch': 0.99} {'Total_loss': 3.0288, 'Contrastive_loss': 0.1874, 'CrossEntropy_loss': 2.8414, 'epoch': 1.0} {'Total_loss': 2.5967, 'Contrastive_loss': 0.191, 'CrossEntropy_loss': 2.4057, 'epoch': 1.01} {'Total_loss': 2.6331, 'Contrastive_loss': 0.1892, 'CrossEntropy_loss': 2.4439, 'epoch': 1.02} {'Total_loss': 2.6449, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 2.4599, 'epoch': 1.03} {'Total_loss': 2.6576, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.4736, 'epoch': 1.04} {'Total_loss': 2.6048, 'Contrastive_loss': 0.1931, 'CrossEntropy_loss': 2.4117, 'epoch': 1.05} {'Total_loss': 2.6301, 'Contrastive_loss': 0.2357, 'CrossEntropy_loss': 2.3944, 'epoch': 1.06} {'Total_loss': 2.6623, 'Contrastive_loss': 0.2235, 'CrossEntropy_loss': 2.4387, 'epoch': 1.07} {'Total_loss': 2.6582, 'Contrastive_loss': 0.2133, 'CrossEntropy_loss': 2.4449, 'epoch': 1.08} {'Total_loss': 2.6618, 'Contrastive_loss': 0.1944, 'CrossEntropy_loss': 2.4674, 'epoch': 1.09} {'Total_loss': 2.5991, 'Contrastive_loss': 0.2005, 'CrossEntropy_loss': 2.3986, 'epoch': 1.1} {'Total_loss': 2.6137, 'Contrastive_loss': 0.1887, 'CrossEntropy_loss': 2.425, 
'epoch': 1.11} {'Total_loss': 2.6139, 'Contrastive_loss': 0.2064, 'CrossEntropy_loss': 2.4075, 'epoch': 1.12} {'Total_loss': 2.6443, 'Contrastive_loss': 0.2167, 'CrossEntropy_loss': 2.4276, 'epoch': 1.13} {'Total_loss': 2.625, 'Contrastive_loss': 0.2083, 'CrossEntropy_loss': 2.4166, 'epoch': 1.14} {'Total_loss': 2.6687, 'Contrastive_loss': 0.2443, 'CrossEntropy_loss': 2.4244, 'epoch': 1.15} {'Total_loss': 2.6229, 'Contrastive_loss': 0.2321, 'CrossEntropy_loss': 2.3909, 'epoch': 1.16} {'Total_loss': 2.6232, 'Contrastive_loss': 0.2062, 'CrossEntropy_loss': 2.417, 'epoch': 1.17} {'Total_loss': 2.597, 'Contrastive_loss': 0.2003, 'CrossEntropy_loss': 2.3968, 'epoch': 1.18} {'Total_loss': 2.5784, 'Contrastive_loss': 0.1919, 'CrossEntropy_loss': 2.3865, 'epoch': 1.19} {'eval_loss': 2.8263299465179443, 'eval_bleu': 0.0, 'eval_runtime': 47.5913, 'eval_samples_per_second': 51.564, 'eval_steps_per_second': 2.585, 'epoch': 1.19} {'Total_loss': 2.675, 'Contrastive_loss': 0.2484, 'CrossEntropy_loss': 2.4266, 'epoch': 1.2} {'Total_loss': 2.6234, 'Contrastive_loss': 0.2451, 'CrossEntropy_loss': 2.3783, 'epoch': 1.21} {'Total_loss': 2.5849, 'Contrastive_loss': 0.1961, 'CrossEntropy_loss': 2.3888, 'epoch': 1.22} {'Total_loss': 2.6179, 'Contrastive_loss': 0.2097, 'CrossEntropy_loss': 2.4082, 'epoch': 1.23} {'Total_loss': 2.6146, 'Contrastive_loss': 0.2308, 'CrossEntropy_loss': 2.3839, 'epoch': 1.24} {'Total_loss': 2.5999, 'Contrastive_loss': 0.1946, 'CrossEntropy_loss': 2.4054, 'epoch': 1.25} {'Total_loss': 2.5606, 'Contrastive_loss': 0.1895, 'CrossEntropy_loss': 2.3711, 'epoch': 1.26} {'Total_loss': 2.5558, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.3696, 'epoch': 1.27} {'Total_loss': 2.5838, 'Contrastive_loss': 0.2322, 'CrossEntropy_loss': 2.3516, 'epoch': 1.28} {'Total_loss': 2.5681, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 2.3815, 'epoch': 1.29} {'Total_loss': 2.6259, 'Contrastive_loss': 0.2376, 'CrossEntropy_loss': 2.3884, 'epoch': 1.3} {'Total_loss': 2.5884, 
'Contrastive_loss': 0.212, 'CrossEntropy_loss': 2.3764, 'epoch': 1.31} {'Total_loss': 2.5409, 'Contrastive_loss': 0.1975, 'CrossEntropy_loss': 2.3434, 'epoch': 1.32} {'Total_loss': 2.5743, 'Contrastive_loss': 0.1885, 'CrossEntropy_loss': 2.3858, 'epoch': 1.33} {'Total_loss': 2.5428, 'Contrastive_loss': 0.1902, 'CrossEntropy_loss': 2.3526, 'epoch': 1.34} {'Total_loss': 2.5822, 'Contrastive_loss': 0.1936, 'CrossEntropy_loss': 2.3886, 'epoch': 1.35} {'Total_loss': 2.5383, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.3528, 'epoch': 1.36} {'Total_loss': 2.5074, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 2.321, 'epoch': 1.37} {'Total_loss': 2.5501, 'Contrastive_loss': 0.1904, 'CrossEntropy_loss': 2.3597, 'epoch': 1.38} {'Total_loss': 2.5195, 'Contrastive_loss': 0.1882, 'CrossEntropy_loss': 2.3314, 'epoch': 1.39} {'eval_loss': 2.728792190551758, 'eval_bleu': 0.0, 'eval_runtime': 47.5735, 'eval_samples_per_second': 51.583, 'eval_steps_per_second': 2.585, 'epoch': 1.39} {'Total_loss': 2.5535, 'Contrastive_loss': 0.2074, 'CrossEntropy_loss': 2.3461, 'epoch': 1.4} {'Total_loss': 2.53, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.3438, 'epoch': 1.41} {'Total_loss': 2.5256, 'Contrastive_loss': 0.188, 'CrossEntropy_loss': 2.3377, 'epoch': 1.42} {'Total_loss': 2.5134, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.3278, 'epoch': 1.43} {'Total_loss': 2.4979, 'Contrastive_loss': 0.1867, 'CrossEntropy_loss': 2.3112, 'epoch': 1.44} {'Total_loss': 2.5007, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.3169, 'epoch': 1.45} {'Total_loss': 2.5313, 'Contrastive_loss': 0.1964, 'CrossEntropy_loss': 2.335, 'epoch': 1.46} {'Total_loss': 2.529, 'Contrastive_loss': 0.2008, 'CrossEntropy_loss': 2.3282, 'epoch': 1.47} {'Total_loss': 2.4755, 'Contrastive_loss': 0.1842, 'CrossEntropy_loss': 2.2912, 'epoch': 1.48} {'Total_loss': 2.4807, 'Contrastive_loss': 0.1892, 'CrossEntropy_loss': 2.2915, 'epoch': 1.49} {'Total_loss': 2.5163, 'Contrastive_loss': 0.1838, 
'CrossEntropy_loss': 2.3325, 'epoch': 1.5} {'Total_loss': 2.4685, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.2856, 'epoch': 1.51} {'Total_loss': 2.4948, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.3116, 'epoch': 1.52} {'Total_loss': 2.4745, 'Contrastive_loss': 0.1915, 'CrossEntropy_loss': 2.2831, 'epoch': 1.53} {'Total_loss': 2.489, 'Contrastive_loss': 0.1953, 'CrossEntropy_loss': 2.2937, 'epoch': 1.54} {'Total_loss': 2.4989, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 2.3132, 'epoch': 1.55} {'Total_loss': 2.4978, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.3142, 'epoch': 1.56} {'Total_loss': 2.5127, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.3298, 'epoch': 1.57} {'Total_loss': 2.5181, 'Contrastive_loss': 0.1897, 'CrossEntropy_loss': 2.3284, 'epoch': 1.58} {'Total_loss': 2.4745, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.289, 'epoch': 1.59} {'eval_loss': 2.653449535369873, 'eval_bleu': 0.0, 'eval_runtime': 47.5117, 'eval_samples_per_second': 51.65, 'eval_steps_per_second': 2.589, 'epoch': 1.59} {'Total_loss': 2.4969, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.3144, 'epoch': 1.6} {'Total_loss': 2.4451, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.2604, 'epoch': 1.61} {'Total_loss': 2.4944, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.3092, 'epoch': 1.62} {'Total_loss': 2.4933, 'Contrastive_loss': 0.1874, 'CrossEntropy_loss': 2.3058, 'epoch': 1.63} {'Total_loss': 2.4844, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.3004, 'epoch': 1.64} {'Total_loss': 2.4124, 'Contrastive_loss': 0.1877, 'CrossEntropy_loss': 2.2247, 'epoch': 1.65} {'Total_loss': 2.4526, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.268, 'epoch': 1.66} {'Total_loss': 2.4455, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.2614, 'epoch': 1.67} {'Total_loss': 2.4624, 'Contrastive_loss': 0.186, 'CrossEntropy_loss': 2.2764, 'epoch': 1.68} {'Total_loss': 2.4402, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.2568, 'epoch': 1.69} 
{'Total_loss': 2.4699, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 2.2845, 'epoch': 1.7} {'Total_loss': 2.4525, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.2691, 'epoch': 1.71} {'Total_loss': 2.4427, 'Contrastive_loss': 0.1877, 'CrossEntropy_loss': 2.2551, 'epoch': 1.72} {'Total_loss': 2.4745, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.2916, 'epoch': 1.73} {'Total_loss': 2.491, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.3083, 'epoch': 1.74} {'Total_loss': 2.4476, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.2645, 'epoch': 1.75} {'Total_loss': 2.4509, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.2679, 'epoch': 1.76} {'Total_loss': 2.4308, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.248, 'epoch': 1.77} {'Total_loss': 2.4417, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.2573, 'epoch': 1.78} {'Total_loss': 2.4443, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.2598, 'epoch': 1.79} {'eval_loss': 2.597923755645752, 'eval_bleu': 49.80716382320734, 'eval_runtime': 207.6051, 'eval_samples_per_second': 11.821, 'eval_steps_per_second': 0.592, 'epoch': 1.79} {'Total_loss': 2.4395, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 2.2532, 'epoch': 1.8} {'Total_loss': 2.4053, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.2198, 'epoch': 1.81} {'Total_loss': 2.4413, 'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.2541, 'epoch': 1.82} {'Total_loss': 2.4284, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 2.2435, 'epoch': 1.83} {'Total_loss': 2.4404, 'Contrastive_loss': 0.185, 'CrossEntropy_loss': 2.2555, 'epoch': 1.84} {'Total_loss': 2.4281, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.2453, 'epoch': 1.85} {'Total_loss': 2.4562, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.2721, 'epoch': 1.86} {'Total_loss': 2.4176, 'Contrastive_loss': 0.1878, 'CrossEntropy_loss': 2.2298, 'epoch': 1.87} {'Total_loss': 2.4227, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.2382, 'epoch': 1.88} {'Total_loss': 2.4036, 
'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.2203, 'epoch': 1.89} {'Total_loss': 2.4288, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 2.2445, 'epoch': 1.9} {'Total_loss': 2.4106, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.2272, 'epoch': 1.91} {'Total_loss': 2.4278, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.2432, 'epoch': 1.92} {'Total_loss': 2.4145, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.2293, 'epoch': 1.93} {'Total_loss': 2.4163, 'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.229, 'epoch': 1.94} {'Total_loss': 2.4153, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.2332, 'epoch': 1.95} {'Total_loss': 2.3972, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.214, 'epoch': 1.96} {'Total_loss': 2.3942, 'Contrastive_loss': 0.188, 'CrossEntropy_loss': 2.2062, 'epoch': 1.97} {'Total_loss': 2.3891, 'Contrastive_loss': 0.1853, 'CrossEntropy_loss': 2.2038, 'epoch': 1.98} {'Total_loss': 2.3855, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.2034, 'epoch': 1.99} {'eval_loss': 2.5653457641601562, 'eval_bleu': 51.134713477013314, 'eval_runtime': 208.6028, 'eval_samples_per_second': 11.764, 'eval_steps_per_second': 0.59, 'epoch': 1.99} {'Total_loss': 2.3773, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.195, 'epoch': 2.0} {'Total_loss': 2.3427, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1598, 'epoch': 2.01} {'Total_loss': 2.3381, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1547, 'epoch': 2.02} {'Total_loss': 2.38, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1962, 'epoch': 2.03} {'Total_loss': 2.3861, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.2026, 'epoch': 2.04} {'Total_loss': 2.307, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1238, 'epoch': 2.05} {'Total_loss': 2.3741, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1912, 'epoch': 2.06} {'Total_loss': 2.3524, 'Contrastive_loss': 0.1897, 'CrossEntropy_loss': 2.1627, 'epoch': 2.07} {'Total_loss': 2.3685, 'Contrastive_loss': 0.1862, 
'CrossEntropy_loss': 2.1823, 'epoch': 2.08} {'Total_loss': 2.3893, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.2063, 'epoch': 2.09} {'Total_loss': 2.3638, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1817, 'epoch': 2.1} {'Total_loss': 2.4216, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.2372, 'epoch': 2.11} {'Total_loss': 2.3677, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 2.1823, 'epoch': 2.12} {'Total_loss': 2.3411, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.157, 'epoch': 2.13} {'Total_loss': 2.3643, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1822, 'epoch': 2.14} {'Total_loss': 2.3706, 'Contrastive_loss': 0.1893, 'CrossEntropy_loss': 2.1814, 'epoch': 2.15} {'Total_loss': 2.3648, 'Contrastive_loss': 0.1862, 'CrossEntropy_loss': 2.1786, 'epoch': 2.16} {'Total_loss': 2.3758, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.1902, 'epoch': 2.17} {'Total_loss': 2.372, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1885, 'epoch': 2.18} {'Total_loss': 2.3481, 'Contrastive_loss': 0.1844, 'CrossEntropy_loss': 2.1637, 'epoch': 2.19} {'eval_loss': 2.535236120223999, 'eval_bleu': 50.85850127149657, 'eval_runtime': 202.665, 'eval_samples_per_second': 12.109, 'eval_steps_per_second': 0.607, 'epoch': 2.19} {'Total_loss': 2.3967, 'Contrastive_loss': 0.1915, 'CrossEntropy_loss': 2.2052, 'epoch': 2.2} {'Total_loss': 2.3312, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.1473, 'epoch': 2.21} {'Total_loss': 2.3456, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1627, 'epoch': 2.22} {'Total_loss': 2.367, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.1848, 'epoch': 2.23} {'Total_loss': 2.3682, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.1859, 'epoch': 2.24} {'Total_loss': 2.3642, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1822, 'epoch': 2.25} {'Total_loss': 2.3502, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1672, 'epoch': 2.26} {'Total_loss': 2.3417, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1588, 
'epoch': 2.27} {'Total_loss': 2.3553, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 2.1654, 'epoch': 2.28} {'Total_loss': 2.3813, 'Contrastive_loss': 0.1935, 'CrossEntropy_loss': 2.1879, 'epoch': 2.29} {'Total_loss': 2.3538, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.1687, 'epoch': 2.3} {'Total_loss': 2.3892, 'Contrastive_loss': 0.1918, 'CrossEntropy_loss': 2.1973, 'epoch': 2.31} {'Total_loss': 2.3378, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.154, 'epoch': 2.32} {'Total_loss': 2.3711, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.185, 'epoch': 2.33} {'Total_loss': 2.372, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.1873, 'epoch': 2.34} {'Total_loss': 2.3446, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.159, 'epoch': 2.35} {'Total_loss': 2.3505, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.1665, 'epoch': 2.36} {'Total_loss': 2.3465, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1636, 'epoch': 2.37} {'Total_loss': 2.3911, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.207, 'epoch': 2.38} {'Total_loss': 2.3696, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 2.1837, 'epoch': 2.39} {'eval_loss': 2.5176327228546143, 'eval_bleu': 50.89111839209788, 'eval_runtime': 193.6496, 'eval_samples_per_second': 12.672, 'eval_steps_per_second': 0.635, 'epoch': 2.39} {'Total_loss': 2.3632, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.1793, 'epoch': 2.4} {'Total_loss': 2.3402, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1577, 'epoch': 2.41} {'Total_loss': 2.3848, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.2028, 'epoch': 2.42} {'Total_loss': 2.3302, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1471, 'epoch': 2.43} {'Total_loss': 2.3447, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.1622, 'epoch': 2.44} {'Total_loss': 2.3574, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 2.1716, 'epoch': 2.45} {'Total_loss': 2.3612, 'Contrastive_loss': 0.191, 'CrossEntropy_loss': 2.1702, 'epoch': 2.46} {'Total_loss': 
2.3565, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.171, 'epoch': 2.47} {'Total_loss': 2.3586, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.176, 'epoch': 2.48} {'Total_loss': 2.3467, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.1645, 'epoch': 2.49} {'Total_loss': 2.3531, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1701, 'epoch': 2.5} {'Total_loss': 2.3457, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.163, 'epoch': 2.51} {'Total_loss': 2.332, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 2.1454, 'epoch': 2.52} {'Total_loss': 2.3799, 'Contrastive_loss': 0.1831, 'CrossEntropy_loss': 2.1968, 'epoch': 2.53} {'Total_loss': 2.3343, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.1517, 'epoch': 2.54} {'Total_loss': 2.3344, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.1511, 'epoch': 2.55} {'Total_loss': 2.3462, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1628, 'epoch': 2.56} {'Total_loss': 2.35, 'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.1627, 'epoch': 2.57} {'Total_loss': 2.3536, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.17, 'epoch': 2.58} {'Total_loss': 2.3502, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.168, 'epoch': 2.59} {'eval_loss': 2.4858028888702393, 'eval_bleu': 51.683593977892926, 'eval_runtime': 184.2891, 'eval_samples_per_second': 13.316, 'eval_steps_per_second': 0.667, 'epoch': 2.59} {'Total_loss': 2.341, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.1577, 'epoch': 2.6} {'Total_loss': 2.3287, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.1441, 'epoch': 2.61} {'Total_loss': 2.3235, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.139, 'epoch': 2.62} {'Total_loss': 2.3407, 'Contrastive_loss': 0.1844, 'CrossEntropy_loss': 2.1563, 'epoch': 2.63} {'Total_loss': 2.3196, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1362, 'epoch': 2.64} {'Total_loss': 2.3087, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1249, 'epoch': 2.65} {'Total_loss': 2.3618, 'Contrastive_loss': 
0.1825, 'CrossEntropy_loss': 2.1793, 'epoch': 2.66} {'Total_loss': 2.3399, 'Contrastive_loss': 0.192, 'CrossEntropy_loss': 2.1479, 'epoch': 2.67} {'Total_loss': 2.3522, 'Contrastive_loss': 0.1851, 'CrossEntropy_loss': 2.1672, 'epoch': 2.68} {'Total_loss': 2.3514, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.1678, 'epoch': 2.68} {'Total_loss': 2.3138, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.1311, 'epoch': 2.69} {'Total_loss': 2.3503, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 2.1655, 'epoch': 2.7} {'Total_loss': 2.3433, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1604, 'epoch': 2.71} {'Total_loss': 2.3396, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.1568, 'epoch': 2.72} {'Total_loss': 2.3325, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.1503, 'epoch': 2.73} {'Total_loss': 2.3535, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 2.1676, 'epoch': 2.74} {'Total_loss': 2.3536, 'Contrastive_loss': 0.1873, 'CrossEntropy_loss': 2.1663, 'epoch': 2.75} {'Total_loss': 2.3535, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.17, 'epoch': 2.76} {'Total_loss': 2.3375, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.1556, 'epoch': 2.77} {'Total_loss': 2.3496, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.1673, 'epoch': 2.78} {'eval_loss': 2.46993350982666, 'eval_bleu': 52.00898377771929, 'eval_runtime': 178.0425, 'eval_samples_per_second': 13.783, 'eval_steps_per_second': 0.691, 'epoch': 2.78} {'Total_loss': 2.3248, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1424, 'epoch': 2.79} {'Total_loss': 2.3109, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.1282, 'epoch': 2.8} {'Total_loss': 2.3336, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1506, 'epoch': 2.81} {'Total_loss': 2.3123, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1291, 'epoch': 2.82} {'Total_loss': 2.3297, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.1471, 'epoch': 2.83} {'Total_loss': 2.368, 'Contrastive_loss': 0.1883, 'CrossEntropy_loss': 
2.1796, 'epoch': 2.84} {'Total_loss': 2.345, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1612, 'epoch': 2.85} {'Total_loss': 2.3744, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.1888, 'epoch': 2.86} {'Total_loss': 2.3441, 'Contrastive_loss': 0.1831, 'CrossEntropy_loss': 2.161, 'epoch': 2.87} {'Total_loss': 2.3636, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.1799, 'epoch': 2.88} {'Total_loss': 2.289, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1061, 'epoch': 2.89} {'Total_loss': 2.311, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.1249, 'epoch': 2.9} {'Total_loss': 2.3154, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.132, 'epoch': 2.91} {'Total_loss': 2.3413, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 2.1514, 'epoch': 2.92} {'Total_loss': 2.3332, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.1496, 'epoch': 2.93} {'Total_loss': 2.3287, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1449, 'epoch': 2.94} {'Total_loss': 2.3343, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1509, 'epoch': 2.95} {'Total_loss': 2.3287, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.1426, 'epoch': 2.96} {'Total_loss': 2.3261, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.1436, 'epoch': 2.97} {'Total_loss': 2.3311, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1491, 'epoch': 2.98} {'eval_loss': 2.453214645385742, 'eval_bleu': 52.202791946059484, 'eval_runtime': 181.2982, 'eval_samples_per_second': 13.536, 'eval_steps_per_second': 0.678, 'epoch': 2.98} {'Total_loss': 2.3037, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.1201, 'epoch': 2.99} {'Total_loss': 2.3179, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1349, 'epoch': 3.0} {'Total_loss': 2.271, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0885, 'epoch': 3.01} {'Total_loss': 2.2661, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0834, 'epoch': 3.02} {'Total_loss': 2.2771, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0948, 'epoch': 3.03} 
{'Total_loss': 2.278, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.0939, 'epoch': 3.04} {'Total_loss': 2.2858, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.1018, 'epoch': 3.05} {'Total_loss': 2.2392, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.0563, 'epoch': 3.06} {'Total_loss': 2.2729, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.0891, 'epoch': 3.07} {'Total_loss': 2.2633, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0805, 'epoch': 3.08} {'Total_loss': 2.3441, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1606, 'epoch': 3.09} {'Total_loss': 2.293, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1098, 'epoch': 3.1} {'Total_loss': 2.2852, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1028, 'epoch': 3.11} {'Total_loss': 2.2762, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0942, 'epoch': 3.12} {'Total_loss': 2.2746, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0924, 'epoch': 3.13} {'Total_loss': 2.3037, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.1197, 'epoch': 3.14} {'Total_loss': 2.2738, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.0893, 'epoch': 3.15} {'Total_loss': 2.2991, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1167, 'epoch': 3.16} {'Total_loss': 2.2825, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.0984, 'epoch': 3.17} {'Total_loss': 2.2789, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0967, 'epoch': 3.18} {'eval_loss': 2.441037654876709, 'eval_bleu': 52.57699361982004, 'eval_runtime': 188.0562, 'eval_samples_per_second': 13.049, 'eval_steps_per_second': 0.654, 'epoch': 3.18} {'Total_loss': 2.2922, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1092, 'epoch': 3.19} {'Total_loss': 2.2841, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1009, 'epoch': 3.2} {'Total_loss': 2.2943, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1122, 'epoch': 3.21} {'Total_loss': 2.2784, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.0951, 'epoch': 3.22} {'Total_loss': 2.2971, 
'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.1138, 'epoch': 3.23} {'Total_loss': 2.3003, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 2.116, 'epoch': 3.24} {'Total_loss': 2.3274, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1441, 'epoch': 3.25} {'Total_loss': 2.3234, 'Contrastive_loss': 0.1857, 'CrossEntropy_loss': 2.1377, 'epoch': 3.26} {'Total_loss': 2.2684, 'Contrastive_loss': 0.1884, 'CrossEntropy_loss': 2.08, 'epoch': 3.27} {'Total_loss': 2.263, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.0784, 'epoch': 3.28} {'Total_loss': 2.2942, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.112, 'epoch': 3.29} {'Total_loss': 2.268, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.0843, 'epoch': 3.3} {'Total_loss': 2.2813, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.0976, 'epoch': 3.31} {'Total_loss': 2.3315, 'Contrastive_loss': 0.1938, 'CrossEntropy_loss': 2.1377, 'epoch': 3.32} {'Total_loss': 2.2906, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 2.1051, 'epoch': 3.33} {'Total_loss': 2.2727, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.0899, 'epoch': 3.34} {'Total_loss': 2.3197, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 2.1349, 'epoch': 3.35} {'Total_loss': 2.257, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.0737, 'epoch': 3.36} {'Total_loss': 2.2576, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0745, 'epoch': 3.37} {'Total_loss': 2.2988, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1153, 'epoch': 3.38} {'eval_loss': 2.430711269378662, 'eval_bleu': 52.58615360161306, 'eval_runtime': 177.0606, 'eval_samples_per_second': 13.86, 'eval_steps_per_second': 0.695, 'epoch': 3.38} {'Total_loss': 2.2812, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0989, 'epoch': 3.39} {'Total_loss': 2.289, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1069, 'epoch': 3.4} {'Total_loss': 2.3071, 'Contrastive_loss': 0.192, 'CrossEntropy_loss': 2.1151, 'epoch': 3.41} {'Total_loss': 2.2703, 'Contrastive_loss': 0.1827, 
'CrossEntropy_loss': 2.0876, 'epoch': 3.42} {'Total_loss': 2.2586, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.0758, 'epoch': 3.43} {'Total_loss': 2.2625, 'Contrastive_loss': 0.1871, 'CrossEntropy_loss': 2.0754, 'epoch': 3.44} {'Total_loss': 2.3089, 'Contrastive_loss': 0.186, 'CrossEntropy_loss': 2.1229, 'epoch': 3.45} {'Total_loss': 2.2925, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1089, 'epoch': 3.46} {'Total_loss': 2.2895, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.107, 'epoch': 3.47} {'Total_loss': 2.315, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.133, 'epoch': 3.48} {'Total_loss': 2.244, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0618, 'epoch': 3.49} {'Total_loss': 2.2738, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0915, 'epoch': 3.5} {'Total_loss': 2.2914, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1089, 'epoch': 3.51} {'Total_loss': 2.2642, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.081, 'epoch': 3.52} {'Total_loss': 2.3208, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1388, 'epoch': 3.53} {'Total_loss': 2.2664, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0841, 'epoch': 3.54} {'Total_loss': 2.2779, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0957, 'epoch': 3.55} {'Total_loss': 2.2633, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0813, 'epoch': 3.56} {'Total_loss': 2.3217, 'Contrastive_loss': 0.1842, 'CrossEntropy_loss': 2.1375, 'epoch': 3.57} {'Total_loss': 2.2967, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1143, 'epoch': 3.58} {'eval_loss': 2.413973569869995, 'eval_bleu': 53.57574736311562, 'eval_runtime': 191.0835, 'eval_samples_per_second': 12.843, 'eval_steps_per_second': 0.644, 'epoch': 3.58} {'Total_loss': 2.3146, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1316, 'epoch': 3.59} {'Total_loss': 2.2689, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0857, 'epoch': 3.6} {'Total_loss': 2.2762, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0943, 
'epoch': 3.61} {'Total_loss': 2.2938, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.1119, 'epoch': 3.62} {'Total_loss': 2.3194, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1374, 'epoch': 3.63} {'Total_loss': 2.2412, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0593, 'epoch': 3.64} {'Total_loss': 2.269, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0869, 'epoch': 3.65} {'Total_loss': 2.2955, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1134, 'epoch': 3.66} {'Total_loss': 2.2729, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.091, 'epoch': 3.67} {'Total_loss': 2.2616, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0795, 'epoch': 3.68} {'Total_loss': 2.27, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0871, 'epoch': 3.69} {'Total_loss': 2.2826, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1006, 'epoch': 3.7} {'Total_loss': 2.2638, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0817, 'epoch': 3.71} {'Total_loss': 2.293, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.1111, 'epoch': 3.72} {'Total_loss': 2.2435, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0616, 'epoch': 3.73} {'Total_loss': 2.2528, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0706, 'epoch': 3.74} {'Total_loss': 2.2691, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0868, 'epoch': 3.75} {'Total_loss': 2.2577, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0756, 'epoch': 3.76} {'Total_loss': 2.2528, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.0681, 'epoch': 3.77} {'Total_loss': 2.3004, 'Contrastive_loss': 0.1857, 'CrossEntropy_loss': 2.1148, 'epoch': 3.78} {'eval_loss': 2.408651113510132, 'eval_bleu': 52.29365898583055, 'eval_runtime': 176.3842, 'eval_samples_per_second': 13.913, 'eval_steps_per_second': 0.697, 'epoch': 3.78} {'Total_loss': 2.2724, 'Contrastive_loss': 0.189, 'CrossEntropy_loss': 2.0833, 'epoch': 3.79} {'Total_loss': 2.2574, 'Contrastive_loss': 0.1884, 'CrossEntropy_loss': 2.069, 'epoch': 3.8} {'Total_loss': 
2.2729, 'Contrastive_loss': 0.1961, 'CrossEntropy_loss': 2.0768, 'epoch': 3.81} {'Total_loss': 2.2897, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.105, 'epoch': 3.82} {'Total_loss': 2.2641, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.0789, 'epoch': 3.83} {'Total_loss': 2.2805, 'Contrastive_loss': 0.186, 'CrossEntropy_loss': 2.0945, 'epoch': 3.84} {'Total_loss': 2.2932, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.1092, 'epoch': 3.85} {'Total_loss': 2.2909, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.1083, 'epoch': 3.86} {'Total_loss': 2.2628, 'Contrastive_loss': 0.187, 'CrossEntropy_loss': 2.0757, 'epoch': 3.87} {'Total_loss': 2.2801, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.0956, 'epoch': 3.88} {'Total_loss': 2.2835, 'Contrastive_loss': 0.1881, 'CrossEntropy_loss': 2.0953, 'epoch': 3.89} {'Total_loss': 2.3079, 'Contrastive_loss': 0.1914, 'CrossEntropy_loss': 2.1164, 'epoch': 3.9} {'Total_loss': 2.3009, 'Contrastive_loss': 0.1877, 'CrossEntropy_loss': 2.1132, 'epoch': 3.91} {'Total_loss': 2.2958, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1123, 'epoch': 3.92} {'Total_loss': 2.2789, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.0937, 'epoch': 3.93} {'Total_loss': 2.2826, 'Contrastive_loss': 0.2041, 'CrossEntropy_loss': 2.0786, 'epoch': 3.94} {'Total_loss': 2.2745, 'Contrastive_loss': 0.1951, 'CrossEntropy_loss': 2.0794, 'epoch': 3.95} {'Total_loss': 2.2954, 'Contrastive_loss': 0.1876, 'CrossEntropy_loss': 2.1077, 'epoch': 3.96} {'Total_loss': 2.2917, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 2.1059, 'epoch': 3.97} {'Total_loss': 2.3064, 'Contrastive_loss': 0.1952, 'CrossEntropy_loss': 2.1112, 'epoch': 3.98} {'eval_loss': 2.4026389122009277, 'eval_bleu': 52.777080738563626, 'eval_runtime': 188.3134, 'eval_samples_per_second': 13.031, 'eval_steps_per_second': 0.653, 'epoch': 3.98} {'Total_loss': 2.2677, 'Contrastive_loss': 0.1903, 'CrossEntropy_loss': 2.0774, 'epoch': 3.99} {'Total_loss': 2.2872, 
'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.0999, 'epoch': 4.0} {'Total_loss': 2.2791, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0968, 'epoch': 4.01} {'Total_loss': 2.2375, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0549, 'epoch': 4.02} {'Total_loss': 2.2759, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.0903, 'epoch': 4.03} {'Total_loss': 2.2553, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.0698, 'epoch': 4.04} {'Total_loss': 2.2298, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.0446, 'epoch': 4.05} {'Total_loss': 2.2814, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.0977, 'epoch': 4.06} {'Total_loss': 2.2491, 'Contrastive_loss': 0.1842, 'CrossEntropy_loss': 2.0649, 'epoch': 4.07} {'Total_loss': 2.2151, 'Contrastive_loss': 0.1867, 'CrossEntropy_loss': 2.0284, 'epoch': 4.08} {'Total_loss': 2.2506, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0684, 'epoch': 4.09} {'Total_loss': 2.2506, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0687, 'epoch': 4.1} {'Total_loss': 2.2559, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0738, 'epoch': 4.11} {'Total_loss': 2.243, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0603, 'epoch': 4.12} {'Total_loss': 2.2695, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.087, 'epoch': 4.13} {'Total_loss': 2.2509, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0688, 'epoch': 4.14} {'Total_loss': 2.2104, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0282, 'epoch': 4.15} {'Total_loss': 2.2471, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0653, 'epoch': 4.16} {'Total_loss': 2.2263, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0443, 'epoch': 4.17} {'Total_loss': 2.2518, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.0682, 'epoch': 4.18} {'eval_loss': 2.3921515941619873, 'eval_bleu': 53.35485095796362, 'eval_runtime': 189.8349, 'eval_samples_per_second': 12.927, 'eval_steps_per_second': 0.648, 'epoch': 4.18} {'Total_loss': 2.2335, 'Contrastive_loss': 
0.1831, 'CrossEntropy_loss': 2.0503, 'epoch': 4.19} {'Total_loss': 2.2166, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.0342, 'epoch': 4.2} {'Total_loss': 2.2239, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0418, 'epoch': 4.21} {'Total_loss': 2.2572, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.075, 'epoch': 4.22} {'Total_loss': 2.2203, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.038, 'epoch': 4.23} {'Total_loss': 2.2432, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.0597, 'epoch': 4.24} {'Total_loss': 2.2335, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.0508, 'epoch': 4.25} {'Total_loss': 2.2576, 'Contrastive_loss': 0.1868, 'CrossEntropy_loss': 2.0708, 'epoch': 4.26} {'Total_loss': 2.2499, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 2.064, 'epoch': 4.27} {'Total_loss': 2.2562, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.0727, 'epoch': 4.28} {'Total_loss': 2.2461, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0641, 'epoch': 4.29} {'Total_loss': 2.2782, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0963, 'epoch': 4.3} {'Total_loss': 2.2396, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0578, 'epoch': 4.31} {'Total_loss': 2.2235, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0403, 'epoch': 4.32} {'Total_loss': 2.2181, 'Contrastive_loss': 0.1886, 'CrossEntropy_loss': 2.0295, 'epoch': 4.33} {'Total_loss': 2.27, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.0867, 'epoch': 4.34} {'Total_loss': 2.2445, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0617, 'epoch': 4.35} {'Total_loss': 2.2624, 'Contrastive_loss': 0.1844, 'CrossEntropy_loss': 2.0779, 'epoch': 4.36} {'Total_loss': 2.2433, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.0588, 'epoch': 4.37} {'Total_loss': 2.2099, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0275, 'epoch': 4.38} {'eval_loss': 2.3844974040985107, 'eval_bleu': 54.22287598491524, 'eval_runtime': 198.6886, 'eval_samples_per_second': 12.351, 'eval_steps_per_second': 
0.619, 'epoch': 4.38} {'Total_loss': 2.2639, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.082, 'epoch': 4.39} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0463, 'epoch': 4.4} {'Total_loss': 2.2324, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0502, 'epoch': 4.41} {'Total_loss': 2.2365, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0543, 'epoch': 4.42} {'Total_loss': 2.2404, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0581, 'epoch': 4.43} {'Total_loss': 2.2419, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.0595, 'epoch': 4.44} {'Total_loss': 2.2452, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0633, 'epoch': 4.45} {'Total_loss': 2.2395, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0571, 'epoch': 4.46} {'Total_loss': 2.2197, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.0362, 'epoch': 4.46} {'Total_loss': 2.2593, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0773, 'epoch': 4.47} {'Total_loss': 2.2175, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0351, 'epoch': 4.48} {'Total_loss': 2.2446, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0623, 'epoch': 4.49} {'Total_loss': 2.2521, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0699, 'epoch': 4.5} {'Total_loss': 2.2398, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0575, 'epoch': 4.51} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0463, 'epoch': 4.52} {'Total_loss': 2.2386, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0566, 'epoch': 4.53} {'Total_loss': 2.2279, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0455, 'epoch': 4.54} {'Total_loss': 2.2116, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0298, 'epoch': 4.55} {'Total_loss': 2.2292, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0474, 'epoch': 4.56} {'Total_loss': 2.2292, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0475, 'epoch': 4.57} {'eval_loss': 2.3810110092163086, 'eval_bleu': 53.01764696584839, 'eval_runtime': 
198.5997, 'eval_samples_per_second': 12.357, 'eval_steps_per_second': 0.619, 'epoch': 4.57} {'Total_loss': 2.2307, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0487, 'epoch': 4.58} {'Total_loss': 2.2477, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0656, 'epoch': 4.59} {'Total_loss': 2.2529, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.071, 'epoch': 4.6} {'Total_loss': 2.223, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.041, 'epoch': 4.61} {'Total_loss': 2.2283, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.046, 'epoch': 4.62} {'Total_loss': 2.2502, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.067, 'epoch': 4.63} {'Total_loss': 2.2454, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0633, 'epoch': 4.64} {'Total_loss': 2.248, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0658, 'epoch': 4.65} {'Total_loss': 2.2404, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0581, 'epoch': 4.66} {'Total_loss': 2.2822, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.0997, 'epoch': 4.67} {'Total_loss': 2.2554, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0733, 'epoch': 4.68} {'Total_loss': 2.2478, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0657, 'epoch': 4.69} {'Total_loss': 2.2336, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.0495, 'epoch': 4.7} {'Total_loss': 2.2658, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0835, 'epoch': 4.71} {'Total_loss': 2.208, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.0251, 'epoch': 4.72} {'Total_loss': 2.2101, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0282, 'epoch': 4.73} {'Total_loss': 2.2359, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0527, 'epoch': 4.74} {'Total_loss': 2.2328, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0505, 'epoch': 4.75} {'Total_loss': 2.2259, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0438, 'epoch': 4.76} {'Total_loss': 2.2624, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0803, 'epoch': 4.77} {'eval_loss': 
2.366802453994751, 'eval_bleu': 54.010005283325334, 'eval_runtime': 197.2867, 'eval_samples_per_second': 12.439, 'eval_steps_per_second': 0.623, 'epoch': 4.77} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0462, 'epoch': 4.78} {'Total_loss': 2.2649, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0829, 'epoch': 4.79} {'Total_loss': 2.2718, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.09, 'epoch': 4.8} {'Total_loss': 2.2471, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0653, 'epoch': 4.81} {'Total_loss': 2.2084, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0263, 'epoch': 4.82} {'Total_loss': 2.2395, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0576, 'epoch': 4.83} {'Total_loss': 2.2409, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0584, 'epoch': 4.84} {'Total_loss': 2.2473, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0655, 'epoch': 4.85} {'Total_loss': 2.2447, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0628, 'epoch': 4.86} {'Total_loss': 2.2414, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.059, 'epoch': 4.87} {'Total_loss': 2.2668, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.0838, 'epoch': 4.88} {'Total_loss': 2.2502, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0682, 'epoch': 4.89} {'Total_loss': 2.2494, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0675, 'epoch': 4.9} {'Total_loss': 2.2535, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0714, 'epoch': 4.91} {'Total_loss': 2.2313, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0491, 'epoch': 4.92} {'Total_loss': 2.2278, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0458, 'epoch': 4.93} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0458, 'epoch': 4.94} {'Total_loss': 2.24, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0581, 'epoch': 4.95} {'Total_loss': 2.2499, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.0672, 'epoch': 4.96} {'Total_loss': 2.2593, 'Contrastive_loss': 
0.1819, 'CrossEntropy_loss': 2.0774, 'epoch': 4.97} {'eval_loss': 2.3611326217651367, 'eval_bleu': 54.36605619633802, 'eval_runtime': 200.5003, 'eval_samples_per_second': 12.239, 'eval_steps_per_second': 0.613, 'epoch': 4.97} {'train_runtime': 21247.9554, 'train_samples_per_second': 23531.676, 'train_steps_per_second': 2.353, 'train_loss': 2.657199195404053, 'epoch': 4.97} train metrics epoch = 4.97 train_loss = 2.6572 train_runtime = 5:54:07.95 train_samples = 663486 train_samples_per_second = 23531.676 train_steps_per_second = 2.353 predict metrics predict_bleu = 54.0698 predict_loss = 2.407 predict_runtime = 0:03:10.38 predict_samples = 2483 predict_samples_per_second = 13.042 predict_steps_per_second = 0.657 `
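A quick way to check whether training had converged is to pull the evaluation records out of the log above and see where the best dev BLEU falls. The sketch below parses a few of those records. They are copied from the log with the runtime fields dropped. It uses only the Python standard library (`re`, `ast.literal_eval`). The helper name `eval_records` is hypothetical and is not part of the repository.

```python
import ast
import re

# The last eight evaluation records from the log above (eval_runtime and
# throughput fields dropped). The full log interleaves many training-loss dicts.
LOG_EXCERPT = """
{'eval_loss': 2.413973569869995, 'eval_bleu': 53.57574736311562, 'epoch': 3.58}
{'Total_loss': 2.3146, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1316, 'epoch': 3.59}
{'eval_loss': 2.408651113510132, 'eval_bleu': 52.29365898583055, 'epoch': 3.78}
{'eval_loss': 2.4026389122009277, 'eval_bleu': 52.777080738563626, 'epoch': 3.98}
{'eval_loss': 2.3921515941619873, 'eval_bleu': 53.35485095796362, 'epoch': 4.18}
{'eval_loss': 2.3844974040985107, 'eval_bleu': 54.22287598491524, 'epoch': 4.38}
{'eval_loss': 2.3810110092163086, 'eval_bleu': 53.01764696584839, 'epoch': 4.57}
{'eval_loss': 2.366802453994751, 'eval_bleu': 54.010005283325334, 'epoch': 4.77}
{'eval_loss': 2.3611326217651367, 'eval_bleu': 54.36605619633802, 'epoch': 4.97}
"""


def eval_records(log_text):
    """Return every evaluation dict (those starting with 'eval_loss') in the log."""
    # The logged dicts are flat, so a brace-free body is enough to match one.
    matches = re.findall(r"\{'eval_loss'[^{}]*\}", log_text)
    return [ast.literal_eval(m) for m in matches]


records = eval_records(LOG_EXCERPT)
best = max(records, key=lambda r: r["eval_bleu"])
print(f"best dev BLEU {best['eval_bleu']:.2f} at epoch {best['epoch']}")
print("best checkpoint is the last evaluation:", best is records[-1])
# Dev loss still falling at every evaluation suggests training stopped early.
print("dev loss strictly decreasing:",
      all(a["eval_loss"] > b["eval_loss"] for a, b in zip(records, records[1:])))
```

On this excerpt, the best dev BLEU comes from the final evaluation at epoch 4.97, and the dev loss drops at every evaluation. Both point to a run that was cut off before it converged, not one that plateaued.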

Hannibal046 commented 1 year ago

Try increasing `num_train_epochs` or `max_steps`.
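The logged `TrainingArgs` explain why the run stopped at epoch 4.97. The log shows both `max_steps=50000` and `num_train_epochs=20`. In Hugging Face `TrainingArguments`, a positive `max_steps` overrides `num_train_epochs`, and the default of `-1` disables it. The arithmetic below is a rough check that uses numbers taken from the log: 663486 training samples, an average of about 65.98 sentences per token-based batch, one GPU, and `gradient_accumulation_steps=1`. The helper functions are hypothetical and are a simplified view of the Trainer's rule, not its actual code.

```python
# Values taken from the training log in this thread.
TRAIN_SAMPLES = 663486
AVG_SENTS_PER_BATCH = 65.97911694510739  # "Average Senteces in One Batch"
MAX_STEPS = 50000
NUM_TRAIN_EPOCHS = 20
GRAD_ACCUM = 1  # gradient_accumulation_steps=1, single GPU


def steps_per_epoch(samples, avg_batch, grad_accum=1):
    # With token-based batching, the number of batches is samples / average batch size.
    return round(samples / avg_batch) // grad_accum


def planned_steps(max_steps, num_train_epochs, spe):
    # Simplified rule: a positive max_steps wins over num_train_epochs.
    return max_steps if max_steps > 0 else num_train_epochs * spe


spe = steps_per_epoch(TRAIN_SAMPLES, AVG_SENTS_PER_BATCH, GRAD_ACCUM)
total = planned_steps(MAX_STEPS, NUM_TRAIN_EPOCHS, spe)
print(spe)                     # optimizer steps per epoch
print(round(total / spe, 2))   # epochs actually trained
print(NUM_TRAIN_EPOCHS * spe)  # steps needed for the full 20 epochs
```

This gives 10056 steps per epoch, so 50000 steps cover about 4.97 epochs, which matches the logged `epoch = 4.97`. Reaching 20 epochs would take about 201120 steps. You can get there by raising `max_steps` or by setting it to `-1` so that `num_train_epochs` takes effect. The schedule is `linear` with `warmup_steps=8000`, and the learning rate decays to zero over the total planned steps. Changing `max_steps` therefore also stretches the decay rather than just appending extra steps.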

On 2023-07-04 12:50:31, Van-Hien Tran @.***> wrote:

Thank you very much for your quick reply! I understand that the model used the checkpoint with the best performance on the dev set to evaluate on the test set. Here is the full output file. Please help me to consider it. DataArgs(dataset_dir_prefix='data/', dataset_path='jrc_joint_bpe/ende', train_file='data/jrc_joint_bpe/ende/train.json', dev_file='data/jrc_joint_bpe/ende/dev.json', test_file='data/jrc_joint_bpe/ende/test.json', use_cache=False, max_src_len=250, max_trg_len=118, min_trg_len=3, src_vocab_file='data/jrc_joint_bpe/ende/src.vocab', trg_vocab_file='data/jrc_joint_bpe/ende/tgt.vocab', src='en', trg='de') MarianConfig { "activation_dropout": 0.0, "activation_function": "relu", "attention_dropout": 0.0, "bos_token_id": 3, "classifier_dropout": 0.0, "contrastive_lambda": 1, "contrastive_loss_balance": false, "contrastive_temperature": 0.15, "d_model": 512, "decoder_attention_heads": 8, "decoder_ffn_dim": 2048, "decoder_layerdrop": 0.0, "decoder_layers": 6, "decoder_start_token_id": 3, "decoder_type": "dual_cross_attention", "dropout": 0.1, "encoder_attention_heads": 8, "encoder_ffn_dim": 2048, "encoder_layerdrop": 0.0, "encoder_layers": 6, "eos_token_id": 2, "forced_eos_token_id": 2, "gradient_checkpointing": false, "init_std": 0.02, "is_encoder_decoder": true, "max_length": 118, "max_position_embeddings": 1024, "max_src_len": 250, "max_tm_len": 500, "max_trg_len": 118, "min_trg_len": 3, "model_arch": "retrieval_augmented", "model_type": "marian", "num_beams": 5, "num_hidden_layers": 6, "output_attentions": true, "output_hidden_states": true, "pad_token_id": 1, "pooler_type": "cls_mlp", "scale_embedding": true, "src_vocab_size": 0, "tm_encoder_attention_heads": 8, "tm_encoder_dropout": 0.0, "tm_encoder_ffn_dim": 2048, "tm_encoder_layers": 6, "tm_encoder_type": "group_attention", "tm_size": 5, "transformers_version": "4.9.0", "trg_vocab_size": 0, "use_cache": true, "use_contrastive": true, "use_copy": true, "use_joint_bpe": true, "use_shared_encoder": true, 
"vocab_size": 50265 } TrainingArgs( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_find_unused_parameters=None, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_steps=2000, evaluation_strategy=IntervalStrategy.STEPS, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=01, gradient_accumulation_steps=1, greater_is_better=True, group_by_length=False, ignore_data_skip=False, label_names=None, label_smoothing_factor=0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=True, local_rank=-1, log_level=-1, log_level_replica=-1, log_on_each_node=True, logging_dir=results/jrc/ende/dual/runs/Jul03_23-01-27_sccdlb032, logging_first_step=True, logging_steps=100, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_type=SchedulerType.LINEAR, max_grad_norm=1.0, max_steps=50000, metric_for_best_model=bleu, mp_parameters=, multiple_loss=True, no_cuda=False, num_train_epochs=20, output_dir=results/jrc/ende/6657, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=20, per_device_train_batch_size=10000, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=dual, push_to_hub_organization=None, push_to_hub_token=None, remove_unused_columns=False, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=results/jrc/ende/dual, save_on_each_node=False, save_steps=2000, save_strategy=IntervalStrategy.STEPS, save_total_limit=2, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tpu_metrics_debug=False, tpu_num_cores=None, use_legacy_prediction_loop=False, warmup_ratio=0.0, warmup_steps=8000, weight_decay=0.0, ) Initializing Model... 
data/jrc_joint_bpe/ende/train.json Dataset Samples: 663487 data/retrieval/ende/src_editdis_alpha_0.7.pkl 663487 Dataset Samples After filtering: 663486 Average Senteces in One Batch: 65.97911694510739 data/jrc_joint_bpe/ende/dev.json Dataset Samples: 2454 data/retrieval/ende/src_editdis_alpha_0.7.pkl 2454 Dataset Samples After filtering: 2454 data/jrc_joint_bpe/ende/test.json Dataset Samples: 2483 data/retrieval/ende/src_editdis_alpha_0.7.pkl 2483 Dataset Samples After filtering: 2483 Loading Complete {'Total_loss': 8.5871, 'Contrastive_loss': 1.9092, 'CrossEntropy_loss': 6.678, 'epoch': 0.0} {'Total_loss': 7.9695, 'Contrastive_loss': 1.2864, 'CrossEntropy_loss': 6.6831, 'epoch': 0.01} {'Total_loss': 6.9269, 'Contrastive_loss': 0.5813, 'CrossEntropy_loss': 6.3456, 'epoch': 0.02} {'Total_loss': 6.6431, 'Contrastive_loss': 0.4108, 'CrossEntropy_loss': 6.2323, 'epoch': 0.03} {'Total_loss': 6.1458, 'Contrastive_loss': 0.3267, 'CrossEntropy_loss': 5.8191, 'epoch': 0.04} {'Total_loss': 5.9742, 'Contrastive_loss': 0.2718, 'CrossEntropy_loss': 5.7024, 'epoch': 0.05} {'Total_loss': 5.7876, 'Contrastive_loss': 0.2481, 'CrossEntropy_loss': 5.5395, 'epoch': 0.06} {'Total_loss': 5.5869, 'Contrastive_loss': 0.2276, 'CrossEntropy_loss': 5.3592, 'epoch': 0.07} {'Total_loss': 5.5471, 'Contrastive_loss': 0.2232, 'CrossEntropy_loss': 5.3239, 'epoch': 0.08} {'Total_loss': 5.3336, 'Contrastive_loss': 0.2098, 'CrossEntropy_loss': 5.1238, 'epoch': 0.09} {'Total_loss': 5.2709, 'Contrastive_loss': 0.2043, 'CrossEntropy_loss': 5.0666, 'epoch': 0.1} {'Total_loss': 5.3056, 'Contrastive_loss': 0.2016, 'CrossEntropy_loss': 5.104, 'epoch': 0.11} {'Total_loss': 5.1, 'Contrastive_loss': 0.1957, 'CrossEntropy_loss': 4.9043, 'epoch': 0.12} {'Total_loss': 5.0497, 'Contrastive_loss': 0.1977, 'CrossEntropy_loss': 4.852, 'epoch': 0.13} {'Total_loss': 4.9868, 'Contrastive_loss': 0.1957, 'CrossEntropy_loss': 4.7911, 'epoch': 0.14} {'Total_loss': 4.9448, 'Contrastive_loss': 0.1934, 'CrossEntropy_loss': 
4.7514, 'epoch': 0.15} {'Total_loss': 4.799, 'Contrastive_loss': 0.1931, 'CrossEntropy_loss': 4.606, 'epoch': 0.16} {'Total_loss': 4.845, 'Contrastive_loss': 0.1903, 'CrossEntropy_loss': 4.6547, 'epoch': 0.17} {'Total_loss': 4.7578, 'Contrastive_loss': 0.19, 'CrossEntropy_loss': 4.5678, 'epoch': 0.18} {'Total_loss': 4.6073, 'Contrastive_loss': 0.1926, 'CrossEntropy_loss': 4.4147, 'epoch': 0.19} {'Total_loss': 4.6527, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 4.4628, 'epoch': 0.2} {'eval_loss': 4.475997447967529, 'eval_bleu': 0.0, 'eval_runtime': 47.0383, 'eval_samples_per_second': 52.17, 'eval_steps_per_second': 2.615, 'epoch': 0.2} {'Total_loss': 4.5412, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 4.3512, 'epoch': 0.21} {'Total_loss': 4.4725, 'Contrastive_loss': 0.19, 'CrossEntropy_loss': 4.2825, 'epoch': 0.22} {'Total_loss': 4.4605, 'Contrastive_loss': 0.1875, 'CrossEntropy_loss': 4.2731, 'epoch': 0.23} {'Total_loss': 4.3126, 'Contrastive_loss': 0.1904, 'CrossEntropy_loss': 4.1222, 'epoch': 0.24} {'Total_loss': 4.3089, 'Contrastive_loss': 0.188, 'CrossEntropy_loss': 4.1209, 'epoch': 0.25} {'Total_loss': 4.2178, 'Contrastive_loss': 0.1883, 'CrossEntropy_loss': 4.0296, 'epoch': 0.26} {'Total_loss': 4.1015, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 3.9155, 'epoch': 0.27} {'Total_loss': 4.1292, 'Contrastive_loss': 0.1886, 'CrossEntropy_loss': 3.9406, 'epoch': 0.28} {'Total_loss': 4.118, 'Contrastive_loss': 0.1885, 'CrossEntropy_loss': 3.9295, 'epoch': 0.29} {'Total_loss': 4.1581, 'Contrastive_loss': 0.1857, 'CrossEntropy_loss': 3.9724, 'epoch': 0.3} {'Total_loss': 3.9412, 'Contrastive_loss': 0.1883, 'CrossEntropy_loss': 3.7529, 'epoch': 0.31} {'Total_loss': 3.9274, 'Contrastive_loss': 0.1864, 'CrossEntropy_loss': 3.7409, 'epoch': 0.32} {'Total_loss': 3.92, 'Contrastive_loss': 0.1853, 'CrossEntropy_loss': 3.7348, 'epoch': 0.33} {'Total_loss': 3.9338, 'Contrastive_loss': 0.1869, 'CrossEntropy_loss': 3.7469, 'epoch': 0.34} {'Total_loss': 3.8306, 
'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 3.6459, 'epoch': 0.35} {'Total_loss': 3.8668, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 3.6809, 'epoch': 0.36} {'Total_loss': 3.7904, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 3.6056, 'epoch': 0.37} {'Total_loss': 3.7859, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 3.5996, 'epoch': 0.38} {'Total_loss': 3.6532, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 3.4689, 'epoch': 0.39} {'Total_loss': 3.7444, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 3.5586, 'epoch': 0.4} {'eval_loss': 3.615442991256714, 'eval_bleu': 0.0, 'eval_runtime': 46.0847, 'eval_samples_per_second': 53.25, 'eval_steps_per_second': 2.669, 'epoch': 0.4} {'Total_loss': 3.6788, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 3.494, 'epoch': 0.41} {'Total_loss': 3.6822, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 3.4974, 'epoch': 0.42} {'Total_loss': 3.7083, 'Contrastive_loss': 0.1916, 'CrossEntropy_loss': 3.5166, 'epoch': 0.43} {'Total_loss': 3.5641, 'Contrastive_loss': 0.1901, 'CrossEntropy_loss': 3.374, 'epoch': 0.44} {'Total_loss': 3.6535, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 3.4693, 'epoch': 0.45} {'Total_loss': 3.6156, 'Contrastive_loss': 0.1945, 'CrossEntropy_loss': 3.4211, 'epoch': 0.46} {'Total_loss': 3.5391, 'Contrastive_loss': 0.1865, 'CrossEntropy_loss': 3.3526, 'epoch': 0.47} {'Total_loss': 3.4827, 'Contrastive_loss': 0.1868, 'CrossEntropy_loss': 3.2959, 'epoch': 0.48} {'Total_loss': 3.5448, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 3.3589, 'epoch': 0.49} {'Total_loss': 3.4508, 'Contrastive_loss': 0.1853, 'CrossEntropy_loss': 3.2656, 'epoch': 0.5} {'Total_loss': 3.5725, 'Contrastive_loss': 0.1882, 'CrossEntropy_loss': 3.3844, 'epoch': 0.51} {'Total_loss': 3.3968, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 3.2114, 'epoch': 0.52} {'Total_loss': 3.3959, 'Contrastive_loss': 0.1895, 'CrossEntropy_loss': 3.2064, 'epoch': 0.53} {'Total_loss': 3.3771, 'Contrastive_loss': 0.1849, 
'CrossEntropy_loss': 3.1921, 'epoch': 0.54} {'Total_loss': 3.3779, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 3.192, 'epoch': 0.55} {'Total_loss': 3.396, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 3.2096, 'epoch': 0.56} {'Total_loss': 3.3529, 'Contrastive_loss': 0.1914, 'CrossEntropy_loss': 3.1615, 'epoch': 0.57} {'Total_loss': 3.3732, 'Contrastive_loss': 0.1893, 'CrossEntropy_loss': 3.1839, 'epoch': 0.58} {'Total_loss': 3.318, 'Contrastive_loss': 0.1887, 'CrossEntropy_loss': 3.1293, 'epoch': 0.59} {'Total_loss': 3.3289, 'Contrastive_loss': 0.1882, 'CrossEntropy_loss': 3.1407, 'epoch': 0.6} {'eval_loss': 3.2343878746032715, 'eval_bleu': 0.0, 'eval_runtime': 47.9308, 'eval_samples_per_second': 51.199, 'eval_steps_per_second': 2.566, 'epoch': 0.6} {'Total_loss': 3.322, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 3.1371, 'epoch': 0.61} {'Total_loss': 3.3367, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 3.1529, 'epoch': 0.62} {'Total_loss': 3.3023, 'Contrastive_loss': 0.1886, 'CrossEntropy_loss': 3.1137, 'epoch': 0.63} {'Total_loss': 3.2682, 'Contrastive_loss': 0.1983, 'CrossEntropy_loss': 3.0699, 'epoch': 0.64} {'Total_loss': 3.297, 'Contrastive_loss': 0.1959, 'CrossEntropy_loss': 3.1011, 'epoch': 0.65} {'Total_loss': 3.2783, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 3.0918, 'epoch': 0.66} {'Total_loss': 3.2495, 'Contrastive_loss': 0.1933, 'CrossEntropy_loss': 3.0562, 'epoch': 0.67} {'Total_loss': 3.302, 'Contrastive_loss': 0.1954, 'CrossEntropy_loss': 3.1067, 'epoch': 0.68} {'Total_loss': 3.2392, 'Contrastive_loss': 0.1911, 'CrossEntropy_loss': 3.0481, 'epoch': 0.69} {'Total_loss': 3.2368, 'Contrastive_loss': 0.1925, 'CrossEntropy_loss': 3.0444, 'epoch': 0.7} {'Total_loss': 3.292, 'Contrastive_loss': 0.1938, 'CrossEntropy_loss': 3.0983, 'epoch': 0.71} {'Total_loss': 3.2906, 'Contrastive_loss': 0.1937, 'CrossEntropy_loss': 3.0969, 'epoch': 0.72} {'Total_loss': 3.2616, 'Contrastive_loss': 0.2115, 'CrossEntropy_loss': 3.0501, 'epoch': 0.73} 
{'Total_loss': 3.2079, 'Contrastive_loss': 0.1905, 'CrossEntropy_loss': 3.0174, 'epoch': 0.74} {'Total_loss': 3.2165, 'Contrastive_loss': 0.1879, 'CrossEntropy_loss': 3.0286, 'epoch': 0.75} {'Total_loss': 3.2289, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 3.0423, 'epoch': 0.76} {'Total_loss': 3.1948, 'Contrastive_loss': 0.1926, 'CrossEntropy_loss': 3.0022, 'epoch': 0.77} {'Total_loss': 3.2068, 'Contrastive_loss': 0.1945, 'CrossEntropy_loss': 3.0123, 'epoch': 0.78} {'Total_loss': 3.1585, 'Contrastive_loss': 0.1938, 'CrossEntropy_loss': 2.9647, 'epoch': 0.79} {'Total_loss': 3.1925, 'Contrastive_loss': 0.2016, 'CrossEntropy_loss': 2.9909, 'epoch': 0.8} {'eval_loss': 3.068711757659912, 'eval_bleu': 0.0, 'eval_runtime': 47.8086, 'eval_samples_per_second': 51.33, 'eval_steps_per_second': 2.573, 'epoch': 0.8} {'Total_loss': 3.261, 'Contrastive_loss': 0.2016, 'CrossEntropy_loss': 3.0594, 'epoch': 0.81} {'Total_loss': 3.2025, 'Contrastive_loss': 0.228, 'CrossEntropy_loss': 2.9745, 'epoch': 0.82} {'Total_loss': 3.2189, 'Contrastive_loss': 0.2059, 'CrossEntropy_loss': 3.0129, 'epoch': 0.83} {'Total_loss': 3.2093, 'Contrastive_loss': 0.1929, 'CrossEntropy_loss': 3.0164, 'epoch': 0.84} {'Total_loss': 3.1422, 'Contrastive_loss': 0.2061, 'CrossEntropy_loss': 2.9362, 'epoch': 0.85} {'Total_loss': 3.108, 'Contrastive_loss': 0.2098, 'CrossEntropy_loss': 2.8982, 'epoch': 0.86} {'Total_loss': 3.1664, 'Contrastive_loss': 0.2013, 'CrossEntropy_loss': 2.9651, 'epoch': 0.87} {'Total_loss': 3.1319, 'Contrastive_loss': 0.1975, 'CrossEntropy_loss': 2.9343, 'epoch': 0.88} {'Total_loss': 3.1262, 'Contrastive_loss': 0.236, 'CrossEntropy_loss': 2.8901, 'epoch': 0.89} {'Total_loss': 3.1095, 'Contrastive_loss': 0.1889, 'CrossEntropy_loss': 2.9206, 'epoch': 0.89} {'Total_loss': 3.0944, 'Contrastive_loss': 0.1951, 'CrossEntropy_loss': 2.8994, 'epoch': 0.9} {'Total_loss': 3.1704, 'Contrastive_loss': 0.2005, 'CrossEntropy_loss': 2.9699, 'epoch': 0.91} {'Total_loss': 3.0742, 'Contrastive_loss': 
0.1952, 'CrossEntropy_loss': 2.879, 'epoch': 0.92} {'Total_loss': 3.0851, 'Contrastive_loss': 0.1874, 'CrossEntropy_loss': 2.8977, 'epoch': 0.93} {'Total_loss': 3.0776, 'Contrastive_loss': 0.2212, 'CrossEntropy_loss': 2.8564, 'epoch': 0.94} {'Total_loss': 3.1153, 'Contrastive_loss': 0.2119, 'CrossEntropy_loss': 2.9034, 'epoch': 0.95} {'Total_loss': 3.0434, 'Contrastive_loss': 0.2024, 'CrossEntropy_loss': 2.841, 'epoch': 0.96} {'Total_loss': 3.0355, 'Contrastive_loss': 0.1901, 'CrossEntropy_loss': 2.8454, 'epoch': 0.97} {'Total_loss': 3.0321, 'Contrastive_loss': 0.2079, 'CrossEntropy_loss': 2.8242, 'epoch': 0.98} {'Total_loss': 3.0528, 'Contrastive_loss': 0.1902, 'CrossEntropy_loss': 2.8626, 'epoch': 0.99} {'eval_loss': 2.9136242866516113, 'eval_bleu': 0.0, 'eval_runtime': 47.6533, 'eval_samples_per_second': 51.497, 'eval_steps_per_second': 2.581, 'epoch': 0.99} {'Total_loss': 3.0288, 'Contrastive_loss': 0.1874, 'CrossEntropy_loss': 2.8414, 'epoch': 1.0} {'Total_loss': 2.5967, 'Contrastive_loss': 0.191, 'CrossEntropy_loss': 2.4057, 'epoch': 1.01} {'Total_loss': 2.6331, 'Contrastive_loss': 0.1892, 'CrossEntropy_loss': 2.4439, 'epoch': 1.02} {'Total_loss': 2.6449, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 2.4599, 'epoch': 1.03} {'Total_loss': 2.6576, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.4736, 'epoch': 1.04} {'Total_loss': 2.6048, 'Contrastive_loss': 0.1931, 'CrossEntropy_loss': 2.4117, 'epoch': 1.05} {'Total_loss': 2.6301, 'Contrastive_loss': 0.2357, 'CrossEntropy_loss': 2.3944, 'epoch': 1.06} {'Total_loss': 2.6623, 'Contrastive_loss': 0.2235, 'CrossEntropy_loss': 2.4387, 'epoch': 1.07} {'Total_loss': 2.6582, 'Contrastive_loss': 0.2133, 'CrossEntropy_loss': 2.4449, 'epoch': 1.08} {'Total_loss': 2.6618, 'Contrastive_loss': 0.1944, 'CrossEntropy_loss': 2.4674, 'epoch': 1.09} {'Total_loss': 2.5991, 'Contrastive_loss': 0.2005, 'CrossEntropy_loss': 2.3986, 'epoch': 1.1} {'Total_loss': 2.6137, 'Contrastive_loss': 0.1887, 'CrossEntropy_loss': 2.425, 
'epoch': 1.11} {'Total_loss': 2.6139, 'Contrastive_loss': 0.2064, 'CrossEntropy_loss': 2.4075, 'epoch': 1.12} {'Total_loss': 2.6443, 'Contrastive_loss': 0.2167, 'CrossEntropy_loss': 2.4276, 'epoch': 1.13} {'Total_loss': 2.625, 'Contrastive_loss': 0.2083, 'CrossEntropy_loss': 2.4166, 'epoch': 1.14} {'Total_loss': 2.6687, 'Contrastive_loss': 0.2443, 'CrossEntropy_loss': 2.4244, 'epoch': 1.15} {'Total_loss': 2.6229, 'Contrastive_loss': 0.2321, 'CrossEntropy_loss': 2.3909, 'epoch': 1.16} {'Total_loss': 2.6232, 'Contrastive_loss': 0.2062, 'CrossEntropy_loss': 2.417, 'epoch': 1.17} {'Total_loss': 2.597, 'Contrastive_loss': 0.2003, 'CrossEntropy_loss': 2.3968, 'epoch': 1.18} {'Total_loss': 2.5784, 'Contrastive_loss': 0.1919, 'CrossEntropy_loss': 2.3865, 'epoch': 1.19} {'eval_loss': 2.8263299465179443, 'eval_bleu': 0.0, 'eval_runtime': 47.5913, 'eval_samples_per_second': 51.564, 'eval_steps_per_second': 2.585, 'epoch': 1.19} {'Total_loss': 2.675, 'Contrastive_loss': 0.2484, 'CrossEntropy_loss': 2.4266, 'epoch': 1.2} {'Total_loss': 2.6234, 'Contrastive_loss': 0.2451, 'CrossEntropy_loss': 2.3783, 'epoch': 1.21} {'Total_loss': 2.5849, 'Contrastive_loss': 0.1961, 'CrossEntropy_loss': 2.3888, 'epoch': 1.22} {'Total_loss': 2.6179, 'Contrastive_loss': 0.2097, 'CrossEntropy_loss': 2.4082, 'epoch': 1.23} {'Total_loss': 2.6146, 'Contrastive_loss': 0.2308, 'CrossEntropy_loss': 2.3839, 'epoch': 1.24} {'Total_loss': 2.5999, 'Contrastive_loss': 0.1946, 'CrossEntropy_loss': 2.4054, 'epoch': 1.25} {'Total_loss': 2.5606, 'Contrastive_loss': 0.1895, 'CrossEntropy_loss': 2.3711, 'epoch': 1.26} {'Total_loss': 2.5558, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.3696, 'epoch': 1.27} {'Total_loss': 2.5838, 'Contrastive_loss': 0.2322, 'CrossEntropy_loss': 2.3516, 'epoch': 1.28} {'Total_loss': 2.5681, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 2.3815, 'epoch': 1.29} {'Total_loss': 2.6259, 'Contrastive_loss': 0.2376, 'CrossEntropy_loss': 2.3884, 'epoch': 1.3} {'Total_loss': 2.5884, 
'Contrastive_loss': 0.212, 'CrossEntropy_loss': 2.3764, 'epoch': 1.31} {'Total_loss': 2.5409, 'Contrastive_loss': 0.1975, 'CrossEntropy_loss': 2.3434, 'epoch': 1.32} {'Total_loss': 2.5743, 'Contrastive_loss': 0.1885, 'CrossEntropy_loss': 2.3858, 'epoch': 1.33} {'Total_loss': 2.5428, 'Contrastive_loss': 0.1902, 'CrossEntropy_loss': 2.3526, 'epoch': 1.34} {'Total_loss': 2.5822, 'Contrastive_loss': 0.1936, 'CrossEntropy_loss': 2.3886, 'epoch': 1.35} {'Total_loss': 2.5383, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.3528, 'epoch': 1.36} {'Total_loss': 2.5074, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 2.321, 'epoch': 1.37} {'Total_loss': 2.5501, 'Contrastive_loss': 0.1904, 'CrossEntropy_loss': 2.3597, 'epoch': 1.38} {'Total_loss': 2.5195, 'Contrastive_loss': 0.1882, 'CrossEntropy_loss': 2.3314, 'epoch': 1.39} {'eval_loss': 2.728792190551758, 'eval_bleu': 0.0, 'eval_runtime': 47.5735, 'eval_samples_per_second': 51.583, 'eval_steps_per_second': 2.585, 'epoch': 1.39} {'Total_loss': 2.5535, 'Contrastive_loss': 0.2074, 'CrossEntropy_loss': 2.3461, 'epoch': 1.4} {'Total_loss': 2.53, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.3438, 'epoch': 1.41} {'Total_loss': 2.5256, 'Contrastive_loss': 0.188, 'CrossEntropy_loss': 2.3377, 'epoch': 1.42} {'Total_loss': 2.5134, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.3278, 'epoch': 1.43} {'Total_loss': 2.4979, 'Contrastive_loss': 0.1867, 'CrossEntropy_loss': 2.3112, 'epoch': 1.44} {'Total_loss': 2.5007, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.3169, 'epoch': 1.45} {'Total_loss': 2.5313, 'Contrastive_loss': 0.1964, 'CrossEntropy_loss': 2.335, 'epoch': 1.46} {'Total_loss': 2.529, 'Contrastive_loss': 0.2008, 'CrossEntropy_loss': 2.3282, 'epoch': 1.47} {'Total_loss': 2.4755, 'Contrastive_loss': 0.1842, 'CrossEntropy_loss': 2.2912, 'epoch': 1.48} {'Total_loss': 2.4807, 'Contrastive_loss': 0.1892, 'CrossEntropy_loss': 2.2915, 'epoch': 1.49} {'Total_loss': 2.5163, 'Contrastive_loss': 0.1838, 
'CrossEntropy_loss': 2.3325, 'epoch': 1.5} {'Total_loss': 2.4685, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.2856, 'epoch': 1.51} {'Total_loss': 2.4948, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.3116, 'epoch': 1.52} {'Total_loss': 2.4745, 'Contrastive_loss': 0.1915, 'CrossEntropy_loss': 2.2831, 'epoch': 1.53} {'Total_loss': 2.489, 'Contrastive_loss': 0.1953, 'CrossEntropy_loss': 2.2937, 'epoch': 1.54} {'Total_loss': 2.4989, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 2.3132, 'epoch': 1.55} {'Total_loss': 2.4978, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.3142, 'epoch': 1.56} {'Total_loss': 2.5127, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.3298, 'epoch': 1.57} {'Total_loss': 2.5181, 'Contrastive_loss': 0.1897, 'CrossEntropy_loss': 2.3284, 'epoch': 1.58} {'Total_loss': 2.4745, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.289, 'epoch': 1.59} {'eval_loss': 2.653449535369873, 'eval_bleu': 0.0, 'eval_runtime': 47.5117, 'eval_samples_per_second': 51.65, 'eval_steps_per_second': 2.589, 'epoch': 1.59} {'Total_loss': 2.4969, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.3144, 'epoch': 1.6} {'Total_loss': 2.4451, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.2604, 'epoch': 1.61} {'Total_loss': 2.4944, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.3092, 'epoch': 1.62} {'Total_loss': 2.4933, 'Contrastive_loss': 0.1874, 'CrossEntropy_loss': 2.3058, 'epoch': 1.63} {'Total_loss': 2.4844, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.3004, 'epoch': 1.64} {'Total_loss': 2.4124, 'Contrastive_loss': 0.1877, 'CrossEntropy_loss': 2.2247, 'epoch': 1.65} {'Total_loss': 2.4526, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.268, 'epoch': 1.66} {'Total_loss': 2.4455, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.2614, 'epoch': 1.67} {'Total_loss': 2.4624, 'Contrastive_loss': 0.186, 'CrossEntropy_loss': 2.2764, 'epoch': 1.68} {'Total_loss': 2.4402, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.2568, 'epoch': 1.69} 
{'Total_loss': 2.4699, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 2.2845, 'epoch': 1.7} {'Total_loss': 2.4525, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.2691, 'epoch': 1.71} {'Total_loss': 2.4427, 'Contrastive_loss': 0.1877, 'CrossEntropy_loss': 2.2551, 'epoch': 1.72} {'Total_loss': 2.4745, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.2916, 'epoch': 1.73} {'Total_loss': 2.491, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.3083, 'epoch': 1.74} {'Total_loss': 2.4476, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.2645, 'epoch': 1.75} {'Total_loss': 2.4509, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.2679, 'epoch': 1.76} {'Total_loss': 2.4308, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.248, 'epoch': 1.77} {'Total_loss': 2.4417, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.2573, 'epoch': 1.78} {'Total_loss': 2.4443, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.2598, 'epoch': 1.79} {'eval_loss': 2.597923755645752, 'eval_bleu': 49.80716382320734, 'eval_runtime': 207.6051, 'eval_samples_per_second': 11.821, 'eval_steps_per_second': 0.592, 'epoch': 1.79} {'Total_loss': 2.4395, 'Contrastive_loss': 0.1863, 'CrossEntropy_loss': 2.2532, 'epoch': 1.8} {'Total_loss': 2.4053, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.2198, 'epoch': 1.81} {'Total_loss': 2.4413, 'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.2541, 'epoch': 1.82} {'Total_loss': 2.4284, 'Contrastive_loss': 0.1849, 'CrossEntropy_loss': 2.2435, 'epoch': 1.83} {'Total_loss': 2.4404, 'Contrastive_loss': 0.185, 'CrossEntropy_loss': 2.2555, 'epoch': 1.84} {'Total_loss': 2.4281, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.2453, 'epoch': 1.85} {'Total_loss': 2.4562, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.2721, 'epoch': 1.86} {'Total_loss': 2.4176, 'Contrastive_loss': 0.1878, 'CrossEntropy_loss': 2.2298, 'epoch': 1.87} {'Total_loss': 2.4227, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.2382, 'epoch': 1.88} {'Total_loss': 2.4036, 
'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.2203, 'epoch': 1.89} {'Total_loss': 2.4288, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 2.2445, 'epoch': 1.9} {'Total_loss': 2.4106, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.2272, 'epoch': 1.91} {'Total_loss': 2.4278, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.2432, 'epoch': 1.92} {'Total_loss': 2.4145, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.2293, 'epoch': 1.93} {'Total_loss': 2.4163, 'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.229, 'epoch': 1.94} {'Total_loss': 2.4153, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.2332, 'epoch': 1.95} {'Total_loss': 2.3972, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.214, 'epoch': 1.96} {'Total_loss': 2.3942, 'Contrastive_loss': 0.188, 'CrossEntropy_loss': 2.2062, 'epoch': 1.97} {'Total_loss': 2.3891, 'Contrastive_loss': 0.1853, 'CrossEntropy_loss': 2.2038, 'epoch': 1.98} {'Total_loss': 2.3855, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.2034, 'epoch': 1.99} {'eval_loss': 2.5653457641601562, 'eval_bleu': 51.134713477013314, 'eval_runtime': 208.6028, 'eval_samples_per_second': 11.764, 'eval_steps_per_second': 0.59, 'epoch': 1.99} {'Total_loss': 2.3773, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.195, 'epoch': 2.0} {'Total_loss': 2.3427, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1598, 'epoch': 2.01} {'Total_loss': 2.3381, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1547, 'epoch': 2.02} {'Total_loss': 2.38, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1962, 'epoch': 2.03} {'Total_loss': 2.3861, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.2026, 'epoch': 2.04} {'Total_loss': 2.307, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1238, 'epoch': 2.05} {'Total_loss': 2.3741, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1912, 'epoch': 2.06} {'Total_loss': 2.3524, 'Contrastive_loss': 0.1897, 'CrossEntropy_loss': 2.1627, 'epoch': 2.07} {'Total_loss': 2.3685, 'Contrastive_loss': 0.1862, 
'CrossEntropy_loss': 2.1823, 'epoch': 2.08} {'Total_loss': 2.3893, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.2063, 'epoch': 2.09} {'Total_loss': 2.3638, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1817, 'epoch': 2.1} {'Total_loss': 2.4216, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.2372, 'epoch': 2.11} {'Total_loss': 2.3677, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 2.1823, 'epoch': 2.12} {'Total_loss': 2.3411, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.157, 'epoch': 2.13} {'Total_loss': 2.3643, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1822, 'epoch': 2.14} {'Total_loss': 2.3706, 'Contrastive_loss': 0.1893, 'CrossEntropy_loss': 2.1814, 'epoch': 2.15} {'Total_loss': 2.3648, 'Contrastive_loss': 0.1862, 'CrossEntropy_loss': 2.1786, 'epoch': 2.16} {'Total_loss': 2.3758, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.1902, 'epoch': 2.17} {'Total_loss': 2.372, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1885, 'epoch': 2.18} {'Total_loss': 2.3481, 'Contrastive_loss': 0.1844, 'CrossEntropy_loss': 2.1637, 'epoch': 2.19} {'eval_loss': 2.535236120223999, 'eval_bleu': 50.85850127149657, 'eval_runtime': 202.665, 'eval_samples_per_second': 12.109, 'eval_steps_per_second': 0.607, 'epoch': 2.19} {'Total_loss': 2.3967, 'Contrastive_loss': 0.1915, 'CrossEntropy_loss': 2.2052, 'epoch': 2.2} {'Total_loss': 2.3312, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.1473, 'epoch': 2.21} {'Total_loss': 2.3456, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1627, 'epoch': 2.22} {'Total_loss': 2.367, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.1848, 'epoch': 2.23} {'Total_loss': 2.3682, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.1859, 'epoch': 2.24} {'Total_loss': 2.3642, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1822, 'epoch': 2.25} {'Total_loss': 2.3502, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1672, 'epoch': 2.26} {'Total_loss': 2.3417, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1588, 
'epoch': 2.27} {'Total_loss': 2.3553, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 2.1654, 'epoch': 2.28} {'Total_loss': 2.3813, 'Contrastive_loss': 0.1935, 'CrossEntropy_loss': 2.1879, 'epoch': 2.29} {'Total_loss': 2.3538, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.1687, 'epoch': 2.3} {'Total_loss': 2.3892, 'Contrastive_loss': 0.1918, 'CrossEntropy_loss': 2.1973, 'epoch': 2.31} {'Total_loss': 2.3378, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.154, 'epoch': 2.32} {'Total_loss': 2.3711, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.185, 'epoch': 2.33} {'Total_loss': 2.372, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.1873, 'epoch': 2.34} {'Total_loss': 2.3446, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.159, 'epoch': 2.35} {'Total_loss': 2.3505, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.1665, 'epoch': 2.36} {'Total_loss': 2.3465, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1636, 'epoch': 2.37} {'Total_loss': 2.3911, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.207, 'epoch': 2.38} {'Total_loss': 2.3696, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 2.1837, 'epoch': 2.39} {'eval_loss': 2.5176327228546143, 'eval_bleu': 50.89111839209788, 'eval_runtime': 193.6496, 'eval_samples_per_second': 12.672, 'eval_steps_per_second': 0.635, 'epoch': 2.39} {'Total_loss': 2.3632, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.1793, 'epoch': 2.4} {'Total_loss': 2.3402, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1577, 'epoch': 2.41} {'Total_loss': 2.3848, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.2028, 'epoch': 2.42} {'Total_loss': 2.3302, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1471, 'epoch': 2.43} {'Total_loss': 2.3447, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.1622, 'epoch': 2.44} {'Total_loss': 2.3574, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 2.1716, 'epoch': 2.45} {'Total_loss': 2.3612, 'Contrastive_loss': 0.191, 'CrossEntropy_loss': 2.1702, 'epoch': 2.46} {'Total_loss': 
2.3565, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.171, 'epoch': 2.47} {'Total_loss': 2.3586, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.176, 'epoch': 2.48} {'Total_loss': 2.3467, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.1645, 'epoch': 2.49} {'Total_loss': 2.3531, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1701, 'epoch': 2.5} {'Total_loss': 2.3457, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.163, 'epoch': 2.51} {'Total_loss': 2.332, 'Contrastive_loss': 0.1866, 'CrossEntropy_loss': 2.1454, 'epoch': 2.52} {'Total_loss': 2.3799, 'Contrastive_loss': 0.1831, 'CrossEntropy_loss': 2.1968, 'epoch': 2.53} {'Total_loss': 2.3343, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.1517, 'epoch': 2.54} {'Total_loss': 2.3344, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.1511, 'epoch': 2.55} {'Total_loss': 2.3462, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1628, 'epoch': 2.56} {'Total_loss': 2.35, 'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.1627, 'epoch': 2.57} {'Total_loss': 2.3536, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.17, 'epoch': 2.58} {'Total_loss': 2.3502, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.168, 'epoch': 2.59} {'eval_loss': 2.4858028888702393, 'eval_bleu': 51.683593977892926, 'eval_runtime': 184.2891, 'eval_samples_per_second': 13.316, 'eval_steps_per_second': 0.667, 'epoch': 2.59} {'Total_loss': 2.341, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.1577, 'epoch': 2.6} {'Total_loss': 2.3287, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.1441, 'epoch': 2.61} {'Total_loss': 2.3235, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.139, 'epoch': 2.62} {'Total_loss': 2.3407, 'Contrastive_loss': 0.1844, 'CrossEntropy_loss': 2.1563, 'epoch': 2.63} {'Total_loss': 2.3196, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1362, 'epoch': 2.64} {'Total_loss': 2.3087, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1249, 'epoch': 2.65} {'Total_loss': 2.3618, 'Contrastive_loss': 
0.1825, 'CrossEntropy_loss': 2.1793, 'epoch': 2.66} {'Total_loss': 2.3399, 'Contrastive_loss': 0.192, 'CrossEntropy_loss': 2.1479, 'epoch': 2.67} {'Total_loss': 2.3522, 'Contrastive_loss': 0.1851, 'CrossEntropy_loss': 2.1672, 'epoch': 2.68} {'Total_loss': 2.3514, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.1678, 'epoch': 2.68} {'Total_loss': 2.3138, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.1311, 'epoch': 2.69} {'Total_loss': 2.3503, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 2.1655, 'epoch': 2.7} {'Total_loss': 2.3433, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1604, 'epoch': 2.71} {'Total_loss': 2.3396, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.1568, 'epoch': 2.72} {'Total_loss': 2.3325, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.1503, 'epoch': 2.73} {'Total_loss': 2.3535, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 2.1676, 'epoch': 2.74} {'Total_loss': 2.3536, 'Contrastive_loss': 0.1873, 'CrossEntropy_loss': 2.1663, 'epoch': 2.75} {'Total_loss': 2.3535, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.17, 'epoch': 2.76} {'Total_loss': 2.3375, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.1556, 'epoch': 2.77} {'Total_loss': 2.3496, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.1673, 'epoch': 2.78} {'eval_loss': 2.46993350982666, 'eval_bleu': 52.00898377771929, 'eval_runtime': 178.0425, 'eval_samples_per_second': 13.783, 'eval_steps_per_second': 0.691, 'epoch': 2.78} {'Total_loss': 2.3248, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1424, 'epoch': 2.79} {'Total_loss': 2.3109, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.1282, 'epoch': 2.8} {'Total_loss': 2.3336, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1506, 'epoch': 2.81} {'Total_loss': 2.3123, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1291, 'epoch': 2.82} {'Total_loss': 2.3297, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.1471, 'epoch': 2.83} {'Total_loss': 2.368, 'Contrastive_loss': 0.1883, 'CrossEntropy_loss': 
2.1796, 'epoch': 2.84} {'Total_loss': 2.345, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1612, 'epoch': 2.85} {'Total_loss': 2.3744, 'Contrastive_loss': 0.1856, 'CrossEntropy_loss': 2.1888, 'epoch': 2.86} {'Total_loss': 2.3441, 'Contrastive_loss': 0.1831, 'CrossEntropy_loss': 2.161, 'epoch': 2.87} {'Total_loss': 2.3636, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.1799, 'epoch': 2.88} {'Total_loss': 2.289, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1061, 'epoch': 2.89} {'Total_loss': 2.311, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.1249, 'epoch': 2.9} {'Total_loss': 2.3154, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.132, 'epoch': 2.91} {'Total_loss': 2.3413, 'Contrastive_loss': 0.1899, 'CrossEntropy_loss': 2.1514, 'epoch': 2.92} {'Total_loss': 2.3332, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.1496, 'epoch': 2.93} {'Total_loss': 2.3287, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.1449, 'epoch': 2.94} {'Total_loss': 2.3343, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1509, 'epoch': 2.95} {'Total_loss': 2.3287, 'Contrastive_loss': 0.1861, 'CrossEntropy_loss': 2.1426, 'epoch': 2.96} {'Total_loss': 2.3261, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.1436, 'epoch': 2.97} {'Total_loss': 2.3311, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1491, 'epoch': 2.98} {'eval_loss': 2.453214645385742, 'eval_bleu': 52.202791946059484, 'eval_runtime': 181.2982, 'eval_samples_per_second': 13.536, 'eval_steps_per_second': 0.678, 'epoch': 2.98} {'Total_loss': 2.3037, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.1201, 'epoch': 2.99} {'Total_loss': 2.3179, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1349, 'epoch': 3.0} {'Total_loss': 2.271, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0885, 'epoch': 3.01} {'Total_loss': 2.2661, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0834, 'epoch': 3.02} {'Total_loss': 2.2771, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0948, 'epoch': 3.03} 
{'Total_loss': 2.278, 'Contrastive_loss': 0.1841, 'CrossEntropy_loss': 2.0939, 'epoch': 3.04} {'Total_loss': 2.2858, 'Contrastive_loss': 0.1839, 'CrossEntropy_loss': 2.1018, 'epoch': 3.05} {'Total_loss': 2.2392, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.0563, 'epoch': 3.06} {'Total_loss': 2.2729, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.0891, 'epoch': 3.07} {'Total_loss': 2.2633, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0805, 'epoch': 3.08} {'Total_loss': 2.3441, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1606, 'epoch': 3.09} {'Total_loss': 2.293, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1098, 'epoch': 3.1} {'Total_loss': 2.2852, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1028, 'epoch': 3.11} {'Total_loss': 2.2762, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0942, 'epoch': 3.12} {'Total_loss': 2.2746, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0924, 'epoch': 3.13} {'Total_loss': 2.3037, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.1197, 'epoch': 3.14} {'Total_loss': 2.2738, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.0893, 'epoch': 3.15} {'Total_loss': 2.2991, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1167, 'epoch': 3.16} {'Total_loss': 2.2825, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.0984, 'epoch': 3.17} {'Total_loss': 2.2789, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0967, 'epoch': 3.18} {'eval_loss': 2.441037654876709, 'eval_bleu': 52.57699361982004, 'eval_runtime': 188.0562, 'eval_samples_per_second': 13.049, 'eval_steps_per_second': 0.654, 'epoch': 3.18} {'Total_loss': 2.2922, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.1092, 'epoch': 3.19} {'Total_loss': 2.2841, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1009, 'epoch': 3.2} {'Total_loss': 2.2943, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1122, 'epoch': 3.21} {'Total_loss': 2.2784, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.0951, 'epoch': 3.22} {'Total_loss': 2.2971, 
'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.1138, 'epoch': 3.23} {'Total_loss': 2.3003, 'Contrastive_loss': 0.1843, 'CrossEntropy_loss': 2.116, 'epoch': 3.24} {'Total_loss': 2.3274, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.1441, 'epoch': 3.25} {'Total_loss': 2.3234, 'Contrastive_loss': 0.1857, 'CrossEntropy_loss': 2.1377, 'epoch': 3.26} {'Total_loss': 2.2684, 'Contrastive_loss': 0.1884, 'CrossEntropy_loss': 2.08, 'epoch': 3.27} {'Total_loss': 2.263, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.0784, 'epoch': 3.28} {'Total_loss': 2.2942, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.112, 'epoch': 3.29} {'Total_loss': 2.268, 'Contrastive_loss': 0.1838, 'CrossEntropy_loss': 2.0843, 'epoch': 3.3} {'Total_loss': 2.2813, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.0976, 'epoch': 3.31} {'Total_loss': 2.3315, 'Contrastive_loss': 0.1938, 'CrossEntropy_loss': 2.1377, 'epoch': 3.32} {'Total_loss': 2.2906, 'Contrastive_loss': 0.1854, 'CrossEntropy_loss': 2.1051, 'epoch': 3.33} {'Total_loss': 2.2727, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.0899, 'epoch': 3.34} {'Total_loss': 2.3197, 'Contrastive_loss': 0.1848, 'CrossEntropy_loss': 2.1349, 'epoch': 3.35} {'Total_loss': 2.257, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.0737, 'epoch': 3.36} {'Total_loss': 2.2576, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0745, 'epoch': 3.37} {'Total_loss': 2.2988, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.1153, 'epoch': 3.38} {'eval_loss': 2.430711269378662, 'eval_bleu': 52.58615360161306, 'eval_runtime': 177.0606, 'eval_samples_per_second': 13.86, 'eval_steps_per_second': 0.695, 'epoch': 3.38} {'Total_loss': 2.2812, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0989, 'epoch': 3.39} {'Total_loss': 2.289, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1069, 'epoch': 3.4} {'Total_loss': 2.3071, 'Contrastive_loss': 0.192, 'CrossEntropy_loss': 2.1151, 'epoch': 3.41} {'Total_loss': 2.2703, 'Contrastive_loss': 0.1827, 
'CrossEntropy_loss': 2.0876, 'epoch': 3.42} {'Total_loss': 2.2586, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.0758, 'epoch': 3.43} {'Total_loss': 2.2625, 'Contrastive_loss': 0.1871, 'CrossEntropy_loss': 2.0754, 'epoch': 3.44} {'Total_loss': 2.3089, 'Contrastive_loss': 0.186, 'CrossEntropy_loss': 2.1229, 'epoch': 3.45} {'Total_loss': 2.2925, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1089, 'epoch': 3.46} {'Total_loss': 2.2895, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.107, 'epoch': 3.47} {'Total_loss': 2.315, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.133, 'epoch': 3.48} {'Total_loss': 2.244, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0618, 'epoch': 3.49} {'Total_loss': 2.2738, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0915, 'epoch': 3.5} {'Total_loss': 2.2914, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1089, 'epoch': 3.51} {'Total_loss': 2.2642, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.081, 'epoch': 3.52} {'Total_loss': 2.3208, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1388, 'epoch': 3.53} {'Total_loss': 2.2664, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0841, 'epoch': 3.54} {'Total_loss': 2.2779, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0957, 'epoch': 3.55} {'Total_loss': 2.2633, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0813, 'epoch': 3.56} {'Total_loss': 2.3217, 'Contrastive_loss': 0.1842, 'CrossEntropy_loss': 2.1375, 'epoch': 3.57} {'Total_loss': 2.2967, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.1143, 'epoch': 3.58} {'eval_loss': 2.413973569869995, 'eval_bleu': 53.57574736311562, 'eval_runtime': 191.0835, 'eval_samples_per_second': 12.843, 'eval_steps_per_second': 0.644, 'epoch': 3.58} {'Total_loss': 2.3146, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.1316, 'epoch': 3.59} {'Total_loss': 2.2689, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0857, 'epoch': 3.6} {'Total_loss': 2.2762, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0943, 
'epoch': 3.61} {'Total_loss': 2.2938, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.1119, 'epoch': 3.62} {'Total_loss': 2.3194, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1374, 'epoch': 3.63} {'Total_loss': 2.2412, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0593, 'epoch': 3.64} {'Total_loss': 2.269, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0869, 'epoch': 3.65} {'Total_loss': 2.2955, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.1134, 'epoch': 3.66} {'Total_loss': 2.2729, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.091, 'epoch': 3.67} {'Total_loss': 2.2616, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0795, 'epoch': 3.68} {'Total_loss': 2.27, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0871, 'epoch': 3.69} {'Total_loss': 2.2826, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.1006, 'epoch': 3.7} {'Total_loss': 2.2638, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0817, 'epoch': 3.71} {'Total_loss': 2.293, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.1111, 'epoch': 3.72} {'Total_loss': 2.2435, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0616, 'epoch': 3.73} {'Total_loss': 2.2528, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0706, 'epoch': 3.74} {'Total_loss': 2.2691, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0868, 'epoch': 3.75} {'Total_loss': 2.2577, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0756, 'epoch': 3.76} {'Total_loss': 2.2528, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.0681, 'epoch': 3.77} {'Total_loss': 2.3004, 'Contrastive_loss': 0.1857, 'CrossEntropy_loss': 2.1148, 'epoch': 3.78} {'eval_loss': 2.408651113510132, 'eval_bleu': 52.29365898583055, 'eval_runtime': 176.3842, 'eval_samples_per_second': 13.913, 'eval_steps_per_second': 0.697, 'epoch': 3.78} {'Total_loss': 2.2724, 'Contrastive_loss': 0.189, 'CrossEntropy_loss': 2.0833, 'epoch': 3.79} {'Total_loss': 2.2574, 'Contrastive_loss': 0.1884, 'CrossEntropy_loss': 2.069, 'epoch': 3.8} {'Total_loss': 
2.2729, 'Contrastive_loss': 0.1961, 'CrossEntropy_loss': 2.0768, 'epoch': 3.81} {'Total_loss': 2.2897, 'Contrastive_loss': 0.1847, 'CrossEntropy_loss': 2.105, 'epoch': 3.82} {'Total_loss': 2.2641, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.0789, 'epoch': 3.83} {'Total_loss': 2.2805, 'Contrastive_loss': 0.186, 'CrossEntropy_loss': 2.0945, 'epoch': 3.84} {'Total_loss': 2.2932, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.1092, 'epoch': 3.85} {'Total_loss': 2.2909, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.1083, 'epoch': 3.86} {'Total_loss': 2.2628, 'Contrastive_loss': 0.187, 'CrossEntropy_loss': 2.0757, 'epoch': 3.87} {'Total_loss': 2.2801, 'Contrastive_loss': 0.1845, 'CrossEntropy_loss': 2.0956, 'epoch': 3.88} {'Total_loss': 2.2835, 'Contrastive_loss': 0.1881, 'CrossEntropy_loss': 2.0953, 'epoch': 3.89} {'Total_loss': 2.3079, 'Contrastive_loss': 0.1914, 'CrossEntropy_loss': 2.1164, 'epoch': 3.9} {'Total_loss': 2.3009, 'Contrastive_loss': 0.1877, 'CrossEntropy_loss': 2.1132, 'epoch': 3.91} {'Total_loss': 2.2958, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.1123, 'epoch': 3.92} {'Total_loss': 2.2789, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.0937, 'epoch': 3.93} {'Total_loss': 2.2826, 'Contrastive_loss': 0.2041, 'CrossEntropy_loss': 2.0786, 'epoch': 3.94} {'Total_loss': 2.2745, 'Contrastive_loss': 0.1951, 'CrossEntropy_loss': 2.0794, 'epoch': 3.95} {'Total_loss': 2.2954, 'Contrastive_loss': 0.1876, 'CrossEntropy_loss': 2.1077, 'epoch': 3.96} {'Total_loss': 2.2917, 'Contrastive_loss': 0.1858, 'CrossEntropy_loss': 2.1059, 'epoch': 3.97} {'Total_loss': 2.3064, 'Contrastive_loss': 0.1952, 'CrossEntropy_loss': 2.1112, 'epoch': 3.98} {'eval_loss': 2.4026389122009277, 'eval_bleu': 52.777080738563626, 'eval_runtime': 188.3134, 'eval_samples_per_second': 13.031, 'eval_steps_per_second': 0.653, 'epoch': 3.98} {'Total_loss': 2.2677, 'Contrastive_loss': 0.1903, 'CrossEntropy_loss': 2.0774, 'epoch': 3.99} {'Total_loss': 2.2872, 
'Contrastive_loss': 0.1872, 'CrossEntropy_loss': 2.0999, 'epoch': 4.0} {'Total_loss': 2.2791, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0968, 'epoch': 4.01} {'Total_loss': 2.2375, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0549, 'epoch': 4.02} {'Total_loss': 2.2759, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.0903, 'epoch': 4.03} {'Total_loss': 2.2553, 'Contrastive_loss': 0.1855, 'CrossEntropy_loss': 2.0698, 'epoch': 4.04} {'Total_loss': 2.2298, 'Contrastive_loss': 0.1852, 'CrossEntropy_loss': 2.0446, 'epoch': 4.05} {'Total_loss': 2.2814, 'Contrastive_loss': 0.1837, 'CrossEntropy_loss': 2.0977, 'epoch': 4.06} {'Total_loss': 2.2491, 'Contrastive_loss': 0.1842, 'CrossEntropy_loss': 2.0649, 'epoch': 4.07} {'Total_loss': 2.2151, 'Contrastive_loss': 0.1867, 'CrossEntropy_loss': 2.0284, 'epoch': 4.08} {'Total_loss': 2.2506, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0684, 'epoch': 4.09} {'Total_loss': 2.2506, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0687, 'epoch': 4.1} {'Total_loss': 2.2559, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0738, 'epoch': 4.11} {'Total_loss': 2.243, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0603, 'epoch': 4.12} {'Total_loss': 2.2695, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.087, 'epoch': 4.13} {'Total_loss': 2.2509, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0688, 'epoch': 4.14} {'Total_loss': 2.2104, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0282, 'epoch': 4.15} {'Total_loss': 2.2471, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0653, 'epoch': 4.16} {'Total_loss': 2.2263, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0443, 'epoch': 4.17} {'Total_loss': 2.2518, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.0682, 'epoch': 4.18} {'eval_loss': 2.3921515941619873, 'eval_bleu': 53.35485095796362, 'eval_runtime': 189.8349, 'eval_samples_per_second': 12.927, 'eval_steps_per_second': 0.648, 'epoch': 4.18} {'Total_loss': 2.2335, 'Contrastive_loss': 
0.1831, 'CrossEntropy_loss': 2.0503, 'epoch': 4.19} {'Total_loss': 2.2166, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.0342, 'epoch': 4.2} {'Total_loss': 2.2239, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0418, 'epoch': 4.21} {'Total_loss': 2.2572, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.075, 'epoch': 4.22} {'Total_loss': 2.2203, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.038, 'epoch': 4.23} {'Total_loss': 2.2432, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.0597, 'epoch': 4.24} {'Total_loss': 2.2335, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.0508, 'epoch': 4.25} {'Total_loss': 2.2576, 'Contrastive_loss': 0.1868, 'CrossEntropy_loss': 2.0708, 'epoch': 4.26} {'Total_loss': 2.2499, 'Contrastive_loss': 0.1859, 'CrossEntropy_loss': 2.064, 'epoch': 4.27} {'Total_loss': 2.2562, 'Contrastive_loss': 0.1836, 'CrossEntropy_loss': 2.0727, 'epoch': 4.28} {'Total_loss': 2.2461, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0641, 'epoch': 4.29} {'Total_loss': 2.2782, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0963, 'epoch': 4.3} {'Total_loss': 2.2396, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0578, 'epoch': 4.31} {'Total_loss': 2.2235, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0403, 'epoch': 4.32} {'Total_loss': 2.2181, 'Contrastive_loss': 0.1886, 'CrossEntropy_loss': 2.0295, 'epoch': 4.33} {'Total_loss': 2.27, 'Contrastive_loss': 0.1834, 'CrossEntropy_loss': 2.0867, 'epoch': 4.34} {'Total_loss': 2.2445, 'Contrastive_loss': 0.1828, 'CrossEntropy_loss': 2.0617, 'epoch': 4.35} {'Total_loss': 2.2624, 'Contrastive_loss': 0.1844, 'CrossEntropy_loss': 2.0779, 'epoch': 4.36} {'Total_loss': 2.2433, 'Contrastive_loss': 0.1846, 'CrossEntropy_loss': 2.0588, 'epoch': 4.37} {'Total_loss': 2.2099, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0275, 'epoch': 4.38} {'eval_loss': 2.3844974040985107, 'eval_bleu': 54.22287598491524, 'eval_runtime': 198.6886, 'eval_samples_per_second': 12.351, 'eval_steps_per_second': 
0.619, 'epoch': 4.38} {'Total_loss': 2.2639, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.082, 'epoch': 4.39} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0463, 'epoch': 4.4} {'Total_loss': 2.2324, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0502, 'epoch': 4.41} {'Total_loss': 2.2365, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0543, 'epoch': 4.42} {'Total_loss': 2.2404, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0581, 'epoch': 4.43} {'Total_loss': 2.2419, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.0595, 'epoch': 4.44} {'Total_loss': 2.2452, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0633, 'epoch': 4.45} {'Total_loss': 2.2395, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0571, 'epoch': 4.46} {'Total_loss': 2.2197, 'Contrastive_loss': 0.1835, 'CrossEntropy_loss': 2.0362, 'epoch': 4.46} {'Total_loss': 2.2593, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0773, 'epoch': 4.47} {'Total_loss': 2.2175, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0351, 'epoch': 4.48} {'Total_loss': 2.2446, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0623, 'epoch': 4.49} {'Total_loss': 2.2521, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0699, 'epoch': 4.5} {'Total_loss': 2.2398, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0575, 'epoch': 4.51} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0463, 'epoch': 4.52} {'Total_loss': 2.2386, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0566, 'epoch': 4.53} {'Total_loss': 2.2279, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0455, 'epoch': 4.54} {'Total_loss': 2.2116, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0298, 'epoch': 4.55} {'Total_loss': 2.2292, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0474, 'epoch': 4.56} {'Total_loss': 2.2292, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0475, 'epoch': 4.57} {'eval_loss': 2.3810110092163086, 'eval_bleu': 53.01764696584839, 'eval_runtime': 
198.5997, 'eval_samples_per_second': 12.357, 'eval_steps_per_second': 0.619, 'epoch': 4.57} {'Total_loss': 2.2307, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0487, 'epoch': 4.58} {'Total_loss': 2.2477, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0656, 'epoch': 4.59} {'Total_loss': 2.2529, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.071, 'epoch': 4.6} {'Total_loss': 2.223, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.041, 'epoch': 4.61} {'Total_loss': 2.2283, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.046, 'epoch': 4.62} {'Total_loss': 2.2502, 'Contrastive_loss': 0.1833, 'CrossEntropy_loss': 2.067, 'epoch': 4.63} {'Total_loss': 2.2454, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0633, 'epoch': 4.64} {'Total_loss': 2.248, 'Contrastive_loss': 0.1822, 'CrossEntropy_loss': 2.0658, 'epoch': 4.65} {'Total_loss': 2.2404, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0581, 'epoch': 4.66} {'Total_loss': 2.2822, 'Contrastive_loss': 0.1825, 'CrossEntropy_loss': 2.0997, 'epoch': 4.67} {'Total_loss': 2.2554, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0733, 'epoch': 4.68} {'Total_loss': 2.2478, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0657, 'epoch': 4.69} {'Total_loss': 2.2336, 'Contrastive_loss': 0.184, 'CrossEntropy_loss': 2.0495, 'epoch': 4.7} {'Total_loss': 2.2658, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0835, 'epoch': 4.71} {'Total_loss': 2.208, 'Contrastive_loss': 0.1829, 'CrossEntropy_loss': 2.0251, 'epoch': 4.72} {'Total_loss': 2.2101, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0282, 'epoch': 4.73} {'Total_loss': 2.2359, 'Contrastive_loss': 0.1832, 'CrossEntropy_loss': 2.0527, 'epoch': 4.74} {'Total_loss': 2.2328, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0505, 'epoch': 4.75} {'Total_loss': 2.2259, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0438, 'epoch': 4.76} {'Total_loss': 2.2624, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0803, 'epoch': 4.77} {'eval_loss': 
2.366802453994751, 'eval_bleu': 54.010005283325334, 'eval_runtime': 197.2867, 'eval_samples_per_second': 12.439, 'eval_steps_per_second': 0.623, 'epoch': 4.77} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0462, 'epoch': 4.78} {'Total_loss': 2.2649, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0829, 'epoch': 4.79} {'Total_loss': 2.2718, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.09, 'epoch': 4.8} {'Total_loss': 2.2471, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0653, 'epoch': 4.81} {'Total_loss': 2.2084, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0263, 'epoch': 4.82} {'Total_loss': 2.2395, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0576, 'epoch': 4.83} {'Total_loss': 2.2409, 'Contrastive_loss': 0.1826, 'CrossEntropy_loss': 2.0584, 'epoch': 4.84} {'Total_loss': 2.2473, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0655, 'epoch': 4.85} {'Total_loss': 2.2447, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0628, 'epoch': 4.86} {'Total_loss': 2.2414, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.059, 'epoch': 4.87} {'Total_loss': 2.2668, 'Contrastive_loss': 0.183, 'CrossEntropy_loss': 2.0838, 'epoch': 4.88} {'Total_loss': 2.2502, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0682, 'epoch': 4.89} {'Total_loss': 2.2494, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0675, 'epoch': 4.9} {'Total_loss': 2.2535, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0714, 'epoch': 4.91} {'Total_loss': 2.2313, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0491, 'epoch': 4.92} {'Total_loss': 2.2278, 'Contrastive_loss': 0.182, 'CrossEntropy_loss': 2.0458, 'epoch': 4.93} {'Total_loss': 2.2281, 'Contrastive_loss': 0.1823, 'CrossEntropy_loss': 2.0458, 'epoch': 4.94} {'Total_loss': 2.24, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0581, 'epoch': 4.95} {'Total_loss': 2.2499, 'Contrastive_loss': 0.1827, 'CrossEntropy_loss': 2.0672, 'epoch': 4.96} {'Total_loss': 2.2593, 'Contrastive_loss': 
0.1819, 'CrossEntropy_loss': 2.0774, 'epoch': 4.97} {'eval_loss': 2.3611326217651367, 'eval_bleu': 54.36605619633802, 'eval_runtime': 200.5003, 'eval_samples_per_second': 12.239, 'eval_steps_per_second': 0.613, 'epoch': 4.97} {'train_runtime': 21247.9554, 'train_samples_per_second': 23531.676, 'train_steps_per_second': 2.353, 'train_loss': 2.657199195404053, 'epoch': 4.97} ***** train metrics ***** epoch = 4.97 train_loss = 2.6572 train_runtime = 5:54:07.95 train_samples = 663486 train_samples_per_second = 23531.676 train_steps_per_second = 2.353 ***** predict metrics ***** predict_bleu = 54.0698 predict_loss = 2.407 predict_runtime = 0:03:10.38 predict_samples = 2483 predict_samples_per_second = 13.042 predict_steps_per_second = 0.657
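One way to answer the "did the model converge?" question raised earlier in the thread is to pull the dev-set evaluation entries out of a log like the one above. The following is a minimal Python sketch. It assumes the log is saved as plain text containing one Python dict repr per logging step, as in the paste. The file name in the comment is hypothetical, and the excerpt is copied from the log but trimmed to the fields used here. The sketch also checks the apparent relation Total_loss = CrossEntropy_loss + contrastive_lambda × Contrastive_loss. The logged numbers fit this to rounding precision with contrastive_lambda = 1 from the posted MarianConfig, but the relation is inferred from the numbers, not taken from the repository code.

```python
import ast
import re
from typing import Dict, List, Tuple

# A few entries from the training log above, trimmed to the fields used here.
# In practice you would read the full log, e.g. open("train.log").read()
# ("train.log" is a hypothetical file name).
LOG_EXCERPT = """
{'Total_loss': 2.2099, 'Contrastive_loss': 0.1824, 'CrossEntropy_loss': 2.0275, 'epoch': 4.38}
{'eval_loss': 2.3844974040985107, 'eval_bleu': 54.22287598491524, 'epoch': 4.38}
{'Total_loss': 2.2292, 'Contrastive_loss': 0.1818, 'CrossEntropy_loss': 2.0475, 'epoch': 4.57}
{'eval_loss': 2.3810110092163086, 'eval_bleu': 53.01764696584839, 'epoch': 4.57}
{'Total_loss': 2.2624, 'Contrastive_loss': 0.1821, 'CrossEntropy_loss': 2.0803, 'epoch': 4.77}
{'eval_loss': 2.366802453994751, 'eval_bleu': 54.010005283325334, 'epoch': 4.77}
{'Total_loss': 2.2593, 'Contrastive_loss': 0.1819, 'CrossEntropy_loss': 2.0774, 'epoch': 4.97}
{'eval_loss': 2.3611326217651367, 'eval_bleu': 54.36605619633802, 'epoch': 4.97}
"""

# Flat dict literals only (the logged records have no nested braces).
# [^{}] also matches newlines, so records wrapped across lines still parse.
DICT_RE = re.compile(r"\{[^{}]*\}")


def parse_log(text: str) -> List[Dict[str, float]]:
    """Return every dict literal in the log text, in the order it was logged."""
    return [ast.literal_eval(m.group(0)) for m in DICT_RE.finditer(text)]


def eval_curve(records: List[Dict[str, float]]) -> List[Tuple[float, float, float]]:
    """(epoch, eval_loss, eval_bleu) for each dev-set evaluation entry."""
    return [(r["epoch"], r["eval_loss"], r["eval_bleu"]) for r in records if "eval_bleu" in r]


def max_decomposition_error(records: List[Dict[str, float]], contrastive_lambda: float = 1.0) -> float:
    """Largest |Total - (CE + lambda * Contrastive)| over the training entries.

    Assumption: Total_loss = CrossEntropy_loss + contrastive_lambda * Contrastive_loss,
    with contrastive_lambda = 1 as in the posted MarianConfig.
    """
    return max(
        abs(r["Total_loss"] - (r["CrossEntropy_loss"] + contrastive_lambda * r["Contrastive_loss"]))
        for r in records
        if "Total_loss" in r
    )


if __name__ == "__main__":
    records = parse_log(LOG_EXCERPT)
    curve = eval_curve(records)
    for epoch, loss, bleu in curve:
        print(f"epoch {epoch:.2f}  eval_loss {loss:.4f}  eval_bleu {bleu:.2f}")
    best_epoch, _, best_bleu = max(curve, key=lambda row: row[2])
    print(f"best dev BLEU {best_bleu:.2f} at epoch {best_epoch:.2f} (last evaluation: {curve[-1][0]:.2f})")
    print(f"largest loss-decomposition mismatch: {max_decomposition_error(records):.4f}")
```

Across all the evaluations visible in the paste (epochs 3.38 to 4.97), eval_loss decreases at every evaluation, and the best dev BLEU comes from the very last one (54.37 at epoch 4.97). This is consistent with training having stopped before the model converged.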

vhientran commented 1 year ago

Thank you for your quick reply! I will try it. Many thanks!

vodiepnhu commented 1 year ago

> Thank you for your quick reply! I will try it. Many thanks!

Hi, can you tell me how to get the src.vocab file? I didn't find it in the ende folder when I downloaded it. Thank you.

Hannibal046 commented 1 year ago

> Hi, can you tell me how to get the src.vocab file? I didn't find it in the ende folder when I downloaded it. Thank you.

Hi, thanks for your interest. The src.vocab file should be in the provided ready-to-go data folder. Alternatively, you can generate it yourself with joint BPE (https://github.com/rsennrich/subword-nmt).
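With subword-nmt, the `learn-joint-bpe-and-vocab` command learns a single set of merge operations on the source and target training text together, which is what makes the BPE "joint". With `--write-vocabulary`, it also writes a separate vocabulary file for each language. The vocabulary step itself just counts tokens in the BPE-segmented text. The following minimal Python sketch reproduces that counting for text that has already been segmented. It writes one `token count` pair per line, most frequent first, which is the format subword-nmt's vocabulary files use. This sketch rests on several assumptions not confirmed in the thread: that this repository's src.vocab expects exactly this format, and how the plain source sentences should be extracted from train.json. The sample sentences are made up.

```python
from collections import Counter
from typing import Iterable, List


def count_tokens(segmented_lines: Iterable[str]) -> Counter:
    """Count whitespace-separated BPE tokens; 'Commis@@ sion' counts as two tokens."""
    counts: Counter = Counter()
    for line in segmented_lines:
        counts.update(line.split())
    return counts


def vocab_lines(counts: Counter) -> List[str]:
    """'token count' lines, most frequent first; ties keep first-seen order (stable sort)."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{token} {freq}" for token, freq in ranked]


def write_vocab(segmented_lines: Iterable[str], path: str) -> None:
    """Write a vocabulary file for one language side (e.g. src.vocab)."""
    with open(path, "w", encoding="utf-8") as out:
        for line in vocab_lines(count_tokens(segmented_lines)):
            out.write(line + "\n")


if __name__ == "__main__":
    # Made-up English sentences, already BPE-segmented with '@@ ' continuation markers.
    src = ["the Commis@@ sion shall adopt", "the Council shall"]
    for line in vocab_lines(count_tokens(src)):
        print(line)  # the 2 / shall 2 / Commis@@ 1 / sion 1 / adopt 1 / Council 1
```

With joint BPE, the merge operations are shared, but each side still gets its own vocabulary. This means the same function would be run once on the segmented English side (src.vocab) and once on the German side (tgt.vocab). This matches the separate src_vocab_file and trg_vocab_file entries in the posted DataArgs.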

vodiepnhu commented 1 year ago

> Hi, thanks for your interest. The src.vocab file should be in the provided ready-to-go data folder. Alternatively, you can generate it yourself with joint BPE (https://github.com/rsennrich/subword-nmt).

Thank you so much.