Closed: ShaobinChen-AH closed this issue 2 years ago.
Hi, that's weird. Are you using the Transformers version from the repository? Are you sure multi-dropout is activated? It is a bit hard to make a remote diagnosis. Maybe you could share something like wandb log charts?
Hello! I exactly followed the steps listed in GitHub to conduct the experiments and did not change any code. The output.log file is in the appendix below. Please check it. Thanks a lot!
{'eval_maxavg_sts': 0.7558029484954379, 'eval_maxsickr_spearman': 0.7158060157680484, 'eval_maxstsb_spearman': 0.7957998812228275, 'eval_stsb_spearman': 0.7957998812228275, 'eval_sickr_spearman': 0.7158060157680484, 'eval_avg_sts': 0.7558029484954379, 'epoch': 0.05}
{'loss': 386.6133, 'learning_rate': 2.712036859282012e-05, 'epoch': 0.1}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7291345548217202, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7980554929174318, 'eval_sickr_spearman': 0.7291345548217202, 'eval_avg_sts': 0.763595023869576, 'epoch': 0.1}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7317741475655546, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7879672784141661, 'eval_sickr_spearman': 0.7317741475655546, 'eval_avg_sts': 0.7598707129898603, 'epoch': 0.14}
{'loss': -1446.9136, 'learning_rate': 2.4240737185640238e-05, 'epoch': 0.19}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7789952271954208, 'eval_sickr_spearman': 0.7363226384969059, 'eval_avg_sts': 0.7576589328461634, 'epoch': 0.19}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7591702859373091, 'eval_sickr_spearman': 0.6776197531295027, 'eval_avg_sts': 0.7183950195334059, 'epoch': 0.24}
{'loss': 306.449, 'learning_rate': 2.1361105778460356e-05, 'epoch': 0.29}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7323188396618904, 'eval_sickr_spearman': 0.7096353160729898, 'eval_avg_sts': 0.72097707786744, 'epoch': 0.29}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.6647924483963804, 'eval_sickr_spearman': 0.7085503415226041, 'eval_avg_sts': 0.6866713949594923, 'epoch': 0.34}
{'loss': 149.5543, 'learning_rate': 1.848147437128048e-05, 'epoch': 0.38}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7429431485772425, 'eval_sickr_spearman': 0.6884636868700841, 'eval_avg_sts': 0.7157034177236633, 'epoch': 0.38}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.6634105693058654, 'eval_sickr_spearman': 0.6755006695702978, 'eval_avg_sts': 0.6694556194380816, 'epoch': 0.43}
{'loss': -1874.4829, 'learning_rate': 1.5601842964100597e-05, 'epoch': 0.48}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.6363257572280788, 'eval_sickr_spearman': 0.6683948517561974, 'eval_avg_sts': 0.6523603044921381, 'epoch': 0.48}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.579264899052929, 'eval_sickr_spearman': 0.648998931070469, 'eval_avg_sts': 0.6141319150616991, 'epoch': 0.53}
{'loss': -2179.0642, 'learning_rate': 1.2722211556920715e-05, 'epoch': 0.58}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5650398115623568, 'eval_sickr_spearman': 0.6456787003786512, 'eval_avg_sts': 0.605359255970504, 'epoch': 0.58}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5747575057332082, 'eval_sickr_spearman': 0.6500893968471415, 'eval_avg_sts': 0.6124234512901748, 'epoch': 0.62}
{'loss': -2204.1105, 'learning_rate': 9.842580149740832e-06, 'epoch': 0.67}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.591853399919322, 'eval_sickr_spearman': 0.6582546046114564, 'eval_avg_sts': 0.6250540022653892, 'epoch': 0.67}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.577838394397483, 'eval_sickr_spearman': 0.6212442564696482, 'eval_avg_sts': 0.5995413254335655, 'epoch': 0.72}
{'loss': -2216.6462, 'learning_rate': 6.962948742560952e-06, 'epoch': 0.77}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5772755909276925, 'eval_sickr_spearman': 0.6246517718740814, 'eval_avg_sts': 0.600963681400887, 'epoch': 0.77}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5811801836817388, 'eval_sickr_spearman': 0.6215665477019344, 'eval_avg_sts': 0.6013733656918366, 'epoch': 0.82}
{'loss': -2429.1025, 'learning_rate': 4.083317335381071e-06, 'epoch': 0.86}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5709409375881458, 'eval_sickr_spearman': 0.6154062700166515, 'eval_avg_sts': 0.5931736038023987, 'epoch': 0.86}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5725367020681141, 'eval_sickr_spearman': 0.6104197525136144, 'eval_avg_sts': 0.5914782272908643, 'epoch': 0.91}
{'loss': -2483.2812, 'learning_rate': 1.2036859282011902e-06, 'epoch': 0.96}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5718134068124486, 'eval_sickr_spearman': 0.6115689354466894, 'eval_avg_sts': 0.591691171129569, 'epoch': 0.96}
{'train_runtime': 1899.7417, 'train_samples_per_second': 2.742, 'epoch': 1.0}
06/14/2022 13:35:43 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1
distributed training: False, 16-bits training: True
06/14/2022 13:35:43 - INFO - main - Training/evaluation parameters OurTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_find_unused_parameters=None, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_steps=250, eval_transfer=False, evaluation_strategy=IntervalStrategy.STEPS, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, gradient_accumulation_steps=1, greater_is_better=True, group_by_length=False, ignore_data_skip=False, label_names=None, label_smoothing_factor=0.0, learning_rate=3e-05, length_column_name=length, load_best_model_at_end=True, local_rank=-1, log_level=-1, log_level_replica=-1, log_on_each_node=True, logging_dir=result/runs/Jun14_13-35-37_ICA-702A, logging_first_step=False, logging_steps=500, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_type=SchedulerType.LINEAR, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=sickr_spearman, mp_parameters=, no_cuda=False, num_train_epochs=1.0, number_of_steps=None, output_dir=result/scd-bert-base2_sum, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=192, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=result, push_to_hub_organization=None, push_to_hub_token=None, remove_unused_columns=True, report_to=['wandb'], resume_from_checkpoint=None, run_name=result, save_on_each_node=False, save_steps=500, save_strategy=IntervalStrategy.STEPS, save_total_limit=0, seed=42, sharded_ddp=[], skip_memory_metrics=True, tpu_metrics_debug=False, tpu_num_cores=None, use_legacy_prediction_loop=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
06/14/2022 13:35:53 - WARNING - datasets.builder - Using custom data configuration default-d3d71086dc371752
06/14/2022 13:35:53 - WARNING - datasets.builder - Reusing dataset text (./data/text/default-d3d71086dc371752/0.0.0/4b86d314f7236db91f0a0f5cda32d4375445e64c5eda2692655dd99c2dac68e8)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 317.97it/s]
[INFO|configuration_utils.py:561] 2022-06-14 13:35:55,066 >> loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/ica-702a/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
[INFO|configuration_utils.py:598] 2022-06-14 13:35:55,067 >> Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.10.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|configuration_utils.py:561] 2022-06-14 13:35:57,270 >> loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/ica-702a/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
[INFO|configuration_utils.py:598] 2022-06-14 13:35:57,270 >> Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.10.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,856 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/ica-702a/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/ica-702a/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/ica-702a/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
[INFO|configuration_utils.py:561] 2022-06-14 13:36:04,932 >> loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/ica-702a/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
[INFO|configuration_utils.py:598] 2022-06-14 13:36:04,932 >> Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.10.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|modeling_utils.py:1279] 2022-06-14 13:36:06,067 >> loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/ica-702a/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
[WARNING|modeling_utils.py:1516] 2022-06-14 13:36:07,543 >> Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForCL: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias']
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
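(For reference, the ordering the warning asks for is just a matter of which call comes first in the training loop. A minimal sketch in plain PyTorch, not the SCD trainer code itself; Hugging Face's Trainer normally handles this internally, so this warning is usually benign:)

```python
import torch

# Toy model and optimizer; lr and scheduler choice are illustrative only.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=3e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

for _ in range(2):
    loss = model(torch.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # update the weights first...
    scheduler.step()   # ...then advance the LR schedule (PyTorch >= 1.1.0 order)
```

Calling `scheduler.step()` before `optimizer.step()` would make PyTorch skip the first learning-rate value, which is exactly what the warning describes.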
5%|█████▌ | 250/5209 [01:28<25:35, 3.23it/s]
[INFO|trainer.py:1935] 2022-06-14 13:37:37,459 >> Saving model checkpoint to result/scd-bert-base2_sum
[INFO|configuration_utils.py:391] 2022-06-14 13:37:37,459 >> Configuration saved in result/scd-bert-base2_sum/config.json
[INFO|modeling_utils.py:1001] 2022-06-14 13:37:38,319 >> Model weights saved in result/scd-bert-base2_sum/pytorch_model.bin
[INFO|tokenization_utils_base.py:2020] 2022-06-14 13:37:38,320 >> tokenizer config file saved in result/scd-bert-base2_sum/tokenizer_config.json
[INFO|tokenization_utils_base.py:2026] 2022-06-14 13:37:38,320 >> Special tokens file saved in result/scd-bert-base2_sum/special_tokens_map.json
10%|███████████▏ | 500/5209 [02:59<27:04, 2.90it/s]
[INFO|trainer.py:1935] 2022-06-14 13:39:09,172 >> Saving model checkpoint to result/scd-bert-base2_sum
[INFO|configuration_utils.py:391] 2022-06-14 13:39:09,173 >> Configuration saved in result/scd-bert-base2_sum/config.json
[INFO|modeling_utils.py:1001] 2022-06-14 13:39:13,271 >> Model weights saved in result/scd-bert-base2_sum/pytorch_model.bin
[INFO|tokenization_utils_base.py:2020] 2022-06-14 13:39:13,271 >> tokenizer config file saved in result/scd-bert-base2_sum/tokenizer_config.json
[INFO|tokenization_utils_base.py:2026] 2022-06-14 13:39:13,272 >> Special tokens file saved in result/scd-bert-base2_sum/special_tokens_map.json
14%|████████████████▊ | 750/5209 [04:41<23:20, 3.18it/s]
[INFO|trainer.py:1935] 2022-06-14 13:40:50,859 >> Saving model checkpoint to result/scd-bert-base2_sum
[INFO|configuration_utils.py:391] 2022-06-14 13:40:50,860 >> Configuration saved in result/scd-bert-base2_sum/config.json
[INFO|modeling_utils.py:1001] 2022-06-14 13:40:54,901 >> Model weights saved in result/scd-bert-base2_sum/pytorch_model.bin
[INFO|tokenization_utils_base.py:2020] 2022-06-14 13:40:54,902 >> tokenizer config file saved in result/scd-bert-base2_sum/tokenizer_config.json
[INFO|tokenization_utils_base.py:2026] 2022-06-14 13:40:54,902 >> Special tokens file saved in result/scd-bert-base2_sum/special_tokens_map.json
19%|██████████████████████▎ | 1000/5209 [06:23<24:15, 2.89it/s]
[INFO|trainer.py:1935] 2022-06-14 13:42:32,340 >> Saving model checkpoint to result/scd-bert-base2_sum
[INFO|configuration_utils.py:391] 2022-06-14 13:42:32,341 >> Configuration saved in result/scd-bert-base2_sum/config.json
[INFO|modeling_utils.py:1001] 2022-06-14 13:42:36,402 >> Model weights saved in result/scd-bert-base2_sum/pytorch_model.bin
[INFO|tokenization_utils_base.py:2020] 2022-06-14 13:42:36,403 >> tokenizer config file saved in result/scd-bert-base2_sum/tokenizer_config.json
[INFO|tokenization_utils_base.py:2026] 2022-06-14 13:42:36,403 >> Special tokens file saved in result/scd-bert-base2_sum/special_tokens_map.json
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5209/5209 [31:38<00:00, 3.76it/s]
06/14/2022 14:07:47 - INFO - scd.trainers - Training completed. Do not forget to share your model on huggingface.co/models =)
06/14/2022 14:07:47 - INFO - scd.trainers - Loading best model from result/scd-bert-base2_sum (score: 0.7363226384969059).
[INFO|configuration_utils.py:559] 2022-06-14 14:07:47,912 >> loading configuration file result/scd-bert-base2_sum/config.json
[INFO|configuration_utils.py:598] 2022-06-14 14:07:47,912 >> Model config BertConfig {
"_name_or_path": "bert-base-uncased",
"architectures": [
"BertForCL"
],
"attention_probs_dropout_prob": 0.05,
"attention_probs_dropout_prob_noise": 0.155,
"classifier_dropout": null,
"embedding_dim": 768,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.05,
"hidden_dropout_prob_noise": 0.155,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"multi_dropout": true,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"projector": "4096-4096-4096",
"task_alpha": 1.0,
"task_beta": 1.05,
"task_lambda": 0.013,
"torch_dtype": "float32",
"transformers_version": "4.10.0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
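(Given the maintainer's question about whether multi-dropout was activated, one sanity check is to inspect the SCD-specific fields of the saved config.json directly. A hedged sketch; the field names come from the config printed above, and `check_scd_config` is a hypothetical helper, not part of the SCD repository:)

```python
import json

# Hypothetical check against the run's saved config, e.g.:
#   cfg = json.load(open("result/scd-bert-base2_sum/config.json"))
def check_scd_config(cfg):
    """Return a list of problems if the SCD-specific fields look off."""
    problems = []
    if not cfg.get("multi_dropout", False):
        problems.append("multi_dropout is not enabled")
    for key in ("hidden_dropout_prob_noise", "attention_probs_dropout_prob_noise"):
        if key not in cfg:
            problems.append("missing " + key)
    return problems

# Values taken from the config dump in this log:
cfg = {
    "multi_dropout": True,
    "hidden_dropout_prob": 0.05,
    "hidden_dropout_prob_noise": 0.155,
    "attention_probs_dropout_prob": 0.05,
    "attention_probs_dropout_prob_noise": 0.155,
}
print(check_scd_config(cfg))  # → []
```

The config above does show `"multi_dropout": true` and the `_noise` dropout rates, so at least the saved checkpoint was trained with those settings.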
[INFO|modeling_utils.py:1277] 2022-06-14 14:07:47,912 >> loading weights file result/scd-bert-base2_sum/pytorch_model.bin
[INFO|modeling_utils.py:1524] 2022-06-14 14:07:48,932 >> All model checkpoint weights were used when initializing BertForCL.
[INFO|modeling_utils.py:1533] 2022-06-14 14:07:48,933 >> All the weights of BertForCL were initialized from the model checkpoint at result/scd-bert-base2_sum.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForCL for predictions without further training.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5209/5209 [31:39<00:00, 2.74it/s]
[INFO|trainer.py:1935] 2022-06-14 14:07:49,039 >> Saving model checkpoint to result/scd-bert-base2_sum
[INFO|configuration_utils.py:391] 2022-06-14 14:07:49,040 >> Configuration saved in result/scd-bert-base2_sum/config.json
[INFO|modeling_utils.py:1001] 2022-06-14 14:07:52,975 >> Model weights saved in result/scd-bert-base2_sum/pytorch_model.bin
[INFO|tokenization_utils_base.py:2020] 2022-06-14 14:07:53,006 >> tokenizer config file saved in result/scd-bert-base2_sum/tokenizer_config.json
[INFO|tokenization_utils_base.py:2026] 2022-06-14 14:07:53,007 >> Special tokens file saved in result/scd-bert-base2_sum/special_tokens_map.json
06/14/2022 14:07:53 - INFO - main - Train results
06/14/2022 14:07:53 - INFO - main - epoch = 1.0
06/14/2022 14:07:53 - INFO - main - train_runtime = 1899.7417
06/14/2022 14:07:53 - INFO - main - train_samples_per_second = 2.742
06/14/2022 14:07:53 - INFO - main - Evaluate
06/14/2022 14:08:03 - INFO - main - Eval results
06/14/2022 14:08:03 - INFO - main - epoch = 1.0
06/14/2022 14:08:03 - INFO - main - eval_avg_sts = 0.7576589328461634
06/14/2022 14:08:03 - INFO - main - eval_maxavg_sts = 0.763595023869576
06/14/2022 14:08:03 - INFO - main - eval_maxsickr_spearman = 0.7363226384969059
06/14/2022 14:08:03 - INFO - main - eval_maxstsb_spearman = 0.7980554929174318
06/14/2022 14:08:03 - INFO - main - eval_sickr_spearman = 0.7363226384969059
06/14/2022 14:08:03 - INFO - main - eval_stsb_spearman = 0.7789952271954208

Like I mentioned, having a wandb chart rather than the log output would be helpful.
I exactly followed the steps listed in GitHub to conduct experiments and did not change any code.
But as I mentioned in your previous issue, the parameter task_beta=1.05 entails using sum() rather than mean() pooling for BERT.
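(To illustrate the difference between the two pooling modes, here is a toy NumPy sketch, purely illustrative and not the actual SCD pooler code:)

```python
import numpy as np

# Toy token embeddings: seq_len=4, hidden=3; the last position is padding.
hidden_states = np.array([[1., 2., 3.],
                          [4., 5., 6.],
                          [7., 8., 9.],
                          [0., 0., 0.]])
mask = np.array([1., 1., 1., 0.])  # attention mask, 0 = padding

masked = hidden_states * mask[:, None]
sum_pooled = masked.sum(axis=0)                # sum() pooling
mean_pooled = masked.sum(axis=0) / mask.sum()  # mean() pooling

print(sum_pooled)   # → [12. 15. 18.]
print(mean_pooled)  # → [4. 5. 6.]
```

Because sum pooling scales with sentence length, it changes the magnitude of the sentence embeddings (and hence the loss values), which could explain very different training dynamics if the wrong mode is used.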
Hello! I tried to reproduce the results listed in the paper but failed. I used the hyper-parameters listed in config.json in the Hugging Face model repository, but there is a large gap between the results I got and the results in the paper. For example, in terms of the average STS score: BERT: lower than 72 (mine) vs. 74.19 (in the paper); RoBERTa: about 70 (mine) vs. 73.89 (in the paper). I also tried to reproduce the results using the hyper-parameters adopted in your paper, but there is a large gap too. Could you give me some advice? Thanks in advance.