SAP-samples / acl2022-self-contrastive-decorrelation

Source code for ACL 2022 paper "Self-contrastive Decorrelation for Sentence Embeddings".
Apache License 2.0

Unable to reproduce results #5

Closed: ShaobinChen-AH closed this issue 2 years ago

ShaobinChen-AH commented 2 years ago

Hello! I tried to reproduce the results reported in the paper but failed. I used the hyper-parameters listed in config.json in the Hugging Face model repository, but there is a large gap between my results and those in the paper. For example, in terms of the average STS score:

- BERT: below 72 (mine) vs. 74.19 (paper)
- RoBERTa: about 70 (mine) vs. 73.89 (paper)

I also tried the hyper-parameters adopted in the paper, but the gap remains large. Could you give me some advice? Thanks in advance.

TJKlein commented 2 years ago

Hi, that's strange. Are you using the Transformers version from the repository? Are you sure the multi-dropout is activated? It is a bit hard to make a remote diagnosis; maybe you could share something like the wandb log charts?
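For context, the multi-dropout being asked about produces two stochastic views of the same sentence embedding by applying dropout at two different rates. A minimal NumPy sketch; the rates 0.05/0.15, the dimension 768, and the helper name are illustrative, not the repository's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_dropout_views(h, p1, p2, rng):
    """Return two stochastic views of the same hidden state h,
    produced by inverted dropout at two different rates."""
    views = []
    for p in (p1, p2):
        mask = rng.random(h.shape) >= p   # keep each unit with prob 1 - p
        views.append(h * mask / (1.0 - p))
    return views

h = rng.standard_normal(768)              # one sentence embedding
view1, view2 = multi_dropout_views(h, 0.05, 0.15, rng)

# The self-contrastive objective needs the two views to differ; if they
# are identical, dropout is effectively off (e.g. the model was left in
# eval mode during training) and training will not behave as intended.
assert not np.allclose(view1, view2)
```

This is one quick thing to verify in a failing run: that the two forward passes really yield different embeddings for the same input.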

ShaobinChen-AH commented 2 years ago
Hello! I followed the steps listed on GitHub exactly and did not change any code. The output.log file is attached below; please check it. Thanks a lot!

Contents of output.log:

```
{'eval_maxavg_sts': 0.7558029484954379, 'eval_maxsickr_spearman': 0.7158060157680484, 'eval_maxstsb_spearman': 0.7957998812228275, 'eval_stsb_spearman': 0.7957998812228275, 'eval_sickr_spearman': 0.7158060157680484, 'eval_avg_sts': 0.7558029484954379, 'epoch': 0.05}
{'loss': 386.6133, 'learning_rate': 2.712036859282012e-05, 'epoch': 0.1}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7291345548217202, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7980554929174318, 'eval_sickr_spearman': 0.7291345548217202, 'eval_avg_sts': 0.763595023869576, 'epoch': 0.1}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7317741475655546, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7879672784141661, 'eval_sickr_spearman': 0.7317741475655546, 'eval_avg_sts': 0.7598707129898603, 'epoch': 0.14}
{'loss': -1446.9136, 'learning_rate': 2.4240737185640238e-05, 'epoch': 0.19}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7789952271954208, 'eval_sickr_spearman': 0.7363226384969059, 'eval_avg_sts': 0.7576589328461634, 'epoch': 0.19}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7591702859373091, 'eval_sickr_spearman': 0.6776197531295027, 'eval_avg_sts': 0.7183950195334059, 'epoch': 0.24}
{'loss': 306.449, 'learning_rate': 2.1361105778460356e-05, 'epoch': 0.29}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7323188396618904, 'eval_sickr_spearman': 0.7096353160729898, 'eval_avg_sts': 0.72097707786744, 'epoch': 0.29}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.6647924483963804, 'eval_sickr_spearman': 0.7085503415226041, 'eval_avg_sts': 0.6866713949594923, 'epoch': 0.34}
{'loss': 149.5543, 'learning_rate': 1.848147437128048e-05, 'epoch': 0.38}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.7429431485772425, 'eval_sickr_spearman': 0.6884636868700841, 'eval_avg_sts': 0.7157034177236633, 'epoch': 0.38}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.6634105693058654, 'eval_sickr_spearman': 0.6755006695702978, 'eval_avg_sts': 0.6694556194380816, 'epoch': 0.43}
{'loss': -1874.4829, 'learning_rate': 1.5601842964100597e-05, 'epoch': 0.48}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.6363257572280788, 'eval_sickr_spearman': 0.6683948517561974, 'eval_avg_sts': 0.6523603044921381, 'epoch': 0.48}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.579264899052929, 'eval_sickr_spearman': 0.648998931070469, 'eval_avg_sts': 0.6141319150616991, 'epoch': 0.53}
{'loss': -2179.0642, 'learning_rate': 1.2722211556920715e-05, 'epoch': 0.58}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5650398115623568, 'eval_sickr_spearman': 0.6456787003786512, 'eval_avg_sts': 0.605359255970504, 'epoch': 0.58}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5747575057332082, 'eval_sickr_spearman': 0.6500893968471415, 'eval_avg_sts': 0.6124234512901748, 'epoch': 0.62}
{'loss': -2204.1105, 'learning_rate': 9.842580149740832e-06, 'epoch': 0.67}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.591853399919322, 'eval_sickr_spearman': 0.6582546046114564, 'eval_avg_sts': 0.6250540022653892, 'epoch': 0.67}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.577838394397483, 'eval_sickr_spearman': 0.6212442564696482, 'eval_avg_sts': 0.5995413254335655, 'epoch': 0.72}
{'loss': -2216.6462, 'learning_rate': 6.962948742560952e-06, 'epoch': 0.77}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5772755909276925, 'eval_sickr_spearman': 0.6246517718740814, 'eval_avg_sts': 0.600963681400887, 'epoch': 0.77}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5811801836817388, 'eval_sickr_spearman': 0.6215665477019344, 'eval_avg_sts': 0.6013733656918366, 'epoch': 0.82}
{'loss': -2429.1025, 'learning_rate': 4.083317335381071e-06, 'epoch': 0.86}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5709409375881458, 'eval_sickr_spearman': 0.6154062700166515, 'eval_avg_sts': 0.5931736038023987, 'epoch': 0.86}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5725367020681141, 'eval_sickr_spearman': 0.6104197525136144, 'eval_avg_sts': 0.5914782272908643, 'epoch': 0.91}
{'loss': -2483.2812, 'learning_rate': 1.2036859282011902e-06, 'epoch': 0.96}
{'eval_maxavg_sts': 0.763595023869576, 'eval_maxsickr_spearman': 0.7363226384969059, 'eval_maxstsb_spearman': 0.7980554929174318, 'eval_stsb_spearman': 0.5718134068124486, 'eval_sickr_spearman': 0.6115689354466894, 'eval_avg_sts': 0.591691171129569, 'epoch': 0.96}
{'train_runtime': 1899.7417, 'train_samples_per_second': 2.742, 'epoch': 1.0}
06/14/2022 13:35:43 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1 distributed training: False, 16-bits training: True
06/14/2022 13:35:43 - INFO - main - Training/evaluation parameters OurTrainingArguments(
  _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08,
  dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True,
  ddp_find_unused_parameters=None, debug=[], deepspeed=None, disable_tqdm=False,
  do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None,
  eval_steps=250, eval_transfer=False, evaluation_strategy=IntervalStrategy.STEPS,
  fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1,
  gradient_accumulation_steps=1, greater_is_better=True, group_by_length=False,
  ignore_data_skip=False, label_names=None, label_smoothing_factor=0.0,
  learning_rate=3e-05, length_column_name=length, load_best_model_at_end=True,
  local_rank=-1, log_level=-1, log_level_replica=-1, log_on_each_node=True,
  logging_dir=result/runs/Jun14_13-35-37_ICA-702A, logging_first_step=False,
  logging_steps=500, logging_strategy=IntervalStrategy.STEPS,
  lr_scheduler_type=SchedulerType.LINEAR, max_grad_norm=1.0, max_steps=-1,
  metric_for_best_model=sickr_spearman, mp_parameters=, no_cuda=False,
  num_train_epochs=1.0, number_of_steps=None, output_dir=result/scd-bert-base2_sum,
  overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8,
  per_device_train_batch_size=192, prediction_loss_only=False, push_to_hub=False,
  push_to_hub_model_id=result, push_to_hub_organization=None, push_to_hub_token=None,
  remove_unused_columns=True, report_to=['wandb'], resume_from_checkpoint=None,
  run_name=result, save_on_each_node=False, save_steps=500,
  save_strategy=IntervalStrategy.STEPS, save_total_limit=0, seed=42, sharded_ddp=[],
  skip_memory_metrics=True, tpu_metrics_debug=False, tpu_num_cores=None,
  use_legacy_prediction_loop=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0,
)
06/14/2022 13:35:53 - WARNING - datasets.builder - Using custom data configuration default-d3d71086dc371752
06/14/2022 13:35:53 - WARNING - datasets.builder - Reusing dataset text (./data/text/default-d3d71086dc371752/0.0.0/4b86d314f7236db91f0a0f5cda32d4375445e64c5eda2692655dd99c2dac68e8)
100%|██████████| 1/1 [00:00<00:00, 317.97it/s]
[INFO|configuration_utils.py:561] 2022-06-14 13:35:55,066 >> loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/ica-702a/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
[INFO|configuration_utils.py:598] 2022-06-14 13:35:55,067 >> Model config BertConfig {
  "architectures": [ "BertForMaskedLM" ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.10.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,856 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/ica-702a/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/ica-702a/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1739] 2022-06-14 13:36:03,857 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/ica-702a/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
[INFO|modeling_utils.py:1279] 2022-06-14 13:36:06,067 >> loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/ica-702a/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
[WARNING|modeling_utils.py:1516] 2022-06-14 13:36:07,543 >> Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForCL: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias']
```
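As a side note on reading this log: `eval_avg_sts` appears to be the plain arithmetic mean of the STS-B and SICK-R Spearman scores, which the first evaluation step reproduces; the degradation over training therefore reflects both benchmarks, not an averaging artifact:

```python
# Values copied from the evaluation step at epoch 0.05 in the log above.
stsb_spearman = 0.7957998812228275
sickr_spearman = 0.7158060157680484

# Mean of the two Spearman scores; agrees with the logged
# 'eval_avg_sts' of 0.7558029484954379 at that step.
avg_sts = (stsb_spearman + sickr_spearman) / 2
```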

TJKlein commented 2 years ago

As I mentioned, a wandb chart rather than the raw log output would be helpful.

> I exactly followed the steps listed in GitHub to conduct experiments and did not change any code.

But as I mentioned in your previous issue, using the parameter `task_beta=1.05` entails using `sum()` rather than `mean()` for BERT.
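The practical point is that switching the loss reduction rescales the objective by the number of summed terms, so a `task_beta` tuned under one reduction will not transfer to the other. A minimal sketch; the 768-entry penalty vector and the variable names are illustrative, not the repository's actual loss code:

```python
import numpy as np

# Hypothetical per-dimension decorrelation penalties for a 768-dim encoder.
per_dim_penalty = np.random.default_rng(1).random(768)

loss_mean = per_dim_penalty.mean()   # mean() reduction
loss_sum = per_dim_penalty.sum()     # sum() reduction: 768x larger

# loss_sum == 768 * loss_mean, so a weighting coefficient such as
# task_beta tuned for one reduction must be rescaled before being
# used with the other, or the balance between loss terms breaks.
assert abs(loss_sum - 768 * loss_mean) < 1e-9
```

This is consistent with the mismatch above: running the `task_beta=1.05` configuration with the wrong reduction changes the effective loss weighting by roughly the hidden-dimension factor.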