UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

About the experimental configuration of TSDAE #1338

Open chaochen99 opened 2 years ago

chaochen99 commented 2 years ago

Hi, I want to reproduce the TSDAE experiment, but my result is lower than the one reported in your paper (58.5 vs. 59.4). Could you answer the following questions?

  1. What is the type of GPU used in the experiment?
  2. What are the 5 random seeds?
  3. What are the versions of torch and other environments that may affect the result?
  4. When I set transformers==3.1.0, consistent with the paper, the following error occurs:

     Traceback (most recent call last):
       File "train_askubuntu_tsdae.py", line 85, in <module>
         train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)
       File "sentence-transformers/sentence_transformers/losses/DenoisingAutoEncoderLoss.py", line 38, in __init__
         encoder_name_or_path = model[0].auto_model.config._name_or_path
     AttributeError: 'BertConfig' object has no attribute '_name_or_path'

Looking forward to your reply. Thank you.
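For context while debugging: TSDAE's denoising objective corrupts each input by deleting a fraction of its tokens (the paper's default deletion ratio is 0.6). A minimal, library-independent sketch of that noise step (the function name and structure here are illustrative, not the repository's actual implementation, which operates on tokenized sentences inside `DenoisingAutoEncoderDataset`):

```python
import random

def delete_tokens(tokens, del_ratio=0.6, rng=None):
    """Randomly delete roughly `del_ratio` of the tokens.

    Sketch of TSDAE-style deletion noise: each token is dropped
    independently with probability `del_ratio`; if everything would
    be deleted, one random token is kept so the input is never empty.
    """
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() > del_ratio]
    if not kept:  # never return an empty sequence
        kept = [rng.choice(tokens)]
    return kept

noisy = delete_tokens("how do i upgrade ubuntu from the terminal".split(),
                      rng=random.Random(0))
```

The encoder then embeds the corrupted sentence and the decoder is trained to reconstruct the original one from that embedding.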

nreimers commented 2 years ago

@kwang2049 Could you help here?

kwang2049 commented 2 years ago

Hi @chaochen99,

Thanks for your attention!

I think you can just try the latest sentence-transformers via pip install -U sentence-transformers and run the example. I have just run it again and the evaluation result on AskUbuntu is 59.49 (MAP). There are many incompatibility issues with previous versions of transformers.

According to your questions:

  1. I used a Tesla V100;
  2. The actual seed values do not really matter, since the same seed produces different random sequences on different machines anyway;
  3. Sorry, I did not record the PyTorch version;
  4. That issue was due to the old version of Hugging Face's transformers.
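On point 2, the caveat holds even when seeds are fixed: identical seeds do not guarantee identical results across machines or GPU types. For reference, a typical seeding routine (a sketch; the optional numpy/torch calls are guarded so it also runs without those packages installed) looks like:

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed the common RNG sources. Note this makes a single machine
    reproducible with itself, not across different hardware."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # optional: only if numpy is installed
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:  # optional: only if torch is installed
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

set_seed(42)
```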

And if you want to have the exact checkpoints I trained during the experiments, please refer to here.

chaochen99 commented 2 years ago

Hi @kwang2049,

Thanks for your detailed reply!

To better follow up on your work, could you share the evaluation code for the other datasets?

Thanks again.

kwang2049 commented 2 years ago

Hi @chaochen99,

Sure, the evaluation code for all the datasets is available at https://github.com/UKPLab/useb