UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

About the experimental configuration of TSDAE #1338

Open chaochen99 opened 2 years ago

chaochen99 commented 2 years ago

Hi, I want to reproduce the TSDAE experiment, but my result is lower than the one reported in your paper (58.5 vs. 59.4). Could you answer the following questions?

  1. What is the type of GPU used in the experiment?
  2. What are the 5 random seeds?
  3. What are the versions of torch and other environments that may affect the result?
  4. When I set transformers==3.1.0, consistent with the paper, the following error occurs:

     Traceback (most recent call last):
       File "train_askubuntu_tsdae.py", line 85, in <module>
         train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)
       File "sentence-transformers/sentence_transformers/losses/DenoisingAutoEncoderLoss.py", line 38, in __init__
         encoder_name_or_path = model[0].auto_model.config._name_or_path
     AttributeError: 'BertConfig' object has no attribute '_name_or_path'

Looking forward to your reply. Thank you.
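For context while debugging: TSDAE's denoising objective corrupts each input by deleting a fraction of its tokens (the paper's default deletion ratio is 0.6). A minimal, library-independent sketch of that noise step (the function name and structure here are illustrative, not the repository's actual implementation, which operates on tokenized sentences inside `DenoisingAutoEncoderDataset`):

```python
import random

def delete_tokens(tokens, del_ratio=0.6, rng=None):
    """Randomly delete roughly `del_ratio` of the tokens.

    Sketch of TSDAE-style deletion noise: each token is dropped
    independently with probability `del_ratio`; if everything would
    be deleted, one random token is kept so the input is never empty.
    """
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() > del_ratio]
    if not kept:  # never return an empty sequence
        kept = [rng.choice(tokens)]
    return kept

noisy = delete_tokens("how do i upgrade ubuntu from the terminal".split(),
                      rng=random.Random(0))
```

The encoder then embeds the corrupted sentence and the decoder is trained to reconstruct the original one from that embedding.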

nreimers commented 2 years ago

@kwang2049 Could you help here?

kwang2049 commented 2 years ago

Hi @chaochen99,

Thanks for your attention!

I think you can just try the latest sentence-transformers via pip install -U sentence-transformers and run the example. I have just run it again and the evaluation result on AskUbuntu is 59.49 (MAP). There are many incompatibility issues with previous versions of transformers.

According to your questions:

  1. I used a Tesla V100;
  2. The actual seed values do not really matter, since the same seed produces different random sequences on different machines anyway;
  3. Sorry, I did not record the PyTorch version;
  4. That issue was due to the old version of Hugging Face's transformers.
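On point 2, the caveat holds even when seeds are fixed: identical seeds do not guarantee identical results across machines or GPU types. For reference, a typical seeding routine (a sketch; the optional numpy/torch calls are guarded so it also runs without those packages installed) looks like:

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed the common RNG sources. Note this makes a single machine
    reproducible with itself, not across different hardware."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # optional: only if numpy is installed
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:  # optional: only if torch is installed
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

set_seed(42)
```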

And if you want to have the exact checkpoints I trained during the experiments, please refer to here.

chaochen99 commented 2 years ago

Hi @kwang2049,

Thanks for your detailed reply!

To better follow up on your work, could you share the evaluation code for the other datasets?

Thanks again.

kwang2049 commented 2 years ago

Hi @chaochen99,

Sure, the evaluation code for all the datasets is available at https://github.com/UKPLab/useb