UKPLab / useb

Heterogeneous, Task- and Domain-Specific Benchmark for Unsupervised Sentence Embeddings used in the TSDAE paper: https://arxiv.org/abs/2104.06979.
Apache License 2.0

SimCSE supervised #2

Open · Muennighoff opened this issue 2 years ago

Muennighoff commented 2 years ago

Did you try SimCSE's supervised training objective in-domain on USEB? Would be interesting to compare to SBERT-supervised...!

kwang2049 commented 2 years ago

Hi @Muennighoff,

Yeah, we tried that. Actually, what you said seems to be exactly SBERT-base-nli-v2, SBERT-base-nli-stsb-v2 (zero-shot models) and SBERT-supervised (in-domain supervised) in Table 2. All of them were trained with MultipleNegativesRankingLoss, which is equivalent to SimCSE's supervised objective. The description can be found in Section 5.1 Baseline Method in the paper. For the training code, you can refer to this script: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py.
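(Not the exact script, but a minimal sketch of what MultipleNegativesRankingLoss training looks like in sentence-transformers; the base checkpoint and example pairs below are just illustrative, see the linked training_nli_v2.py for the real NLI setup:)

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Illustrative base checkpoint; training_nli_v2.py has the actual configuration.
model = SentenceTransformer("bert-base-uncased")

# Each InputExample is an (anchor, positive) pair; the other positives in the
# batch serve as negatives, i.e. the in-batch-negatives objective discussed above.
train_examples = [
    InputExample(texts=["A man is playing a guitar.", "Someone plays an instrument."]),
    InputExample(texts=["A dog runs in the park.", "An animal is outside."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```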

Muennighoff commented 2 years ago

> Hi @Muennighoff,
>
> Yeah, we tried that. Actually, what you said seems to be exactly SBERT-base-nli-v2, SBERT-base-nli-stsb-v2 (zero-shot models) and SBERT-supervised (in-domain supervised) in Table 2. All of them were trained with MultipleNegativesRankingLoss, which is equivalent to SimCSE's supervised objective. The description can be found in Section 5.1 Baseline Method in the paper. For the training code, you can refer to this script: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py.

Nice, thanks for this! I hadn't realized SimCSE's supervised objective was equivalent to SBERT-base-nli-stsb-v2's objective. And it seems it's also equivalent to Contrastive Multiview Coding (https://arxiv.org/pdf/1906.05849.pdf), except they optionally take hard negatives from anywhere via a memory buffer, not just from the current batch.

So, ignoring that difference, all of the below are the same (a common form is sketched after the list):

- SBERT with MultipleNegativesRankingLoss (screenshot of the loss formula)
- SimCSE Supervised (screenshot of the loss formula)
- Multiview Contrastive Coding (screenshot of the loss formula)
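Up to notation (and ignoring CMC's memory buffer and SimCSE's optional hard negatives), all three reduce to roughly this in-batch contrastive form, where h_i and h_j+ are the embeddings of the i-th anchor and the j-th positive, sim is cosine similarity and τ a temperature/scale (my notation, not taken from any of the papers):

```latex
\ell_i = -\log
  \frac{\exp\!\left(\operatorname{sim}(h_i, h_i^{+}) / \tau\right)}
       {\sum_{j=1}^{N} \exp\!\left(\operatorname{sim}(h_i, h_j^{+}) / \tau\right)}
```
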
Muennighoff commented 2 years ago

Could you provide the training code for SBERT-supervised (i.e. the training on USEB)?

kwang2049 commented 2 years ago

Hi @Muennighoff,

For hyper-parameters: I trained all these SBERT-supervised models for 10 epochs, with linear warmup over 0.1 × the total number of steps, and early stopping on the dev score where available. All other hyper-parameters are left at the defaults of SentenceTransformer.fit.
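For concreteness, a sketch of that configuration (assuming the in-domain data comes as (anchor, positive) pairs; the checkpoint and example pairs are placeholders, not the actual USEB training code):

```python
import math

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # illustrative base checkpoint

# Placeholder in-domain (anchor, positive) pairs.
train_examples = [
    InputExample(texts=["how do I reset my password?", "password reset instructions"]),
    InputExample(texts=["laptop will not boot", "computer fails to start up"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

epochs = 10
warmup_steps = math.ceil(len(train_dataloader) * epochs * 0.1)  # 0.1 * total steps

# All other arguments keep the SentenceTransformer.fit defaults.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=epochs,
    warmup_steps=warmup_steps,
)
```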

If you have further questions about this, I can give you more hints :)