Muennighoff opened this issue 2 years ago
Hi @Muennighoff,

Yeah, we tried that. Actually, what you describe is exactly `SBERT-base-nli-v2`, `SBERT-base-nli-stsb-v2` (zero-shot models) and `SBERT-supervised` (in-domain supervised) in Table 2. All of them were trained with MultipleNegativesRankingLoss, which is equivalent to SimCSE's supervised objective. The description can be found in Section 5.1 (Baseline Methods) of the paper. For the training code, one can refer to https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py.
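For reference, a minimal sketch of that training setup with sentence-transformers (the linked `training_nli_v2.py` does more, e.g. a no-duplicates batching strategy and an STS dev evaluator; the base checkpoint, batch size and the example triplet below are placeholders, not the paper's exact settings):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Base encoder; training_nli_v2.py builds this from a Transformer + mean-pooling module.
model = SentenceTransformer("bert-base-uncased")

# One InputExample per (anchor, entailed positive, contradicting hard negative) triplet from NLI.
train_examples = [
    InputExample(texts=[
        "A man is playing a guitar.",     # premise (anchor)
        "A person plays an instrument.",  # entailment (positive)
        "A man is washing dishes.",       # contradiction (hard negative)
    ]),
    # ... the real script builds these triplets from the full SNLI/MultiNLI data
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

# In-batch negatives plus the provided hard negatives; this is the
# MultipleNegativesRankingLoss objective referred to above.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```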
Nice, thanks for this! I hadn't realized SimCSE's supervised objective was equivalent to `SBERT-base-nli-stsb-v2`'s objective. It also seems equivalent to Contrastive Multiview Coding (https://arxiv.org/pdf/1906.05849.pdf), except that they optionally take hard negatives from anywhere via a memory buffer, not just from the current batch.
So, ignoring that detail, all of the following optimize the same objective (sketched below):
- SBERT with MultipleNegativesRankingLoss
- SimCSE Supervised
- Multiview Contrastive Coding
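Concretely, a rough PyTorch sketch of that shared objective (the function name is mine for illustration; the default scale of 20 in MultipleNegativesRankingLoss corresponds to SimCSE's temperature of 0.05):

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(anchors, positives, hard_negatives=None, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives.

    anchors, positives, hard_negatives: [batch, dim] sentence embeddings.
    Each anchor's positive is the candidate at the same index; every other
    candidate in the batch (plus any hard negatives) acts as a negative.
    """
    candidates = positives if hard_negatives is None else torch.cat([positives, hard_negatives], dim=0)
    # pairwise cosine similarity between every anchor and every candidate: [batch, num_candidates]
    scores = F.cosine_similarity(anchors.unsqueeze(1), candidates.unsqueeze(0), dim=-1) * scale
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(scores, labels)
```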
Could you provide the training code for `SBERT-supervised`? (I.e., the training on USEB.)
Hi @Muennighoff,
The training data are organized as follows: `data-train/${dataset_name}/supervised/train.org` and `train.para` (each pair of parallel lines corresponds to one pair of gold paraphrases); `data-train/twitter/supervised/train.s1`, `train.s2` and `train.lbl` (each triple of parallel lines corresponds to sentence 1, sentence 2 and the gold label for these two sentences).

For hyper-parameters, I trained all these `SBERT-supervised` models for 10 epochs, with linear warmup over 0.1 * the total number of steps and early stopping on the dev score where possible. All other hyper-parameters are left at the defaults of `SentenceTransformer.fit`.
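In case it helps, here is a rough sketch of how such a run could look for one of the paraphrase-style datasets. This is not the author's actual script: `dataset_name`, the base checkpoint and the batch size are placeholders, the loss is the MultipleNegativesRankingLoss mentioned earlier, and the labeled Twitter data would need a different objective.

```python
import math
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

dataset_name = "askubuntu"  # hypothetical value for ${dataset_name}

# Each pair of parallel lines in train.org / train.para is one gold paraphrase pair.
with open(f"data-train/{dataset_name}/supervised/train.org") as f_org, \
     open(f"data-train/{dataset_name}/supervised/train.para") as f_para:
    train_samples = [
        InputExample(texts=[org.strip(), para.strip()])
        for org, para in zip(f_org, f_para)
    ]

model = SentenceTransformer("bert-base-uncased")
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)

epochs = 10
# 0.1 * total number of training steps of linear warmup, as described above.
warmup_steps = math.ceil(len(train_dataloader) * epochs * 0.1)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=epochs,
    warmup_steps=warmup_steps,
    # all remaining hyper-parameters left at the SentenceTransformer.fit defaults
)
```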
If you have further questions about this, I can give you more hints :)
Did you try SimCSE's supervised training objective in-domain on USEB? It would be interesting to compare it to `SBERT-supervised`!