Closed PhilipMay closed 3 years ago
Hi Philip, it uses bert-base-nli-stsb-mean-tokens as the teacher and xlm-r as the student model (https://arxiv.org/abs/2004.09813)
The BERT model was trained on SNLI+MultiNLI and on the STSb train set (tuned on the STSb dev set). No data from the STSb test set was used.
Pre-training with NLI data makes quite a big difference for STS.
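For readers unfamiliar with the teacher-student setup mentioned above: in multilingual knowledge distillation, the student is trained so that its embeddings of both the English sentence and its translation match the teacher's embedding of the English sentence. A minimal sketch of that objective, with tiny made-up vectors standing in for real sentence embeddings (not actual model output):

```python
import random

random.seed(0)

DIM = 8  # tiny embedding size for illustration; real models use e.g. 768

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical vectors standing in for embeddings. In the actual setup
# (https://arxiv.org/abs/2004.09813) the teacher embeds the English
# sentence, and the student embeds both the English sentence and its
# translation; here we fake "student" outputs as noisy teacher outputs.
teacher = [random.gauss(0, 1) for _ in range(DIM)]         # teacher("sentence")
student_en = [t + random.gauss(0, 0.01) for t in teacher]  # student("sentence")
student_de = [t + random.gauss(0, 0.02) for t in teacher]  # student("Satz")

# Distillation objective: pull both student outputs toward the teacher's
# embedding of the English sentence.
loss = mse(student_en, teacher) + mse(student_de, teacher)
print(loss)  # small but nonzero while the student deviates from the teacher
```

Because both student outputs are pulled toward the same teacher vector, translations end up close to their English counterparts in the shared embedding space.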
Ok - thanks.
Hi, as you might know, I open-sourced a German translation of the STSb dataset: https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark
I tested
xlm-r-100langs-bert-base-nli-stsb-mean-tokens
on the test set of the German STSb, and it performs surprisingly well. My question: did you train it on the full (English) dataset (including the test set) and then do the multilingual training? That would explain why it performs so well. I do not know why it performs so well and would like to understand.
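For context on what "performs well" means here: STSb results are usually reported as the Spearman rank correlation between the model's cosine similarities and the gold similarity scores. A minimal pure-Python sketch with made-up scores (not real STSb data or model output):

```python
def rankdata(xs):
    """Assign ranks (1-based, ties get the average rank)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rankdata(a), rankdata(b))

# Made-up gold scores (0-5 scale) and model cosine similarities for
# five sentence pairs; these happen to be perfectly monotone.
gold = [4.5, 1.0, 3.2, 0.5, 2.8]
pred = [0.92, 0.20, 0.75, 0.15, 0.60]
print(round(spearman(gold, pred), 3))  # → 1.0
```

Since only the ranking matters, a model trained on differently scaled similarity scores can still score highly, which is why Spearman (not Pearson on raw values) is the standard STSb metric.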
Thanks Philip