Closed PhilipMay closed 3 years ago
Hi Philip, it uses bert-base-nli-stsb-mean-tokens as the teacher and xlm-r as the student model (https://arxiv.org/abs/2004.09813)
The BERT model was trained on SNLI+MultiNLI and on the STSb train set (tuned on the STSb dev set). No data from the STSb test set was used.
Pre-training with NLI data makes quite a big difference for STS.
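For readers unfamiliar with the teacher-student setup mentioned above: in multilingual knowledge distillation, the student is trained so that its embeddings of both the English sentence and its translation match the teacher's embedding of the English sentence. A minimal sketch of that objective, with tiny made-up vectors standing in for real sentence embeddings (not actual model output):

```python
import random

random.seed(0)

DIM = 8  # tiny embedding size for illustration; real models use e.g. 768

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical vectors standing in for embeddings. In the actual setup
# (https://arxiv.org/abs/2004.09813) the teacher embeds the English
# sentence, and the student embeds both the English sentence and its
# translation; here we fake "student" outputs as noisy teacher outputs.
teacher = [random.gauss(0, 1) for _ in range(DIM)]         # teacher("sentence")
student_en = [t + random.gauss(0, 0.01) for t in teacher]  # student("sentence")
student_de = [t + random.gauss(0, 0.02) for t in teacher]  # student("Satz")

# Distillation objective: pull both student outputs toward the teacher's
# embedding of the English sentence.
loss = mse(student_en, teacher) + mse(student_de, teacher)
print(loss)  # small but nonzero while the student deviates from the teacher
```

Because both student outputs are pulled toward the same teacher vector, translations end up close to their English counterparts in the shared embedding space.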
Ok - thanks.
Hi, as you might know, I open-sourced a German translation of the STSb dataset: https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark
I tested
xlm-r-100langs-bert-base-nli-stsb-mean-tokens
on the test set of the German STSb, and it performs surprisingly well. My question: did you train it on the full (English) dataset (including the test set) and then do the multilingual training? That would explain why it performs so well. I do not know why it performs so well and would like to understand.
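For context on what "performs well" means here: STSb results are usually reported as the Spearman rank correlation between the model's cosine similarities and the gold similarity scores. A minimal pure-Python sketch with made-up scores (not real STSb data or model output):

```python
def rankdata(xs):
    """Assign ranks (1-based, ties get the average rank)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rankdata(a), rankdata(b))

# Made-up gold scores (0-5 scale) and model cosine similarities for
# five sentence pairs; these happen to be perfectly monotone.
gold = [4.5, 1.0, 3.2, 0.5, 2.8]
pred = [0.92, 0.20, 0.75, 0.15, 0.60]
print(round(spearman(gold, pred), 3))  # → 1.0
```

Since only the ranking matters, a model trained on differently scaled similarity scores can still score highly, which is why Spearman (not Pearson on raw values) is the standard STSb metric.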
Thanks Philip