SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License
454 stars, 32 forks

How to set labels for contradict pairs #26

Closed: mengyao00 closed this issue 6 months ago

mengyao00 commented 9 months ago

The SNLI dataset contains contradiction pairs, and it defines the labels as follows: "label: an integer whose value may be either 0, indicating that the hypothesis entails the premise, 1, indicating that the premise and hypothesis neither entail nor contradict each other, or 2, indicating that the hypothesis contradicts the premise. Dataset instances which don't have any gold label are marked with -1 label. Make sure you filter them before starting the training using datasets.Dataset.filter." If I want to use AnglE to fine-tune on this kind of dataset, should I set the label to -1 for contradiction pairs?

SeanLee97 commented 9 months ago

@mengyao00 In our practice for MultiNLI and SNLI, we set the label to 0 for contradiction pairs and 1 for entailment pairs. BTW, we do not use the neutral or -1 (no gold label) examples for training, because we consider them noise.
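
For readers who want to try this, the relabeling described in the reply can be sketched in plain Python. This is only an illustration, not the maintainers' preprocessing code. The helper name `to_binary_pairs` and the sample rows are made up for the example. The output column names `text1`, `text2`, and `label` are an assumption about the pair format AnglE expects, so check them against the AnglE README before training.

```python
# SNLI label ids as described in the question above.
ENTAILMENT, NEUTRAL, CONTRADICTION, NO_GOLD = 0, 1, 2, -1

# Mapping suggested in the reply: entailment -> 1, contradiction -> 0.
SNLI_TO_BINARY = {ENTAILMENT: 1, CONTRADICTION: 0}


def to_binary_pairs(rows):
    """Keep only entailment/contradiction rows and relabel them as 1/0.

    `rows` is an iterable of dicts with SNLI-style keys
    ("premise", "hypothesis", "label"). The output keys
    ("text1", "text2", "label") are an assumed pair format.
    """
    pairs = []
    for row in rows:
        if row["label"] not in SNLI_TO_BINARY:
            continue  # drop neutral (1) and no-gold-label (-1) rows
        pairs.append(
            {
                "text1": row["premise"],
                "text2": row["hypothesis"],
                "label": SNLI_TO_BINARY[row["label"]],
            }
        )
    return pairs


# Hypothetical sample rows, one per SNLI label value.
sample = [
    {"premise": "A man is playing a guitar.", "hypothesis": "A person plays music.", "label": ENTAILMENT},
    {"premise": "A man is playing a guitar.", "hypothesis": "The man is famous.", "label": NEUTRAL},
    {"premise": "A man is playing a guitar.", "hypothesis": "Nobody is playing music.", "label": CONTRADICTION},
    {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": NO_GOLD},
]

binary_pairs = to_binary_pairs(sample)
```

With the Hugging Face `datasets` library, the same two steps would typically be a `Dataset.filter` call that keeps labels 0 and 2, followed by a `Dataset.map` call that applies the relabeling.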