Open aleversn opened 3 years ago
And now I'm wondering if the textual similarity score should be close to the normalized label value?
Usually they should be close, but depending on what your data looks like, this is not necessarily possible.
Can I understand it this way: the model can estimate how similar two sentences are, but the score is not necessarily close to the label value?
Yes, if you have many extreme values (like either 0 or 1), the model will also learn values in between.
For STSb, you can take the pre-trained models here and compute their distance to the gold label.
Right now I'm testing on the STS-benchmark dev set using the `bert-large-nli-stsb-mean-tokens` pre-trained model, and the distance is computed by

`average_distance = torch.abs(cosine_scores - torch.tensor(e_score).cuda()).mean()`

where `cosine_scores` are the prediction scores and `e_score` are the normalized label values. The resulting `average_distance` is 0.43, which seems a little high. Is there another way to recover scores that are closer to the label values?
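A minimal sketch (with hypothetical numbers, not real model outputs) of why this distance can be large even when the model works well: a model whose scores preserve the ranking of the gold labels can still have a high mean absolute distance if its scores are shifted or compressed relative to the labels.

```python
# Hypothetical normalized gold labels and model cosine scores.
# The predictions order the pairs exactly like the labels, but are
# shifted into a narrower range -- a common pattern for cosine scores.
gold = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
pred = [0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

# Mean absolute distance, equivalent to the torch snippet above.
avg_dist = sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)

def ranks(xs):
    """Return the rank (position in sorted order) of each element."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

# Rank agreement: do the predictions order the pairs like the labels do?
same_ranking = ranks(pred) == ranks(gold)

print(f"average distance: {avg_dist:.2f}")  # large despite perfect ranking
print(f"same ranking: {same_ranking}")
```

This is why STS evaluations usually report rank correlations (Spearman/Pearson) rather than absolute distance to the gold scores.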
Hi, I am working on scoring subjective answers, and now I'm wondering if the textual similarity score should be close to the normalized label value. I tried this on the STS dataset; here is my example: `average_distance` equals the average absolute distance between the prediction scores and the real scores. I've normalized the label scores into `0 ... 1`, and the `average_distance` is rather high (about 0.4) no matter whether I fine-tune or use the pre-trained model directly. Can the model be used to score the answers, or have I made any mistakes?
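The normalization mentioned above can be sketched as follows, assuming the STS-benchmark's 0-5 gold score scale (the helper name is illustrative):

```python
def normalize_sts_label(score: float) -> float:
    """Map an STS-benchmark gold score (0-5) to the 0 ... 1 range."""
    return score / 5.0

# Example raw gold scores and their normalized counterparts.
raw_scores = [0.0, 2.5, 4.0, 5.0]
e_score = [normalize_sts_label(s) for s in raw_scores]
print(e_score)  # [0.0, 0.5, 0.8, 1.0]
```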