SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License
493 stars, 33 forks

About angle-bert-base-uncased-nli-en-v1 evaluation issues #68

Closed: Tffffboys closed this issue 8 months ago

Tffffboys commented 8 months ago

When I use angle-bert-base-uncased-nli-en-v1 to evaluate STS performance, the results are inconsistent with those in the original report.

The command line I use:

python eval_nli.py \
    --model_name_or_path /home/whzhu_st/Model/angle-bert-base-uncased-nli-en-v1 \
    --task_set sts \
    --pooling_strategy cls_avg

Environment:

torch 1.13.1
transformers 4.38.1
V100 GPU
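
Since the outcome of the evaluation can depend on the installed package versions, it may help to report them programmatically. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical, and the distribution names passed to it (`torch`, `transformers`, `angle_emb`) are assumed to match how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    for name, ver in installed_versions(["torch", "transformers", "angle_emb"]).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Including the `angle_emb` version alongside torch and transformers makes it easier to tell which pooling implementation was used for a given run.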

Is this difference within an acceptable margin of error, or is there something wrong with my command?

SeanLee97 commented 8 months ago

Hi @Tffffboys, the inconsistency arises from differences in the pooling implementation between the newer and older versions. To resolve this, you can downgrade angle_emb to 0.1.1. Below are the results using angle_emb==0.1.1:

+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness |  Avg. |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 75.26 | 85.61 | 80.64 | 86.36 | 82.51 |    85.64     |      80.99      | 82.43 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
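
To illustrate how a small change in pooling can shift embeddings (and therefore STS scores), here is a minimal pure-Python sketch. It assumes that `cls_avg` means averaging the [CLS] vector with the mean of the token vectors, and it contrasts a variant that ignores padding with one that does not. This thread does not say what actually changed between angle_emb versions, so both functions are hypothetical and are not the library's code.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cls_avg_masked(hidden: List[Vector], mask: List[int]) -> Vector:
    """Average [CLS] with the mean over non-padding tokens only (assumed variant)."""
    cls = hidden[0]
    real_tokens = [h for h, m in zip(hidden, mask) if m == 1]
    avg = mean_vector(real_tokens)
    return [(c + a) / 2 for c, a in zip(cls, avg)]


def cls_avg_unmasked(hidden: List[Vector]) -> Vector:
    """Average [CLS] with the mean over all positions, padding included (assumed variant)."""
    cls = hidden[0]
    avg = mean_vector(hidden)
    return [(c + a) / 2 for c, a in zip(cls, avg)]


# Toy input: three positions, the last one is padding.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
attention_mask = [1, 1, 0]

masked = cls_avg_masked(hidden_states, attention_mask)
unmasked = cls_avg_unmasked(hidden_states)
print(masked)    # [0.75, 0.25]
print(unmasked)  # differs from the masked result because padding is counted
```

Even on this toy input the two variants produce different sentence vectors, which is the kind of difference that can move benchmark scores by a fraction of a point.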
SeanLee97 commented 8 months ago

But its performance (82.43) is higher than the original report (82.37) ...

That was a long time ago; maybe I mistakenly uploaded a newer model to HF. I forgot which specific model achieved the 82.37 score 😂
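
For reference, the averaged score and its gap to the reported number can be checked directly from the table posted above. This is a small sketch that uses only the per-task scores from this thread and the 82.37 figure quoted as the original report; the variable names are illustrative.

```python
# Per-task scores copied from the angle_emb==0.1.1 results table in this thread.
scores = {
    "STS12": 75.26,
    "STS13": 85.61,
    "STS14": 80.64,
    "STS15": 86.36,
    "STS16": 82.51,
    "STSBenchmark": 85.64,
    "SICKRelatedness": 80.99,
}

reported_avg = 82.37  # figure quoted in this thread as the original report

avg = round(sum(scores.values()) / len(scores), 2)
gap = round(avg - reported_avg, 2)

print(avg)  # 82.43
print(gap)  # 0.06
```

The recomputed average matches the table's Avg. column, so the 0.06-point difference comes from the per-task scores themselves, not from how the average was taken.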

Tffffboys commented 8 months ago

Thank you for your reply

Tffffboys commented 8 months ago

😂