Questions about the AFQMC (Ant Financial Semantic Similarity) dataset

CLUEbenchmark / CLUE

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

http://www.CLUEbenchmarks.com

4.02k stars, 540 forks

Open sunyilgdx opened 3 years ago

sunyilgdx commented 3 years ago

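On the validation set, predicting 0 for every example already gives an accuracy of 0.69. A Google BERT-Base model reaches an accuracy of about 0.735, but its F1 score is only a little over 0.5. I also inspected the data by hand and could not find any obvious pattern. Does this dataset have annotation defects?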
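To make the gap between accuracy and F1 concrete, here is a minimal sketch of the all-zero baseline. The labels are synthetic, not the real AFQMC validation set: the 69/31 split is an assumption chosen only to reproduce the 0.69 figure above, and the helper names `accuracy` and `f1_positive` are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic gold labels (assumption): 69 negatives, 31 positives.
y_true = [0] * 69 + [1] * 31

# Baseline that predicts 0 for every example.
all_zero = [0] * len(y_true)

baseline_acc = accuracy(y_true, all_zero)
baseline_f1 = f1_positive(y_true, all_zero)

print(f"all-zero accuracy: {baseline_acc:.2f}")
print(f"all-zero F1 (positive class): {baseline_f1:.2f}")
```

With these labels the baseline matches the majority-class share in accuracy while finding no positive pairs at all, so an accuracy of 0.735 is only a small gain over always predicting 0. That is consistent with class imbalance alone and does not by itself show whether the annotations are faulty.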