JinYuanLi0012 / PGIM

[EMNLP 2023 Findings] Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge

Some dataset-related questions #11

Closed: hhy150 closed this issue 2 months ago

hhy150 commented 2 months ago

In the test set I noticed that some entities are tagged only I-PER, with no preceding B-PER. Is this a labeling error? Or, if an entity is a single word, are both I-PER and B-PER acceptable tags for that word? For example:

IMGID:1908210 ONLY O BELIEVE O the O dynamic O gospel O album O from O Elder O Arthur O R O . O Johnson I-PER . O Available O at O select O outlets O : O http://t.co/0WzcfKMSNn O

IMGID:68088 George B-PER W O . O Bush I-PER takes O ice B-OTHER bucket I-OTHER challenge I-OTHER http://t.co/oAe1Sdi9AY O challenges O @billclinton O next. O http://t.co/0YnM6RNe9P O

IMGID:21659 Good O morning O to O you O too O , O Mr O . O Hopewell I-PER http://t.co/Dacow9O4mh O
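Under the BIO2 convention, a single-token entity is always tagged B-X, and I-X is only valid directly after B-X or I-X of the same type, so each of the three examples above breaks the scheme. As a minimal sketch for finding such cases, the Python below (3.9+) assumes the flattened `IMGID:<id> token TAG token TAG ...` layout shown in the question; the released files may instead store one token per line, so the parser is an assumption. The helper names `parse_flat_line`, `find_orphan_inside_tags`, and `repair_orphans` are hypothetical. `repair_orphans` rewrites each orphan I-X as B-X, a common normalization, but not necessarily how the PGIM authors treat these labels.

```python
def parse_flat_line(line: str) -> tuple[str, list[tuple[str, str]]]:
    """Split 'IMGID:<id> tok1 TAG1 tok2 TAG2 ...' into (image_id, [(token, tag), ...]).

    Assumes the flattened layout seen in this thread (hypothetical format).
    """
    fields = line.split()
    image_id = fields[0].split(":", 1)[1]
    rest = fields[1:]
    if len(rest) % 2:
        raise ValueError("expected alternating token/tag pairs")
    return image_id, list(zip(rest[0::2], rest[1::2]))


def find_orphan_inside_tags(tags: list[str]) -> list[int]:
    """Return indices of I-X tags not preceded by B-X or I-X (invalid under BIO2)."""
    orphans = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                orphans.append(i)
        prev = tag
    return orphans


def repair_orphans(tags: list[str]) -> list[str]:
    """Rewrite every I-X that does not continue an X span as B-X."""
    fixed: list[str] = []
    for tag in tags:
        prev = fixed[-1] if fixed else "O"
        # prev[2:] is the entity type of the previous tag ("" for "O").
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            fixed.append("B-" + tag[2:])
        else:
            fixed.append(tag)
    return fixed


if __name__ == "__main__":
    line = ("IMGID:68088 George B-PER W O . O Bush I-PER takes O ice B-OTHER "
            "bucket I-OTHER challenge I-OTHER http://t.co/oAe1Sdi9AY O "
            "challenges O @billclinton O next. O http://t.co/0YnM6RNe9P O")
    image_id, pairs = parse_flat_line(line)
    tags = [tag for _, tag in pairs]
    orphans = find_orphan_inside_tags(tags)
    print(image_id, [pairs[i][0] for i in orphans])  # 68088 ['Bush']
    print(repair_orphans(tags)[:4])  # ['B-PER', 'O', 'O', 'B-PER']
```

On the IMGID:68088 line only `Bush` is flagged: `George B-PER` opens a PER span that `W O` closes, so `Bush I-PER` continues nothing. The ice bucket challenge span is well formed and is left alone.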

JinYuanLi0012 commented 2 months ago

This is a long-standing issue with the Twitter-2015 dataset. We actually discuss it in the appendix of the PGIM paper.

The issue you found is the main reason accuracy on Twitter-2015 is much lower than on Twitter-2017.
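As a toy illustration of one way such labels can cost points, the sketch below (hypothetical `extract_spans`, Python 3.9+) decodes a BIO sequence into entity spans in two modes. With `lenient=True` an orphan I-X starts a new span, roughly how conlleval-style scorers treat it; with `lenient=False` it is dropped. The gold and predicted sequences are made up for illustration, not taken from PGIM outputs, and whether a particular evaluation script behaves like either mode is an assumption to verify against that script.

```python
def extract_spans(tags: list[str], lenient: bool = True) -> set[tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans decoded from a BIO tag sequence.

    lenient=True: an I-X that does not continue an open X span starts a new span.
    lenient=False: such an I-X is ignored (a hypothetical "strict" decoding).
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        prefix, _, label = tag.partition("-")
        if prefix == "I" and start is not None and etype == label:
            continue  # valid continuation of the open span
        if start is not None:  # close the currently open span
            spans.add((start, i, etype))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and lenient):
            start, etype = i, label
    return spans


if __name__ == "__main__":
    gold = ["B-PER", "O", "O", "I-PER", "O"]  # orphan I-PER, as in "George W . Bush"
    pred = ["B-PER", "O", "O", "B-PER", "O"]  # a well-formed BIO2 prediction
    for lenient in (True, False):
        print(lenient, sorted(extract_spans(gold, lenient)), sorted(extract_spans(pred, lenient)))
    # True [(0, 1, 'PER'), (3, 4, 'PER')] [(0, 1, 'PER'), (3, 4, 'PER')]
    # False [(0, 1, 'PER')] [(0, 1, 'PER'), (3, 4, 'PER')]
```

Under the strict decoding, a model that tags `Bush` as B-PER is charged a false positive because the gold span disappears; under the lenient decoding the two agree.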

hhy150 commented 2 months ago

I see, haha. Thanks a lot for the explanation.