TianWuYuJiangHenShou / LabelSemantics

Code for Label Semantics for Few Shot Named Entity Recognition
54 stars 7 forks source link

由实体分词,标签引发的分数计算问题 #8

Closed dawei-yu closed 2 years ago

dawei-yu commented 2 years ago

作者你好,在这段代码中,你似乎是根据标签块划分实体(即将连续相同的实体标签当作同一个实体,比如连续的'O'当作一个实体),这样在计算分数时不会有错吗?比如'tokens': ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.'],'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0], 这个例子中只看到call这个实体,后面的boycott不被算入实体,这样的话,在计算精确率,召回率, F1都会不一样。希望作者能解答我这个萌新的疑惑,谢谢。

def get_entities(tags):
    start, end = -1, -1
    prev = 'O'
    entities = []
    n = len(tags)
    tags = [tag.split('-')[1] if '-' in tag else tag for tag in tags]
    for i, tag in enumerate(tags):
        if tag != 'O':
            if prev == 'O':
                start = i
                prev = tag
            elif tag == prev:
                end = i
                if i == n -1 :
                    entities.append((start, i))
            else:
                entities.append((start, i - 1))
                prev = tag
                start = i
                end = i
        else:
            if start >= 0 and end >= 0:
                entities.append((start, end))
                start = -1
                end = -1
                prev = 'O'
    return entities
TianWuYuJiangHenShou commented 2 years ago

@dawei-yu 换成其他方式应该也可以的吧,可以试试sklearn的acc计算方式

dawei-yu commented 2 years ago

@dawei-yu 换成其他方式应该也可以的吧,可以试试sklearn的acc计算方式

谢谢你,我已经解决了