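def get_entities(tags):
    start, end = -1, -1
    prev = 'O'
    entities = []
    n = len(tags)
    # Drop the B-/I- prefix so only the entity type is compared.
    tags = [tag.split('-')[1] if '-' in tag else tag for tag in tags]
    for i, tag in enumerate(tags):
        if tag != 'O':
            if prev == 'O':
                # First entity token after an 'O': open a new span.
                start = i
                prev = tag
            elif tag == prev:
                # Same type as the previous token: extend the span.
                end = i
                if i == n - 1:
                    entities.append((start, i))
            else:
                # Type changed without an 'O' in between: close and reopen.
                entities.append((start, i - 1))
                prev = tag
                start = i
                end = i
        else:
            # On 'O', emit the open span only if both start and end are set.
            if start >= 0 and end >= 0:
                entities.append((start, end))
                start = -1
                end = -1
                prev = 'O'
    return entities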
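
Hello author. In this code you appear to split entities by tag block, that is, consecutive identical entity tags are treated as a single entity (for example, a run of consecutive 'O' tags would be treated as one entity). Won't this cause errors when computing the scores?

For example, with 'tokens': ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.'], 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0], only "call" seems to be seen as an entity, and the later "boycott" is not counted as one. If so, precision, recall, and F1 would all come out differently.

I am a beginner and hope the author can clear up my confusion. Thanks.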
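
To make the concern concrete, here is a minimal sketch (not the repository's code) that compares two ways of extracting spans: grouping runs of the same entity type with the B-/I- prefix ignored, and honoring B- boundaries. The names `LABELS`, `type_run_spans`, and `bio_spans` are hypothetical. The id-to-label order in `LABELS` is an assumption based on the Hugging Face `conll2003` dataset and should be checked against the data actually used.

```python
from itertools import groupby

# Assumed id -> label order (Hugging Face `conll2003`); verify for your data.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def type_run_spans(tags):
    """Group maximal runs of one entity type, ignoring B-/I- prefixes."""
    types = [t.split("-", 1)[1] if "-" in t else t for t in tags]
    spans = []
    for etype, group in groupby(enumerate(types), key=lambda p: p[1]):
        idx = [i for i, _ in group]
        if etype != "O":  # runs of 'O' are never entities
            spans.append((etype, idx[0], idx[-1]))
    return spans


def bio_spans(tags):
    """Return (type, start, end) spans with inclusive ends, honoring B- tags."""
    spans = []
    start, cur = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes last span
        prefix, _, etype = tag.partition("-")
        # A span ends on 'O', on a new 'B-', or when the type changes.
        if cur is not None and (tag == "O" or prefix == "B" or etype != cur):
            spans.append((cur, start, i - 1))
            start, cur = None, None
        if tag != "O" and cur is None:
            start, cur = i, etype
    return spans


tags = [LABELS[i] for i in [3, 0, 7, 0, 0, 0, 7, 0, 0]]
print(bio_spans(tags))       # [('ORG', 0, 0), ('MISC', 2, 2), ('MISC', 6, 6)]
print(type_run_spans(tags))  # same result on this sentence

adjacent = ["B-PER", "I-PER", "B-PER"]
print(bio_spans(adjacent))       # [('PER', 0, 1), ('PER', 2, 2)]
print(type_run_spans(adjacent))  # [('PER', 0, 2)]
```

Under the assumed label order, the entities in the example sentence are the single tokens "EU", "German", and "British"; "call" and "boycott" are both tagged 'O'. The two approaches differ only when two entities of the same type are directly adjacent, as in the second input, which is where precision, recall, and F1 could diverge.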