clamsproject / aapb-annotations

Repository to store manual annotation dataset developed for CLAMS-AAPB collaboration

Inconsistent gold standard formats for TimeFrame labels #93

Open marcverhagen opened 1 week ago

marcverhagen commented 1 week ago

Because

Gold formats are not consistent across TimeFrame annotations:

january-slates

GUID,collection,start,end,type,digital,format-summary,moving-elements
cpb-aacip-129-000003cx,North Carolina Now,00:00:05.000,00:00:13.000,t,True,boxes to fill in,no

newshour-chyron

index,start,end,text
1,00:04:35.777,00:04:39.777,JOHN BLOCK\nSecretary of Agriculture
2,00:05:56.777,00:05:59.527,REP. THOMAS P. O'NEILL\nSpeaker of the House
3,00:08:36.527,00:08:40.527,RITA LAVELLE\nFormer E.P.A. Official

scene-recognition

start,end,type label,modifier
00:00:04.004,00:00:34.000,B,False

To some degree this is necessary because these three datasets contain different information. But if we want to automatically find gold annotations that can be used for evaluating just the labels, it would be nice to standardize on start, end and label (or type). We already do start and end, but for label we either use

keighrim commented 1 week ago

The t in the slate annotations does not mean "true" (see https://github.com/clamsproject/aapb-annotations/tree/main/january-slates)

marcverhagen commented 1 week ago

Ah, thanks for pointing that out. The "t" has nothing to do with the label; I think it actually means "typed".

The description of the issue has been changed. The only thing that remains is that we need to figure out a simple way to get to the default label.
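
One possible way to get to a default label, as a minimal sketch only: keep a small per-dataset rule table that either names the column holding the label or gives a fixed fallback label. The table name LABEL_RULES, the function read_labels, and the default labels "slate" and "chyron" are assumptions made up for illustration; none of them exist in the repository.

```python
import csv
import io

# Hypothetical per-dataset rules (assumptions, not part of the repository):
# read the label from a named column, or fall back to a fixed default label.
LABEL_RULES = {
    "january-slates": {"default": "slate"},       # assumed default label
    "newshour-chyron": {"default": "chyron"},     # assumed default label
    "scene-recognition": {"column": "type label"},
}


def read_labels(dataset, csv_text):
    """Return (start, end, label) triples for one gold CSV file."""
    rule = LABEL_RULES[dataset]
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Use the label column when the rule names one, else the default.
        label = row[rule["column"]] if "column" in rule else rule["default"]
        triples.append((row["start"], row["end"], label))
    return triples


# Sample rows copied from the issue description above.
slates = (
    "GUID,collection,start,end,type,digital,format-summary,moving-elements\n"
    "cpb-aacip-129-000003cx,North Carolina Now,00:00:05.000,00:00:13.000,"
    "t,True,boxes to fill in,no\n"
)
scenes = "start,end,type label,modifier\n00:00:04.004,00:00:34.000,B,False\n"

print(read_labels("january-slates", slates))
print(read_labels("scene-recognition", scenes))
```

With something like this, evaluation code would only ever see start, end and label, and the extra columns of each dataset stay untouched in the gold files.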