EGO4D / episodic-memory

MIT License

Ambiguity in NLQ annotations #44

Open Davidyao99 opened 1 year ago

Davidyao99 commented 1 year ago

Great project! I was looking through the annotations for the NLQ task and noticed that there might be multiple instances in a video that answer a given query. In the paper, it seems that queries are chosen so that the answers are unambiguous.

An example of such ambiguity is in video id: 3534864b-2289-4aaf-b3ed-10eeeee7acd2 and query: "Where did I put the scooper". The ground truth is given to be around 1675s.

However, the scooper is later seen being placed on the tabletop and subsequently on the weighing scale at around timestamp 1785s.

These seem to be appropriate responses to the query that differ from the ground truth. They also seem to fall within the time interval of the clip.

satwikkottur commented 1 year ago

Hello @Davidyao99,

Thanks for your question. Here are the verbatim guidelines used in the annotation process (intermediate part skipped for brevity).

For the specific object queries (“when did I last see X; where did I put X?”), be sure to annotate only the last occurrence of that object. ... We only want to ensure that the marked object has not moved between the time it was marked and the end of the video.

Natural Language Queries (NLQ) are assumed to be asked at the end of the video and thus the right window should be the last occurrence of the object. It is likely that there is some noise due to annotator errors. Do you know how often such instances occur?
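
As a rough illustration of the last-occurrence rule, here is a minimal Python sketch. The function names, the tolerance value, and the use of single approximate timestamps instead of full annotation windows are assumptions made for illustration; none of this is part of the Ego4D tooling. The two timestamps are the approximate ones mentioned in the report above.

```python
from typing import Sequence


def last_occurrence(candidate_times: Sequence[float]) -> float:
    """Return the latest candidate timestamp in seconds (hypothetical helper)."""
    if not candidate_times:
        raise ValueError("need at least one candidate occurrence")
    return max(candidate_times)


def follows_last_occurrence_rule(
    annotated_time: float,
    candidate_times: Sequence[float],
    tolerance_s: float = 5.0,  # assumed slack for "around" timestamps
) -> bool:
    """Check whether the annotated time matches the last candidate occurrence."""
    return abs(annotated_time - last_occurrence(candidate_times)) <= tolerance_s


# Approximate timestamps from the scooper example in this thread.
candidates = [1675.0, 1785.0]
annotated = 1675.0

# If both candidates are valid placements, the annotation does not match
# the last occurrence, which would point to annotator noise.
print(follows_last_occurrence_rule(annotated, candidates))
```

Running a check like this over all object-placement queries would be one way to estimate how often such instances occur, provided candidate occurrences can be listed for each query.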