EGO4D / social-interactions

MIT License

LAM/TTM baselines may be using incorrect labels. #24

Open actuallyaswin opened 10 months ago

actuallyaswin commented 10 months ago

Hello, I'm trying to understand how the LAM and TTM branch baselines construct the ground-truth labels, and the current process appears to be incorrect.


1) I noticed that ./data/json/av_train.json and ./data/json/av_val.json already provide the boolean field is_at_me within two arrays: social_segments_talking and social_segments_looking (also described in the official documentation). But looking at get_lam_result.py and get_ttm_result.py, it seems the developers ignore the is_at_me field entirely. Why is that?
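To illustrate what I would have expected, here is a minimal sketch of filtering on is_at_me. The field names follow the annotation schema described above; the filter itself is my assumption of the intended behavior, not code from the repo:

```python
# Hypothetical sketch (not the repo's code): keep only segments whose
# is_at_me flag is True, i.e. segments directed at the camera wearer.

def filter_at_me_segments(segments):
    """Return only the segments explicitly marked as directed at the camera wearer."""
    return [s for s in segments if s.get("is_at_me") is True]

# Minimal example mimicking entries from social_segments_looking:
segments = [
    {"person": "1", "target": None, "is_at_me": False,
     "start_frame": 0, "end_frame": 30},
    {"person": "2", "target": "camera_wearer", "is_at_me": True,
     "start_frame": 10, "end_frame": 40},
]

print(len(filter_at_me_segments(segments)))  # prints 1
```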


2) Next, inside get_lam_result.py, a variable temp_dict is populated with "label" = person taken from clip['social_segments_looking'], but this disregards the value of the target field, which is also present. Most of the time target is set to None, so none of these segments should be considered valid "looking-at-me" examples. Yet inside data_loader.py, all of these labels are passed directly into a positive set:

...
        for gt in gts:
            for i in range(gt['start_frame'], gt['end_frame'] + 1):
                positive.add(str(i) + ":" + gt['label'])
...

So the positive set is made up of many or all cases where target=None. Why is this?
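For comparison, here is a sketch of what I would consider a guarded version of that loop, which skips segments whose target is None or whose is_at_me flag is False. This is my guess at the intended behavior, not the repo's actual code:

```python
def build_positive_set(gts):
    """Guarded variant of the loop above: skip segments whose target is None
    or whose is_at_me flag is False before adding their frames to the positive set."""
    positive = set()
    for gt in gts:
        if gt.get("target") is None or not gt.get("is_at_me", False):
            continue  # not a genuine looking-at-me segment
        for i in range(gt["start_frame"], gt["end_frame"] + 1):
            positive.add(str(i) + ":" + gt["label"])
    return positive

# Toy ground-truth list: only the second segment should contribute.
gts = [
    {"label": "1", "target": None, "is_at_me": False,
     "start_frame": 0, "end_frame": 5},
    {"label": "2", "target": "camera_wearer", "is_at_me": True,
     "start_frame": 3, "end_frame": 4},
]
print(sorted(build_positive_set(gts)))  # prints ['3:2', '4:2']
```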

A quick example to double-check is video clip 0b4cacb1-970f-4ef0-85da-371d81f899e0.mp4 (screenshot below).

[screenshot: vlcsnap-2023-10-10-14h32m27s854]

According to the source av_train.json, there are 36 looking-at-me segments, but all of them have target=None and is_at_me=False. There are many people speaking, and many face tracklets (bounding boxes) available, but none of these should be considered "positive examples". However, when running get_lam_result.py, a JSON file 0b4cacb1-970f-4ef0-85da-371d81f899e0.json is produced and this is used downstream to build a full positive set. This doesn't make sense to me. Can someone please explain whether this is a bug in the get_lam_result.py and data_loader.py files?
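The check I ran amounts to tallying segments per (target, is_at_me) pair. A minimal sketch of that tally, assuming the social_segments_looking schema discussed above (the surrounding nesting of av_train.json may differ):

```python
from collections import Counter

def summarize_looking_segments(clip):
    """Tally looking segments per (target, is_at_me) pair for one clip dict."""
    return Counter(
        (seg.get("target"), seg.get("is_at_me"))
        for seg in clip["social_segments_looking"]
    )

# Toy clip: three segments, all with target=None and is_at_me=False,
# mirroring the pattern reported for this video clip.
clip = {
    "social_segments_looking": [
        {"person": str(i), "target": None, "is_at_me": False}
        for i in range(3)
    ]
}
print(summarize_looking_segments(clip))  # prints Counter({(None, False): 3})
```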

zcxu-eric commented 10 months ago

Hi, is_at_me is not an accurate label; I've checked this with the annotation team before. Our data loader can be verified by visualizing some training samples. Thanks.