Junhua-Liao / Light-ASD

The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)
MIT License

Face categorisation by entity #11

Closed: mujavidb closed this issue 9 months ago

mujavidb commented 9 months ago

I can see in your evaluation code that, for each detected entity in a video, you collect the face data for every frame and then pass it to the ASD detector entity by entity.

How are you grouping faces by entity? I can see that face detection is done with S3FD, but by the time load_visual is called the faces are already segmented by entity. So, at inference, each iteration of the data loader val_loader works face by face. It is unclear from this example code how faces are assigned to entities.

Junhua-Liao commented 9 months ago

Thank you for your interest in our work. The AVA dataset provides entity information for each face. For the Columbia dataset, you can refer to the 'track_shot' function at line 125 of the Columbia_test.py file.
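
For readers following the thread, here is a minimal sketch of how per-frame face detections can be grouped into per-entity tracks. It assumes greedy matching on bounding-box overlap (IoU) between consecutive frames. It is not the repository's 'track_shot' code: the function names, the box format (x1, y1, x2, y2), and the parameters iou_thresh, max_gap and min_len are all hypothetical and chosen for illustration only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_faces_into_tracks(frames, iou_thresh=0.5, max_gap=2, min_len=2):
    """Greedily link per-frame face boxes into tracks (one track per entity).

    frames: list indexed by frame number; each item is a list of boxes.
    Returns a list of tracks, each a list of (frame_index, box) pairs.
    All thresholds are illustrative assumptions, not values from the repo.
    """
    active, finished = [], []
    for f_idx, boxes in enumerate(frames):
        unmatched = list(boxes)
        still_active = []
        for track in active:
            last_f, last_box = track[-1]
            # Close a track that has gone too long without a matching face.
            if f_idx - last_f > max_gap:
                finished.append(track)
                continue
            # Pick the unmatched box that overlaps most with the track's last box.
            best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) > iou_thresh:
                track.append((f_idx, best))
                unmatched.remove(best)
            still_active.append(track)
        # Every box that matched no existing track starts a new one.
        active = still_active + [[(f_idx, b)] for b in unmatched]
    finished.extend(active)
    # Drop very short tracks, which are likely spurious detections.
    return [t for t in finished if len(t) >= min_len]


# Two faces drifting slowly to the right; the second disappears after frame 1.
frames = [
    [(0, 0, 10, 10), (100, 100, 110, 110)],
    [(1, 0, 11, 10), (101, 100, 111, 110)],
    [(2, 0, 12, 10)],
]
tracks = group_faces_into_tracks(frames)
```

Under these assumptions, each returned track plays the role of one entity: the face crops along a track would be stacked into a single sequence and passed to the model, which matches the entity-by-entity evaluation described in the question.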

mujavidb commented 8 months ago

Very helpful, thanks.