Still Some Questions about TTM

lemon-prog123 commented 2 years ago

Hi, thank you for your answers to my earlier questions. I am still confused about a few points, though.

Did you filter out the data with person_id == -1 in the dataloader? It seems that only entries with person_id == None or 0 are excluded, unless there is something I overlooked.
You also mentioned that there will be an 'av_test_unannotated.json'. Has it been released yet? What will we get when we work with the test set? Can we still get the bboxes or the full annotations from the earlier AV Diarization tasks, including transcriptions, just as for the train set?
Can I assume that all of the segments in TTM belong to someone who is speaking, not to someone who is silent? That would mean I can ignore periods in which people are silent.

Thank you!

zcxu-eric commented 2 years ago

Hello,

Camera wearers will be ignored.
Please take a look at #3. Basically, we use a stand-alone test set because the gt information may cause leakage issues for other benchmarks. We will not release gt bboxes or transcripts.
All of the segments are from active speakers.

lemon-prog123 commented 2 years ago

Yeah, I've read the note in the README. Actually, I've seen your fix for the camera wearer, so that they are not taken into consideration in your baseline model. What I mean is that you mentioned that a person_id of -1 means the person can't be recognized and will be ignored. But I checked the segments in TTM and found that some of them belong to person_id -1. Did you handle them in the baseline model?

Thank you for your answer.

zcxu-eric commented 2 years ago

I see, the person_id is not related to talking-to-me status, so we will keep these segments if they do have talking-to-me labels.
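
As a rough illustration of the rule discussed in this thread (not the baseline's actual dataloader code), the filtering could be sketched as follows. The "person_id" key comes from the thread; the "ttm" key, the record layout, and the helper names are hypothetical, and treating None or 0 as the ids to skip is an assumption taken from the question above.

```python
from typing import Any, Dict, List

# Assumption from the question in this thread: entries whose person_id is
# None or 0 are the ones that get skipped.
IGNORED_PERSON_IDS = (None, 0)


def keep_segment(segment: Dict[str, Any]) -> bool:
    """Return True if a segment would be kept under the rule sketched here.

    `segment` is a hypothetical dict with a "person_id" key and a "ttm" key
    (an assumed name for the talking-to-me label).
    """
    if segment.get("person_id") in IGNORED_PERSON_IDS:
        return False
    # person_id == -1 (unrecognized person) is not filtered out by itself:
    # the segment is kept as long as it carries a talking-to-me label.
    return segment.get("ttm") is not None


def filter_segments(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the segments that pass keep_segment."""
    return [s for s in segments if keep_segment(s)]


# Made-up example records, for illustration only.
example = [
    {"person_id": 0, "ttm": 1},      # skipped: assumed ignored id
    {"person_id": -1, "ttm": 0},     # kept: has a label
    {"person_id": -1, "ttm": None},  # skipped: no label
    {"person_id": 3, "ttm": 1},      # kept
    {"person_id": None, "ttm": 1},   # skipped: assumed ignored id
]
kept = filter_segments(example)
print([s["person_id"] for s in kept])
```

The point of the sketch is that person_id == -1 is never a reason to drop a segment; only the presence of a talking-to-me label decides whether such a segment is kept.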

lemon-prog123 commented 2 years ago

OK, I see.

EGO4D / social-interactions

Still Some Questions about TTM #5