OpenGVLab / unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
https://arxiv.org/abs/2303.16058
MIT License

Some json files are missing #22

Closed: cmh1027 closed this issue 7 months ago

cmh1027 commented 8 months ago

Hello, I'm looking for the MSVD QA JSON file. I found the MSVD retrieval JSON file here, but where is the one for QA?

Andy1621 commented 8 months ago

Check it here

cmh1027 commented 8 months ago

@Andy1621 Thanks. Could you also provide the JSON files for the LSMDC dataset?

Andy1621 commented 8 months ago

Check it here

cmh1027 commented 8 months ago

@Andy1621 Thanks for the fast reply! I have one more question. [image] This is a zero-shot retrieval result on the MSVD dataset using l16_25m.pth. Do you know why img_r1 is so low compared to the table in the repo?

Andy1621 commented 8 months ago

The results are correct! I updated the results in MODEL_ZOO a few days ago. They are lower because I fixed a bug where I had used is_paragraph_retrieval=True.
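For readers comparing numbers such as img_r1 against a results table, here is a minimal sketch of how a recall-at-k retrieval metric is commonly computed from a query-by-candidate similarity matrix. This is not the repository's evaluation code; `recall_at_k`, `sim`, and `gt` are hypothetical names, and the scores are made up for illustration.

```python
import numpy as np

def recall_at_k(sim, gt, k=1):
    """Percentage of queries whose correct candidate is among the top-k.

    sim: (num_queries, num_candidates) array of similarity scores.
    gt:  for each query, the index of its correct candidate.
    """
    # Indices of the k highest-scoring candidates for every query.
    topk = np.argsort(-sim, axis=1)[:, :k]
    # A query counts as a hit if its ground-truth index appears in its top-k.
    hits = (topk == np.asarray(gt)[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy example: three caption queries scored against two videos.
sim = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.6, 0.4]])
gt = [0, 1, 1]  # the third caption belongs to video 1 but scores higher on video 0

r1 = recall_at_k(sim, gt, k=1)  # two of three queries hit
r2 = recall_at_k(sim, gt, k=2)  # every query hits with only two candidates
```

Because the metric is an average over queries, any setting that changes what counts as a query changes the reported number, so results obtained under different evaluation settings are not directly comparable.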

cmh1027 commented 8 months ago

[image]

@Andy1621 Did you also update the table for fine-tuning retrieval on MSVD? This is a fine-tuning retrieval result using b16_25m.pth, and it looks much lower than the one in the table (50.8/73.3).

Andy1621 commented 8 months ago

Have you used the new script? Please check the log.

cmh1027 commented 8 months ago

If I want to evaluate the fine-tuning retrieval result on the MSVD dataset, should I download the checkpoint from this URL (ret_msvd_b16_25m.pth)?

Andy1621 commented 8 months ago

No, you should download the pretrained model, which you use for zero-shot testing.

cmh1027 commented 8 months ago

[image] I want to reproduce the result in this table, but doesn't this table report the results of the fine-tuned model, not zero-shot?

Andy1621 commented 8 months ago

Please check the caption for each table in detail.