OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

Confusion about zero-shot setting on Video-Text Retrieval #89

Open overwhelmedxh opened 5 months ago

overwhelmedxh commented 5 months ago

Thank you for your interesting work and for sharing the code! I'm confused about whether the zero-shot performance on MSRVTT reported here requires setting “--mergeclip=True”. Below are the results I reproduced:

“--mergeclip=True”: [image]
“--mergeclip=False”: [image]

As the provided file defaults to "--mergeclip=True", I wonder whether something is wrong here.

1240446371 commented 5 months ago

It seems that with “merge=True”, the results are better than those presented in the paper?

overwhelmedxh commented 5 months ago

> It seems that with “merge=True”, the results are better than those presented in the paper?

Yes. It seems that the results reported in the paper are obtained by setting “merge=True” without DSL.

1240446371 commented 5 months ago

> > It seems that with “merge=True”, the results are better than those presented in the paper?
>
> Yes. It seems that the results reported in the paper are obtained by setting “merge=True” without DSL.

I tested the performance on ActivityNet and obtained better results with “merge=True” with DSL, but worse results with “merge=True” without DSL (worse than those presented in the paper). The authors replied to someone else that they use the DSL results. I am also confused about which setting they used.
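
For context on why "with DSL" and "without DSL" can give such different numbers: DSL here presumably refers to dual-softmax re-ranking applied to the text-video similarity matrix at inference. The sketch below is a minimal pure-Python illustration of that general idea, not the repository's evaluation code. The function names, the toy similarity matrix, and the temperature value are all assumptions made for illustration; the exact formula and temperature used by the InternVideo scripts may differ.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dsl_rerank(sim, temperature=100.0):
    """Re-weight sim[text][video] with a softmax prior taken over all texts.

    For each video (column), the softmax across every test query says how
    strongly that video "belongs" to each text. Multiplying the raw score by
    this prior penalizes videos that score highly for many different texts.
    Hypothetical helper: assumes the common "sim * softmax(sim * T, dim=0)" form.
    """
    n_text, n_video = len(sim), len(sim[0])
    out = [[0.0] * n_video for _ in range(n_text)]
    for j in range(n_video):
        prior = softmax([sim[i][j] * temperature for i in range(n_text)])
        for i in range(n_text):
            out[i][j] = sim[i][j] * prior[i]
    return out


def recall_at_1(sim):
    """Text-to-video R@1, assuming text i is paired with video i."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(sim)


# Toy example (made-up numbers): video 0 attracts both text 0 and text 1.
sim = [
    [0.9, 0.5, 0.3],
    [0.8, 0.7, 0.2],
    [0.1, 0.2, 0.6],
]

if __name__ == "__main__":
    print("R@1 without DSL:", recall_at_1(sim))                      # 2 of 3 correct
    print("R@1 with DSL:   ", recall_at_1(dsl_rerank(sim, 10.0)))    # 3 of 3 correct
```

Note that the prior for each video is computed over all test queries at once, so the re-ranked score for one query depends on the rest of the test set. That is likely why the choice between reporting with or without DSL matters when comparing against "zero-shot" numbers in the paper.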

Hari-Durai-Baskar commented 4 months ago

> > > It seems that with “merge=True”, the results are better than those presented in the paper?
> >
> > Yes. It seems that the results reported in the paper are obtained by setting “merge=True” without DSL.
>
> I tested the performance on ActivityNet and obtained better results with “merge=True” with DSL, but worse results with “merge=True” without DSL (worse than those presented in the paper). The authors replied to someone else that they use the DSL results. I am also confused about which setting they used.

Hi, were you able to resolve the confusion?