antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
https://arxiv.org/abs/2206.08155
Apache License 2.0

Bad zero-shot results on TVQA #15

Closed fake-warrior8 closed 1 year ago

fake-warrior8 commented 1 year ago

Hi, I ran zero-shot evaluation on the TVQA dataset with the provided zero-shot checkpoint frozenbilm.pth and the provided TVQA video features clipvitl14.pth. I also used the microsoft/deberta-v2-xlarge checkpoint. However, I got a validation accuracy of 31.59 instead of the reported 59.7.

antoyang commented 1 year ago

Are you loading the subtitles correctly? The text-only model with subtitles should already get better accuracy without further training. You may check that the perf looks ok on other datasets too.

fake-warrior8 commented 1 year ago

Thank you for your reply. I found that I had loaded an empty subtitle file. I had downloaded the subtitles from the dataset providers and ended up with an empty file. After replacing it with the subtitles from your Google Drive, I got the correct results.
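
For anyone hitting the same problem, a quick sanity check on the loaded subtitles before evaluation might catch it early. The sketch below is only an illustration: it assumes the subtitles have already been loaded into a dict mapping a video ID to its subtitle content (a string or a list), which may not match the actual layout used by this repository, and the names `subtitle_coverage` and `check_subtitles` are hypothetical helpers, not part of FrozenBiLM.

```python
from typing import Any, Mapping


def subtitle_coverage(subtitles: Mapping[str, Any]) -> float:
    """Return the fraction of videos whose subtitle content is non-empty.

    Assumption: `subtitles` maps a video ID to a string or list of subtitle
    entries. Empty strings, empty lists, and None all count as missing.
    """
    if not subtitles:
        return 0.0
    non_empty = sum(1 for subs in subtitles.values() if subs)
    return non_empty / len(subtitles)


def check_subtitles(subtitles: Mapping[str, Any], min_coverage: float = 0.5) -> float:
    """Raise if too few videos have subtitles; otherwise return the coverage.

    The 0.5 threshold is an arbitrary illustrative default, not a value
    taken from the paper or the repository.
    """
    coverage = subtitle_coverage(subtitles)
    if coverage < min_coverage:
        raise ValueError(
            f"Only {coverage:.1%} of videos have subtitles; "
            "the subtitle file may be empty or incorrectly downloaded."
        )
    return coverage
```

Calling `check_subtitles(loaded_subtitles)` right after loading would fail loudly on an empty subtitle file instead of silently producing a low accuracy.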