Reproducing zero-shot eval results on EK100-MIR

Hi, I downloaded the pretrained checkpoint c89337 and used eval_zeroshot.py to evaluate it on EK100_MIR in a zero-shot manner.

I prepared the dataset following the instructions and ran this command:

python eval_zeroshot.py --dataset ek100_mir --root datasets/EK100/video_ht256px/ --clip-length 4 --resume $PATH

The results I got are:

mAP: V->T: 0.334, T->V: 0.251, AVG: 0.292
nDCG: V->T: 0.331, T->V: 0.300, AVG: 0.315

If I increase --clip-length from 4 to 16, as described in the paper, the results are:

mAP: V->T: 0.341, T->V: 0.264, AVG: 0.303
nDCG: V->T: 0.335, T->V: 0.305, AVG: 0.320

Both are much lower than the numbers reported in the paper: mAP: 36.1, nDCG: 34.6 (in percent, versus 30.3 and 32.0 AVG for my best run).
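To make the gap concrete, here is a quick sketch that converts my AVG numbers to percent and subtracts them from the paper's. It assumes the paper's 36.1 and 34.6 are AVG values in percent, which is my reading and not something I have confirmed; `gap_in_points` is just a helper name I made up for this post.

```python
# AVG values from my two runs, as fractions (copied from the eval output above).
runs = {
    4: {"mAP": 0.292, "nDCG": 0.315},
    16: {"mAP": 0.303, "nDCG": 0.320},
}

# Numbers reported in the paper; assumed here to be AVG values in percent.
paper = {"mAP": 36.1, "nDCG": 34.6}


def gap_in_points(run, reference):
    """Return (reference - measured) in percentage points for each metric."""
    return {m: round(reference[m] - 100 * run[m], 1) for m in reference}


# Gap per clip length, keyed by the --clip-length value used for the run.
gaps = {clip_len: gap_in_points(run, paper) for clip_len, run in runs.items()}

for clip_len, gap in gaps.items():
    print(f"clip length {clip_len}: mAP gap {gap['mAP']} pts, nDCG gap {gap['nDCG']} pts")
```

So even with the longer clip length, the mAP gap stays at several points, while the nDCG gap is smaller.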

May I ask what might be causing this performance gap? Thanks in advance.

facebookresearch / LaViLa