Open melongua opened 1 year ago
Hi @melongua,
Can you provide some more details about (1) the EK100 data that you are using and (2) any other customized metadata, e.g. the relevancy matrix? I believe these might affect the final performance. We've uploaded the ones we used in this doc.
Best, Yue
Hi, I have downloaded the pretrained ckpt c89337 and used eval_zeroshot.py to evaluate it on EK100_MIR in a zero-shot manner.
I prepared the dataset following the instructions and ran the following command:
python eval_zeroshot.py --dataset ek100_mir --root datasets/EK100/video_ht256px/ --clip-length 4 --resume $PATH
The results I got are: mAP: V->T: 0.334, T->V: 0.251, AVG: 0.292; nDCG: V->T: 0.331, T->V: 0.300, AVG: 0.315
If I increase --clip-length from 4 to 16 as described in the paper, the results are: mAP: V->T: 0.341, T->V: 0.264, AVG: 0.303; nDCG: V->T: 0.335, T->V: 0.305, AVG: 0.320
Both seem to be much lower than the numbers reported in the paper: mAP: 36.1, nDCG: 34.6
May I ask what might be the cause of the performance gap? Thanks in advance.
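To put the gap in concrete terms, here is a minimal sketch that converts the AVG values from the two runs into percentage points and subtracts them from the paper's numbers. It assumes the paper's figures are the V->T / T->V averages expressed in percent, which is not confirmed in this thread; the names `PAPER`, `RUNS`, and `gap_in_points` are only illustrative and are not part of the repository.

```python
# Numbers reported in the paper (ASSUMPTION: averages, in percent).
PAPER = {"mAP": 36.1, "nDCG": 34.6}

# AVG values from the two runs above, as fractions, keyed by --clip-length.
RUNS = {
    4: {"mAP": 0.292, "nDCG": 0.315},
    16: {"mAP": 0.303, "nDCG": 0.320},
}


def gap_in_points(runs, paper):
    """Return paper-minus-measured gaps in percentage points, per clip length."""
    return {
        clip_len: {
            metric: round(paper[metric] - 100.0 * value, 1)
            for metric, value in metrics.items()
        }
        for clip_len, metrics in runs.items()
    }


gaps = gap_in_points(RUNS, PAPER)
for clip_len, metrics in gaps.items():
    for metric, gap in metrics.items():
        print(f"clip-length {clip_len}: {metric} is {gap} points below the paper")
```

Under that assumption, the shortfall is several points for both metrics even at a clip length of 16, so the clip length alone does not account for the difference.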