Closed by anikethjr 3 years ago
Hi,
You can find our results in Table 5 of the main paper. We get R1: 0.099, R5: 0.24, R10: 0.324, Median R: 29.5, which is quite similar to the results you get overall (some metrics are better than others).
The results are not exactly the same as in the paper because this is a complete reimplementation from scratch of the original work, which used internal tools from DeepMind. Small differences in implementation can therefore lead to minor differences in results.
Hey,
I didn't realize that the metrics reported by the code are ratios and not percentages as in the paper. Thank you so much for the clarification!!
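For anyone else who hits the same confusion, here is a minimal sketch of how recall@K and median rank are typically computed for text-video retrieval, and how the ratios convert to the percentages used in the paper. It is not the repository's evaluation code: the function names `retrieval_metrics` and `as_percentages` are hypothetical, and it assumes a square similarity matrix where text query `i` matches video `i`.

```python
from statistics import median


def retrieval_metrics(sim):
    """Compute recall@K (as ratios in [0, 1]) and median rank.

    Assumption: sim[i][j] is the similarity between text query i and
    video j, and the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the correct video: 1 + number of videos scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > row[i]))
    n = len(ranks)
    metrics = {f"R{k}": sum(r <= k for r in ranks) / n for k in (1, 5, 10)}
    metrics["MedianR"] = median(ranks)
    return metrics


def as_percentages(metrics):
    """Convert recall ratios to percentages; median rank is left unchanged."""
    return {
        name: value if name == "MedianR" else round(100 * value, 1)
        for name, value in metrics.items()
    }


# Ratios quoted earlier in this thread, converted to the paper's convention.
reported = {"R1": 0.099, "R5": 0.24, "R10": 0.324, "MedianR": 29.5}
print(as_percentages(reported))
```

With this conversion, a reported R1 of 0.099 corresponds to 9.9%, which is the scale to use when comparing against the table in the paper.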
Hello,
Thank you so much for sharing your code and pretrained models. I was trying to replicate your text-video retrieval results on the MSR-VTT dataset. I obtained the pretrained model from here: https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/s3d_howto100m.pth. I ran the evaluation command from the README with a smaller batch size and left all other parameters unchanged:
I get the following results:
These numbers are much lower than the ones mentioned in the table. I am guessing that the evaluation parameters are different since changing the batch size should not affect the results. Could you please tell me what parameter values were used to obtain the results mentioned in the table?
Thank you!