Hello. I'm trying to evaluate a model on the pre-computed MSR-VTT features that you provided.
However, the results were on par with random selection.
While looking into the cause, I came to suspect a difference in the visual feature extraction step.
Can you tell me which framework (TF, Keras, PyTorch...) and weight source you used in the visual feature extraction stage? Then I can reproduce your setup and experiment with other models under the same conditions.
I recommend training your own model on the features we provide, which keeps the performance comparison fair. As for the extraction setup, we used MXNet with its provided ResNet152 model, trained on ImageNet with 1k categories.
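If it helps to confirm whether a feature mismatch is really the cause, one rough check is to compare a feature you extract yourself against the provided one for the same frame. Below is a minimal sketch, assuming both features are available as plain lists of floats. The helper names `cosine_similarity` and `features_match` and the 0.99 threshold are hypothetical choices for illustration and are not part of the released code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        # Different dimensionality already points to a different extractor or layer.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero vector has no direction; treat it as no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def features_match(provided, extracted, threshold=0.99):
    """Hypothetical check: True if the two features point in nearly the same direction.

    The threshold is an arbitrary assumption; tune it for your own setup.
    """
    return cosine_similarity(provided, extracted) >= threshold
```

A low similarity on the same frame would suggest that the weights, preprocessing, or layer differ from the ones used for the provided features. A high similarity would suggest the problem lies elsewhere in the evaluation pipeline.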
Thank you in advance!