Question about feature extraction method

Hello. I'm trying to evaluate a model on the pre-computed MSR-VTT features that you provided, but the results are on par with random selection. After looking into the cause, I suspect the difference lies in the visual feature extraction step.

Could you tell me which framework (TF, Keras, PyTorch...) and which weight source you used in the visual feature extraction stage? That would let me reproduce your setup and run experiments with other models under the same conditions.

Thank you in advance!

danieljf24 / dual_encoding
