Hi,
First of all, great work. I have managed to reproduce your results. However, could you please provide more information on the model you used for video feature extraction? I tried features extracted with the torchvision ResNet-152 model (pretrained weights), but they didn't perform particularly well with the trained Dual Encoding model you provided.
I assume that, since you trained your model on features from a particular ResNet model, the dual encoding is biased towards it, so the same CNN model needs to be used for feature extraction in order to get good results with your trained model.
So could you please give more information about the particular variant of the ResNet-152 model you used?
Thanks