BestJuly / Pretext-Contrastive-Learning

Official code for the paper "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning".

kinetics pretrained models #1

Open fmthoker opened 2 years ago

fmthoker commented 2 years ago

Thanks for the code. Can you please share the Kinetics pretrained models? I want to include your paper in my experimental study.

BestJuly commented 2 years ago

Hi, @fmthoker. Thank you for your interest.

I have been busy with a lot of things for a long time and have not updated this repo, although we do plan to make all code and trained models public. Sorry for that.

We have reported two models (ResNet-18-3D and R(2+1)D-18-3D) pretrained on Kinetics400. As you requested, I will check the trained models on our server and upload the Kinetics-400 pretrained model by the end of this week.

fmthoker commented 2 years ago

Thanks for the quick response; I understand. Please make sure to upload the R(2+1)D-18-3D model pretrained on Kinetics400, as that is our main interest at the moment.

BestJuly commented 2 years ago

Hi, @fmthoker. I have uploaded the requested checkpoint to Google Drive. It was pretrained using our method and can be used for downstream tasks by fine-tuning. Please note that the retrieval accuracies reported in the paper are not based on this model, but you can still get good retrieval accuracy (45.4% top-1 in my experiments). For this part, you need to modify retrieve_clips.py.

If you have other uses in your own experiments, please also pay attention to the checkpoint dict. You may need to write a function similar to this one to adjust the key names.
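A key-renaming helper of that kind might look like the sketch below. The `base_network.` prefix is an assumption for illustration; inspect the actual checkpoint keys (e.g. `print(state_dict.keys())`) to find the real prefix to strip.

```python
def rename_keys(state_dict, old_prefix="base_network.", new_prefix=""):
    """Strip or replace a prefix on every key of a checkpoint's state_dict.

    `old_prefix` is a hypothetical example; check the real checkpoint keys
    before using this on the provided model weights.
    """
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed

# Example: keys saved as "base_network.conv1.weight" become "conv1.weight"
ckpt = {"base_network.conv1.weight": 0, "fc.bias": 1}
print(rename_keys(ckpt))  # {'conv1.weight': 0, 'fc.bias': 1}
```

The renamed dict can then be passed to `model.load_state_dict(...)` as usual.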

There are two different kinds of R2plus1D in my code, and R(2+1)D-18-3D is defined here.

The current retrieve_clips.py and ft_classify.py still do not support R(2+1)D-18-3D, but the usage is similar to the other network backbones and it should be easy to modify for your experiments.

If you need further help, I am glad to share part of my code here. However, updates to this repo might be delayed; I am afraid I cannot find the time to refactor the code for a polished release right now.

fmthoker commented 2 years ago

Thanks for sharing the models. I will try to adapt retrieve_clips.py and ft_classify.py for R(2+1)D-18-3D and get back to you if I run into any problems. One question, though: you mention there are two R2plus1D models; what is the difference between them?

BestJuly commented 2 years ago

Yes, the main difference lies in the model depth. R2plus1D refers to models with a ResNet-like architecture that use (2+1)D convolutions instead of 3D ones. R3D and R21D were used in some previous works such as VCP/VCOP, where the network is shallower than ResNet-18-3D (called 3D-ResNet-18 in some papers). The number of layers in each block of ResNet-18 is (2, 2, 2, 2), while in R3D and R21D these numbers are (1, 1, 1, 1). You can check the layer sizes here.
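The depth difference can be sketched as below. The "1 stem conv + 2 convs per BasicBlock + 1 fc" accounting is my own back-of-the-envelope count for ResNet-18-style networks, not something taken from the repo:

```python
# Stage-wise BasicBlock counts, as described above.
configs = {
    "ResNet-18-3D / R(2+1)D-18":          (2, 2, 2, 2),
    "shallow R3D / R21D (VCP/VCOP-style)": (1, 1, 1, 1),
}

def approx_depth(blocks):
    # 1 stem conv + 2 convs per BasicBlock + 1 final fc layer
    return 1 + 2 * sum(blocks) + 1

for name, blocks in configs.items():
    print(name, approx_depth(blocks))
# The (2,2,2,2) config gives 1 + 16 + 1 = 18 layers,
# while the (1,1,1,1) config gives only 10.
```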

To use ResNet-18-3D and R2Plus1D, a convenient way is to directly use the models provided by torchvision with pretrained=False for self-supervised learning. Therefore, the network architecture of my provided model weights is defined in network.py, NOT in r21d.py.

fmthoker commented 2 years ago

Thanks for your help so far; I managed to run and use your R2Plus1D model. Could you please share the Kinetics400 pretrained ResNet-18-3D as well?