facebookresearch / AVID-CMA

Audio Visual Instance Discrimination with Cross-Modal Agreement

Pretrained model with R(2+1)D backbone from Table 10 #6

Open fmthoker opened 3 years ago

fmthoker commented 3 years ago

Thanks for releasing the code and pretrained models for your amazing work, "Audio-Visual Instance Discrimination with Cross-Modal Agreement". I noticed that you used different R(2+1)D architectures in different experiments, as shown in Table 9 and Table 10. Could you please release or share the Kinetics-pretrained model that uses the Table 10 architecture for the R(2+1)D backbone? I am working on self-supervised learning and want to include your paper in my current project. For a fair comparison, I want to use the same backbone as previous works.

pedro-morgado commented 3 years ago

Hi, thanks for your interest in our work. The models described in Table 10 have already been released in this repo. The confusion seems to come from a typo in the captions of Tables 9 and 10. The architecture in Table 10 is the one used for the large-scale experiments (Section 5), and the one in Table 9 is used for the rest (Sections 3 and 4). Sorry about the confusion.