YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
BSD 2-Clause "Simplified" License

Acquiring checkpoints of VGGSound (audio), VGGSound (video) #13

Open mouxingyang opened 1 year ago

mouxingyang commented 1 year ago

Hi Yuan,

Could you please release the VGGSound (audio) and VGGSound (video) checkpoints, or send me a copy of them? They would help me reproduce the results (59.5 and 47.0) in Table 1. Also, what is the setting of 'CAV-MAE-Scale++'? I cannot find the meaning of '++' in your paper. If it differs from the '+' version, could you please send me a copy of CAV-MAE-Scale+ so that I can reproduce the result (19.8) in Table 1?

Best regards,

YuanGongND commented 1 year ago

> Besides, what is the setting of 'CAV-MAE-Scale++'? I cannot find the meaning of '++' in your paper. If it varies from the '+' version, could you please send me the copy of CAV-MAE-Scale+ to reproduce the results (19.8) in Table 1?

++ simply means the model was trained with a batch size of 256; it was trained after the paper was finished. For usage, please feel free to use the ++ model, which is in general the best model. For research comparisons, you do not have to compare with ++, as it is not in the paper.

Releasing additional models might be hard.