Acquiring checkpoints of VGGSound (audio), VGGSound (video)

YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

BSD 2-Clause "Simplified" License

Besides, what is the setting of 'CAV-MAE-Scale++'? I cannot find the meaning of '++' in your paper. If it differs from the '+' version, could you please send me a copy of CAV-MAE-Scale+ so that I can reproduce the result (19.8) in Table 1?

'++' simply means the model was trained with a batch size of 256; it was trained after the paper was finished. For usage, please feel free to use the ++ model, which is in general the best model. For research comparison, you do not have to compare with ++, as it is not in the paper.

Releasing additional models might be hard.

Acquiring checkpoints of VGGSound (audio), VGGSound (video) #13