YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Which epoch of pre-trained models should I use? #4

Closed. GenjiB closed this issue 1 year ago.

GenjiB commented 1 year ago

Hi,

I just noticed that the script you provided uses audio_model.21. Does that mean you use the model from the 21st epoch?

Since you pre-train the model for more epochs than that, this is a bit confusing to me.

Thank you

YuanGongND commented 1 year ago

Can you point me to where the .21 is used?

GenjiB commented 1 year ago

In the script: https://github.com/YuanGongND/cav-mae/blob/master/egs/audioset/run_cavmae_ft_bal_audioonly.sh#L26
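For context, a minimal sketch of what that checkpoint reference amounts to (the path and loading call below are illustrative assumptions, not the repo's exact code): the fine-tuning run initializes from a pretraining checkpoint whose filename suffix is the epoch at which it was saved.

```python
# Illustrative sketch: the ".21" suffix in the script refers to a pretraining
# checkpoint saved at epoch 21, which the fine-tuning model is initialized from.
import torch

# hypothetical path following the epoch-suffixed naming seen in the script
pretrain_path = "/path/to/exp/pretrain-cav-mae/models/audio_model.21.pth"

state_dict = torch.load(pretrain_path, map_location="cpu")
# the fine-tuning model would then load these weights before training, e.g.:
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
```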

YuanGongND commented 1 year ago

This is correct.

In the original paper, we use the 25th-epoch pretraining checkpoint, but we later found that the 21st-epoch checkpoint is good enough (the lr scheduler cuts the learning rate every 5 epochs, so the 21st epoch is right after a cut). The CAV-MAE Scale++ model was trained after the paper was published, so there is a minor difference.
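To make the epoch arithmetic concrete, here is a minimal sketch of a step schedule that cuts the lr every 5 epochs (the learning rate and decay factor below are placeholders, not the repo's exact settings): a checkpoint saved at epoch 21 has already trained a full epoch in the same, lowest lr band that epoch 25 ends in.

```python
# Illustrative sketch of why epoch 21 vs. 25 barely matters when the lr is cut
# every 5 epochs (lr value and gamma are placeholders, not the repo's settings).
import torch

model = torch.nn.Linear(8, 8)                     # stand-in for the CAV-MAE model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(1, 26):
    lr_now = optimizer.param_groups[0]["lr"]
    if epoch in (20, 21, 25):
        print(f"epoch {epoch:2d}: lr = {lr_now:.2e}")
    # ... one epoch of pretraining would run here ...
    scheduler.step()  # cuts the lr after epochs 5, 10, 15, 20, 25
# epochs 21-25 all train at the same (lowest) lr, so the epoch-21 and epoch-25
# checkpoints end up very close.
```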

But this is a minor point: I recommend using the 21st-epoch checkpoint because it saves pretraining cost, and the 25th leads to a very similar result.

-Yuan

YuanGongND commented 1 year ago

Btw, the public link we provide for the CAV-MAE Scale++ model is the 21st-epoch checkpoint. No other checkpoint has been released.

GenjiB commented 1 year ago

Thanks so much for the clarification!