YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
BSD 2-Clause "Simplified" License

Question about the contrastive loss weight in the paper #20

Open · sukun1045 opened this issue 1 year ago

sukun1045 commented 1 year ago

I have a question regarding the loss weights used in CAV-MAE. It seems that $\lambda_c$ could play an important role in the optimization. I understand this is due to the gradient scale, but it is surprising that the ablation study for CAV (contrastive loss only) still requires $\lambda_c$ to be $0.1$ or $0.01$. What would happen if $\lambda_c$ were set to $1$? Would it lead to an overfitting issue?

Best,

Kun

YuanGongND commented 1 year ago

hi there,

could you point me to the table you are referring to?

$\lambda$ scales the loss, which is related to the learning rate. We sometimes keep $\lambda$ the same for CAV and CAV-MAE to make a fair comparison. I think it can, and probably should, be set to 1 if you are solely interested in CAV, but you may then need to tune the learning rate.
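For concreteness, here is a minimal sketch of how $\lambda_c$ enters the combined objective; the function and variable names are hypothetical stand-ins for illustration, not the repo's actual API:

```python
import torch

def cav_mae_loss(loss_c: torch.Tensor, loss_mae: torch.Tensor,
                 lambda_c: float = 0.01, use_mae: bool = True) -> torch.Tensor:
    """Combined objective in the spirit of the paper: L = lambda_c * L_c (+ L_mae).

    For the CAV ablation (contrastive loss only), L reduces to lambda_c * L_c,
    so under plain SGD scaling lambda_c is equivalent to scaling the learning
    rate for that term.
    """
    loss = lambda_c * loss_c  # contrastive term, down-weighted by lambda_c
    if use_mae:
        loss = loss + loss_mae  # reconstruction (MAE) term, weight 1 here
    return loss
```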

-Yuan

sukun1045 commented 1 year ago

Yeah, that's what I thought. I can tune the learning rate, but is there any particular reason that the contrastive loss needs a smaller learning rate?

I am referring to Table 3, "Audio-Visual Models with only Contrastive Loss".

[Screenshot of Table 3 from the paper attached]

YuanGongND commented 1 year ago

I believe there are two things:

  1. Under the current learning-rate setting, $\lambda_c = 0.01$ or $\lambda_c = 0.1$ is also a better hyperparameter setting than $\lambda_c = 1$ for the joint classification task. If I recall correctly, I did a search over $\lambda_c$: I wanted to show that CAV-MAE is better than CAV, so I had to find the best hyperparameters for CAV. On the other hand, if you set $\lambda_c = 1$, you would probably need to re-tune the learning rate, so the training settings of CAV and CAV-MAE would differ (see the sketch after this list).

  2. In this table, the main purpose is to show that adding the MAE loss does not hurt the retrieval performance, so I keep $\lambda_c$ the same for a fair comparison.
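
To make the first point concrete, here is a hedged toy demonstration (not from the repo; the quadratic loss is a stand-in for the real contrastive loss) that under plain SGD, scaling $\lambda_c$ down by some factor gives exactly the same update as scaling the learning rate down by that factor, which is why $\lambda_c = 1$ would require re-tuning the learning rate:

```python
import torch

torch.manual_seed(0)
w1 = torch.randn(4, requires_grad=True)
w2 = w1.detach().clone().requires_grad_(True)

def toy_contrastive_loss(w):
    # stand-in for the real InfoNCE loss; any differentiable scalar works here
    return (w ** 2).sum()

# setting 1: lambda_c = 0.01, lr = 1e-3
(0.01 * toy_contrastive_loss(w1)).backward()
with torch.no_grad():
    w1 -= 1e-3 * w1.grad

# setting 2: lambda_c = 1.0, lr = 1e-5 (lr scaled down by the same factor)
(1.0 * toy_contrastive_loss(w2)).backward()
with torch.no_grad():
    w2 -= 1e-5 * w2.grad

print(torch.allclose(w1, w2))  # True: identical updates under plain SGD
```

Note this exact equivalence holds for vanilla SGD; adaptive optimizers such as Adam partially normalize the gradient scale, so there the interaction between $\lambda_c$ and the learning rate is looser but still present.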

-Yuan