Open sukun1045 opened 1 year ago
hi there,
could you point me to the table you are referring to?
$\lambda$ scales the loss, which is related to the learning rate. Sometimes we keep $\lambda$ the same for CAV and CAV-MAE to make a fair comparison. I think it can (and should) be set to 1 if you are solely interested in CAV, but you may then need to tune the learning rate.
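To make the λ-vs-learning-rate coupling concrete, here is a minimal sketch in plain Python (a toy scalar model, not the repo's actual training code): for a single loss term optimized with vanilla SGD, scaling the loss by λ is exactly equivalent to scaling the learning rate by λ.

```python
# Toy demonstration: for one loss term under plain SGD, an update with
# loss lambda*L and learning rate lr equals an update with loss L and
# learning rate lambda*lr. (Hypothetical scalar example, not CAV-MAE code.)

def sgd_step(w, grad, lr):
    """One plain SGD update: w <- w - lr * grad."""
    return w - lr * grad

def loss_grad(w):
    """Gradient of a toy quadratic loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w0, lr, lam = 0.0, 0.1, 0.01

# Option 1: scale the loss by lambda (its gradient scales by lambda too).
w_scaled_loss = sgd_step(w0, lam * loss_grad(w0), lr)

# Option 2: leave the loss unscaled but scale the learning rate instead.
w_scaled_lr = sgd_step(w0, loss_grad(w0), lam * lr)

print(w_scaled_loss, w_scaled_lr)  # identical updates
```

Note this exact equivalence holds for plain SGD with a single loss term; with adaptive optimizers (e.g. Adam) or a weighted sum of several terms, λ and the learning rate are no longer interchangeable, which is why tuning may still be needed.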
-Yuan
Yeah, that's what I thought. I can tune the learning rate, but is there any particular reason that the contrastive loss needs a smaller learning rate?
In Table 3, the "Audio-Visual Models with only Contrastive Loss" rows.
I believe there are two things:

1. In the current learning-rate setting, $\lambda_c = 0.01$ or $\lambda_c = 0.1$ is also a better hyperparameter choice than $\lambda_c = 1$ for the joint classification task. If I recall correctly, I did a search over $\lambda_c$ because I wanted to show that CAV-MAE is better than CAV, so I had to find the best hyperparameters for CAV. On the other hand, if you set $\lambda_c = 1$, you would probably need to tune the learning rate, so the training settings of CAV and CAV-MAE would differ.
2. The main purpose of this table is to show that adding a MAE loss doesn't hurt retrieval performance, so I keep $\lambda_c$ the same for a fair comparison.
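The fair-comparison setup above can be sketched as a weighted sum of the two loss terms. This is a hedged illustration (the function and argument names are placeholders, not the repo's actual API): with a fixed learning rate, $\lambda_c$ rescales the contrastive term's contribution to every gradient step, so keeping it identical for CAV and CAV-MAE keeps the contrastive part of training comparable.

```python
# Hypothetical sketch of the combined objective: CAV uses only the
# weighted contrastive term, CAV-MAE adds the MAE term on top, with the
# same lambda_c in both settings for a fair comparison.

def total_loss(contrastive_loss, mae_loss, lambda_c=0.01, use_mae=True):
    """Weighted sum of the two loss terms.

    CAV (contrastive only): use_mae=False, leaving lambda_c * L_c.
    CAV-MAE: both terms, with the same lambda_c as CAV.
    """
    loss = lambda_c * contrastive_loss
    if use_mae:
        loss = loss + mae_loss
    return loss

# Toy values: identical lambda_c in both settings.
cav = total_loss(2.0, 0.5, lambda_c=0.01, use_mae=False)      # 0.02
cav_mae = total_loss(2.0, 0.5, lambda_c=0.01, use_mae=True)   # 0.52
print(cav, cav_mae)
```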
-Yuan
I have a question regarding the weights used in CAV-MAE. It seems that $\lambda_c$ could play an important role in the optimization. I understand this is due to the gradient scale, but it is surprising that the ablation study for CAV (contrastive loss only) still requires $\lambda_c$ to be $0.1$ or $0.01$. I am wondering what happens if $\lambda_c$ is set to 1? Will it lead to an overfitting issue?
Best,
Kun