Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

Question about the lambda #33

Closed. Bikesuffer closed this issue 1 year ago.

Bikesuffer commented 1 year ago

Hi there, it's me again. I am curious whether you tried different combinations of lambda for feat_loss and out_loss, or maybe adding a lambda for the task_loss?

From my training runs, it seems that the feat_loss contributes most of the total loss.

bokyeong1015 commented 1 year ago

Hi, thanks for your inquiry.

For our text-to-image experiments, we simply set the loss weights λ_Task, λ_OutKD, and λ_FeatKD to 1. This worked well in our empirical validation without hyperparameter tuning, and it is the setting used in the experiments of our paper.
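For readers following along, here is a minimal sketch of how these weights combine the three losses, assuming a standard PyTorch-style distillation setup. It is not the repository's exact code, and the tensor/feature names are illustrative only.

```python
import torch.nn.functional as F

def total_distill_loss(
    student_pred,      # student U-Net noise prediction
    teacher_pred,      # teacher U-Net noise prediction (computed without grad)
    student_feats,     # list of intermediate student activations
    teacher_feats,     # list of matching teacher activations
    target_noise,      # ground-truth noise for the denoising task
    lambda_task=1.0,
    lambda_out_kd=1.0,
    lambda_feat_kd=1.0,
):
    # Task loss: the usual denoising (noise-prediction) objective.
    task_loss = F.mse_loss(student_pred, target_noise)

    # Output-level KD: match the teacher's predicted noise.
    out_kd_loss = F.mse_loss(student_pred, teacher_pred)

    # Feature-level KD: match intermediate block activations pairwise.
    feat_kd_loss = sum(
        F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats)
    )

    # Weighted sum; all three weights default to 1 as described above.
    return (
        lambda_task * task_loss
        + lambda_out_kd * out_kd_loss
        + lambda_feat_kd * feat_kd_loss
    )
```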

In recent trials with BK-SDM-Small and a batch size of 64, changing λ_FeatKD within {0.25, 0.5, 1, 2, 4} did not affect the final generation scores. However, we haven't explored substantially different scales such as 0.01, 0.1, 10, or 100.
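A sweep like that could be scripted along these lines. This is a hypothetical launcher: `train.py` and the `--lambda_feat_kd` flag are placeholders, not the repository's actual entry point or argument names.

```python
import subprocess

# Hypothetical sweep over the feature-KD weight, mirroring the
# {0.25, 0.5, 1, 2, 4} trial mentioned above.
for lam in [0.25, 0.5, 1.0, 2.0, 4.0]:
    subprocess.run(
        ["python", "train.py",
         "--lambda_feat_kd", str(lam),
         "--output_dir", f"runs/feat_kd_{lam}"],
        check=True,
    )
```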

It would be interesting to study the effect of different loss weightings.


Added: some experimental results were as follows:

Bikesuffer commented 1 year ago

Thanks for the information.