Hi, @rovebot, thanks for your question. Sorry for the late reply. As for the training process of AutoMix, I provide a simplified workflow with pseudo-code sketches as follows. Notice that $f_k$ requires no gradient, while $M$ and $f_q$ require it.
(a) Given a mini-batch of images $x$, we first generate mixed samples $x_{mix}$ with the MixBlock $M$. The feature maps `lat_f` of $x$ are extracted by the momentum encoder $f_k$ and fed to $M$ to generate the mixed images $m_q$ and $m_k$.
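To make (a) concrete, here is a rough PyTorch-style sketch (the helper name `extract_feat`, the MixBlock call signature, and the detached mask for $m_q$ are my assumptions for illustration, not the exact repository API):

```python
import torch

def generate_mixed_samples(f_k, M, x, alpha=2.0):
    """Step (a): produce mixed images m_q and m_k with the MixBlock M (sketch)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing ratio lambda ~ Beta(alpha, alpha)
    index = torch.randperm(x.size(0))                             # pair each image with a shuffled partner

    with torch.no_grad():
        lat_f = f_k.extract_feat(x)    # feature maps from the momentum encoder (assumed helper name)

    mask = M(lat_f, lam, index)        # MixBlock predicts a pixel-wise mixing mask in [0, 1]
    # Assumption: the mask for m_q is detached so that loss_cls does not update M;
    # m_k keeps the gradient so that loss_gen can reach M through f_k (see step (c)).
    m_q = mask.detach() * x + (1.0 - mask.detach()) * x[index]
    m_k = mask * x + (1.0 - mask) * x[index]
    return m_q, m_k, lam, index
```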
(b) Then we forward the two encoders, $f_q$ and $f_k$, to get classification logits. For the encoder $f_k$, we compute `logits_mix_k` on the generated mixed images $m_k$, which is used to optimize the MixBlock $M$. For the encoder $f_q$, we compute `logits_cls_q` on the original images (supervised by one-hot labels) and `logits_mix_q` on $m_q$.
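Step (b) might then look like the sketch below; the important detail is that $f_k$ is run outside `torch.no_grad()` even though its parameters are frozen, because the gradient of `loss_gen` has to flow through $f_k$'s operations back to $M$:

```python
def forward_encoders(f_q, f_k, x, m_q, m_k):
    """Step (b): compute classification logits with both encoders (sketch)."""
    logits_cls_q = f_q(x)      # original images -> classification logits of f_q
    logits_mix_q = f_q(m_q)    # mixed images (detached mask) -> mixup logits of f_q

    # f_k's parameters are frozen (requires_grad=False), but this forward pass is
    # NOT wrapped in torch.no_grad(): the gradient of loss_gen must pass through
    # f_k's computation graph to reach m_k and then the MixBlock M.
    logits_mix_k = f_k(m_k)
    return logits_cls_q, logits_mix_q, logits_mix_k
```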
(c) We calculate mixup classification losses from these logits and the mixed labels to obtain `loss_cls` and `loss_gen`. Notice that `loss_gen` optimizes the MixBlock $M$ by backpropagating its gradient to $M$ through $f_k$, i.e., better mixed images $m_k$ lead to a lower mixup classification loss.
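Step (c) is ordinary mixup cross-entropy with the label pair and the mixing ratio $\lambda$; here is a sketch assuming plain cross-entropy (the repository's actual loss functions may differ):

```python
import torch.nn.functional as F

def mixup_ce(logits, y_a, y_b, lam):
    # interpolated cross-entropy: lam * CE(y_a) + (1 - lam) * CE(y_b)
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)

def compute_losses(logits_cls_q, logits_mix_q, logits_mix_k, y, index, lam):
    """Step (c): classification loss for f_q and generation loss for the MixBlock M (sketch)."""
    y_a, y_b = y, y[index]                                   # original and shuffled labels
    loss_cls = F.cross_entropy(logits_cls_q, y) + mixup_ce(logits_mix_q, y_a, y_b, lam)
    loss_gen = mixup_ce(logits_mix_k, y_a, y_b, lam)         # backpropagates to M through f_k
    return loss_cls, loss_gen
```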
(d) Finally, the parameters of $f_q$ and $M$ are updated normally by the optimizer, while the parameters of $f_k$ (which receive no gradient) are updated by the momentum (EMA) rule. Notice that $f_k$ could simply copy the latest parameters of $f_q$, but that causes unstable training of $M$ because $f_k$ would then change too frequently.
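Step (d) is a normal optimizer step for $f_q$ and $M$ followed by an EMA update of $f_k$; the momentum value 0.999 below is just a placeholder:

```python
import torch

def update_parameters(optimizer, loss_cls, loss_gen, f_q, f_k, momentum=0.999):
    """Step (d): gradient step for f_q and M, momentum (EMA) update for f_k (sketch)."""
    optimizer.zero_grad()
    (loss_cls + loss_gen).backward()   # gradients reach f_q and M; f_k receives none
    optimizer.step()                   # the optimizer only holds the parameters of f_q and M

    # Exponential moving average: f_k slowly follows f_q instead of copying it,
    # which keeps the training of the MixBlock M stable.
    with torch.no_grad():
        for p_k, p_q in zip(f_k.parameters(), f_q.parameters()):
            p_k.mul_(momentum).add_(p_q, alpha=1.0 - momentum)
```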
I hope this helps you. Please feel free to ask here or contact me on WeChat (Lupin_1998) if you have more questions.
I will close this issue if there are no more questions. Please feel free to open a new issue when you have new questions.
Dear Author,
You are doing a great job! I have read your paper carefully, but I still cannot understand what the supervision signals for the two network modules, the momentum encoder and the MixBlock, are. Could you explain this?
Thank you!