Hi, @rovebot, thanks for your question. Sorry for the late reply. As for the training process of AutoMix, I provide a simplified workflow with pseudo-code sketches as follows. Notice that $f_k$ requires no gradient, while $M$ and $f_q$ require it.
(a) Given a mini-batch of images $x$, we first generate mixed samples $x_{mix}$ with the MixBlock $M$. The feature maps `lat_f` of $x$ are extracted by the momentum encoder $f_k$ and fed to $M$ to generate the mixed images $m_q$ and $m_k$.
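To make (a) concrete, here is a rough PyTorch-style sketch (the helper name `extract_feat`, the MixBlock call signature, and the detached mask for $m_q$ are my assumptions for illustration, not the exact repository API):

```python
import torch

def generate_mixed_samples(f_k, M, x, alpha=2.0):
    """Step (a): produce mixed images m_q and m_k with the MixBlock M (sketch)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing ratio lambda ~ Beta(alpha, alpha)
    index = torch.randperm(x.size(0))                             # pair each image with a shuffled partner

    with torch.no_grad():
        lat_f = f_k.extract_feat(x)    # feature maps from the momentum encoder (assumed helper name)

    mask = M(lat_f, lam, index)        # MixBlock predicts a pixel-wise mixing mask in [0, 1]
    # Assumption: the mask for m_q is detached so that loss_cls does not update M;
    # m_k keeps the gradient so that loss_gen can reach M through f_k (see step (c)).
    m_q = mask.detach() * x + (1.0 - mask.detach()) * x[index]
    m_k = mask * x + (1.0 - mask) * x[index]
    return m_q, m_k, lam, index
```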
(b) Then we forward the two encoders, $f_q$ and $f_k$, to get classification logits. For the encoder $f_k$, we compute `logits_mix_k` on the generated mixed images $m_k$, which is used to optimize the MixBlock $M$. For the encoder $f_q$, we compute `logits_cls_q` on the original images (supervised by one-hot labels) and `logits_mix_q` on $m_q$.
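Step (b) might then look like the sketch below; the important detail is that $f_k$ is run outside `torch.no_grad()` even though its parameters are frozen, because the gradient of `loss_gen` has to flow through $f_k$'s operations back to $M$:

```python
def forward_encoders(f_q, f_k, x, m_q, m_k):
    """Step (b): compute classification logits with both encoders (sketch)."""
    logits_cls_q = f_q(x)      # original images -> classification logits of f_q
    logits_mix_q = f_q(m_q)    # mixed images (detached mask) -> mixup logits of f_q

    # f_k's parameters are frozen (requires_grad=False), but this forward pass is
    # NOT wrapped in torch.no_grad(): the gradient of loss_gen must pass through
    # f_k's computation graph to reach m_k and then the MixBlock M.
    logits_mix_k = f_k(m_k)
    return logits_cls_q, logits_mix_q, logits_mix_k
```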
(c) We calculate mixup classification losses from these logits and the mixed labels to obtain `loss_cls` and `loss_gen`. Notice that `loss_gen` optimizes the MixBlock $M$ by backpropagating its gradient to $M$ through $f_k$, i.e., better mixed images $m_k$ lead to a lower mixup classification loss.
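Step (c) is ordinary mixup cross-entropy with the label pair and the mixing ratio $\lambda$; here is a sketch assuming plain cross-entropy (the repository's actual loss functions may differ):

```python
import torch.nn.functional as F

def mixup_ce(logits, y_a, y_b, lam):
    # interpolated cross-entropy: lam * CE(y_a) + (1 - lam) * CE(y_b)
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)

def compute_losses(logits_cls_q, logits_mix_q, logits_mix_k, y, index, lam):
    """Step (c): classification loss for f_q and generation loss for the MixBlock M (sketch)."""
    y_a, y_b = y, y[index]                                   # original and shuffled labels
    loss_cls = F.cross_entropy(logits_cls_q, y) + mixup_ce(logits_mix_q, y_a, y_b, lam)
    loss_gen = mixup_ce(logits_mix_k, y_a, y_b, lam)         # backpropagates to M through f_k
    return loss_cls, loss_gen
```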
(d) Finally, the parameters of $f_q$ and $M$ are updated normally by the optimizer, while the parameters of $f_k$ (which receive no gradient) are updated by the momentum (EMA) rule. Notice that $f_k$ could simply copy the latest parameters of $f_q$, but that causes unstable training of $M$ because $f_k$ would then change too frequently.
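Step (d) is a normal optimizer step for $f_q$ and $M$ followed by an EMA update of $f_k$; the momentum value 0.999 below is just a placeholder:

```python
import torch

def update_parameters(optimizer, loss_cls, loss_gen, f_q, f_k, momentum=0.999):
    """Step (d): gradient step for f_q and M, momentum (EMA) update for f_k (sketch)."""
    optimizer.zero_grad()
    (loss_cls + loss_gen).backward()   # gradients reach f_q and M; f_k receives none
    optimizer.step()                   # the optimizer only holds the parameters of f_q and M

    # Exponential moving average: f_k slowly follows f_q instead of copying it,
    # which keeps the training of the MixBlock M stable.
    with torch.no_grad():
        for p_k, p_q in zip(f_k.parameters(), f_q.parameters()):
            p_k.mul_(momentum).add_(p_q, alpha=1.0 - momentum)
```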
I hope this helps you. Please feel free to ask here or contact me on WeChat (Lupin_1998) if you have more questions.
I will close this issue if there are no more questions. Please feel free to open a new issue when you have new questions.
Dear Author,
You are doing a great job! I have read your paper carefully, but I still cannot understand what the supervision signals for the two network modules, the momentum encoder and the MixBlock, are. Could you explain this?
Thank you!