Closed jingzhengli closed 3 years ago
Hi! Thanks for your attention. For your question, note that the optimization of objective (1) takes alternating steps (cf. the main text). The second term in Eq. (1) only relates to the derivation of the auxiliary distribution (cf. [14] in the main text), i.e., the first step of auxiliary distribution update. As for the second step of network update, it minimizes the KL divergence between the predictive label distribution of the network and the updated auxiliary one. By using the optimized auxiliary distribution as the target, the model is indeed encouraged to learn balanced clusters.
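For readers unfamiliar with this alternating scheme, here is a minimal NumPy sketch of the general idea. It assumes a DEC-style auxiliary target (the exact form in [14] may differ): `q` is the network's soft cluster assignment, and dividing by the soft cluster frequencies `f_j` down-weights large clusters, which is what encourages balance; the network-update step then minimizes KL(target || prediction). The function names and the specific target formula are illustrative, not the repository's actual code.

```python
import numpy as np

def auxiliary_target(q):
    """Auxiliary-distribution update (first alternating step).

    q: (n, k) soft cluster assignments from the network.
    Sharpens the assignments and divides by the soft cluster
    frequencies f_j = sum_i q_ij, so over-populated clusters are
    down-weighted -- this is where the balance term acts.
    """
    weight = q ** 2 / q.sum(axis=0)              # (n, k)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Network-update objective (second alternating step):
    KL(p || q) with the auxiliary target p held fixed."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# One alternating round: update the target, then the network
# would be trained to minimize kl_divergence(p, q) by gradient descent.
q = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.1, 0.1, 0.8]])
p = auxiliary_target(q)
loss = kl_divergence(p, q)
```

In an actual training loop `p` is recomputed periodically with gradients detached, so only the KL term backpropagates through the network.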
Got it, thanks~
Hi, I couldn't find the second term of Eq. (1) in your code, i.e., the one encouraging cluster-size balance. Could you tell me where it is? Thanks!