fistyee / MDCS

🔥MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition [Official, ICCV 2023]

about diversity softmax #3

Open lijm071 opened 1 week ago

lijm071 commented 1 week ago

I would like to ask: when λ < 0, why does the expert model focus on the head categories? Shouldn't λ < 0 lead to a decrease in the predicted probability of the head categories?

fistyee commented 5 days ago

For the class frequency vector p = (p1, p2, ..., pn), p^λ is multiplied into the softmax during training to adjust how the model weights each class. When λ = 0 the adjustment vanishes and the loss is the standard softmax, so a model trained on long-tailed data keeps the long-tailed bias. When λ = 1 it becomes Balanced Softmax: the head classes are boosted inside the loss, so the model itself no longer needs to over-weight them and its predictions end up balanced. When λ < 0, p^λ is smallest for the head classes, so during training the model has to push the head logits even higher to bring the loss down; since the adjustment is removed at inference, this expert ends up focusing even more on the head categories. You can plug different values of λ into p^λ and check how the adjusted distribution changes to see the relationship between them.
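For example, here is a minimal PyTorch sketch (illustrative only, not the repository's training code; it assumes the adjustment takes the softmax(z + λ·log p) form, which is equivalent to weighting exp(z) by p^λ, and the function name `diversity_softmax_loss` is made up for this example):

```python
import torch
import torch.nn.functional as F

def diversity_softmax_loss(logits, targets, class_counts, lam):
    # Class priors p from training-set frequencies.
    prior = class_counts.float() / class_counts.sum()
    # Adding lam * log(p) to the logits is the same as multiplying exp(z) by p**lam.
    adjusted = logits + lam * torch.log(prior).unsqueeze(0)
    return F.cross_entropy(adjusted, targets)

# Toy example: 3 classes with a long-tailed frequency vector.
counts = torch.tensor([1000, 100, 10])   # head, medium, tail
prior = counts.float() / counts.sum()
z = torch.zeros(1, 3)                    # identical logits, to isolate the effect of lambda

for lam in [-1.0, 0.0, 1.0]:
    adjusted = z + lam * torch.log(prior)
    print(lam, F.softmax(adjusted, dim=-1))
# lam =  0 -> uniform: plain softmax, so the model keeps the long-tailed bias of the data.
# lam =  1 -> head class gets the largest adjusted probability during training,
#             so the model needs smaller head logits -> balanced at inference (Balanced Softmax).
# lam = -1 -> head class gets the smallest adjusted probability during training,
#             so the model must push head logits higher -> head-focused expert.
```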