about diversity softmax

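For the category frequency vector p = (p₁, p₂, ..., p_n), the term p^λ is applied inside the softmax to adjust the class weighting the model learns. Each exponentiated logit exp(z_j) is multiplied by p_j^λ, which is the same as adding λ·log p_j to the logit z_j of class j before normalizing. When λ = 0, every factor p_j^λ equals 1 and the softmax is unchanged, so a model trained on long-tailed data learns weights that also follow the long-tailed distribution. When λ = 1, the adjustment becomes Balanced Softmax and the model reaches a balanced state. When λ < 0, p^λ reverses the ordering of the frequencies: the rarest categories receive the largest factors during training, and the model's weights shift toward the head categories. Plugging in different values of λ shows how each one reshapes the frequency vector p and how the resulting weighting relates to it.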
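The effect of λ can be checked numerically. The sketch below is a minimal illustration, not the MDCS code itself. It assumes the adjustment takes the logit-shift form z_j + λ·log p_j (equivalently p_j^λ·exp(z_j), normalized), which reduces to plain softmax at λ = 0 and to Balanced Softmax at λ = 1. The function names `power_prior`, `adjusted_softmax` and `adjusted_ce_loss` and the three-class frequency vector are hypothetical.

```python
import math


def power_prior(p, lam):
    """Return p_j ** lam for every class (the factor applied inside the softmax)."""
    return [pj ** lam for pj in p]


def adjusted_softmax(logits, p, lam):
    """Softmax over z_j + lam * log(p_j), i.e. p_j**lam * exp(z_j) normalized.

    Assumed form (hypothetical helper): lam = 0 gives plain softmax,
    lam = 1 gives Balanced Softmax.
    """
    shifted = [z + lam * math.log(pj) for z, pj in zip(logits, p)]
    m = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_ce_loss(logits, label, p, lam):
    """Cross-entropy of the true label under the adjusted softmax (training-time loss)."""
    return -math.log(adjusted_softmax(logits, p, lam)[label])


if __name__ == "__main__":
    # Hypothetical long-tailed frequencies: class 0 is the head, classes 1-2 are tail.
    p = [0.7, 0.2, 0.1]
    logits = [0.0, 0.0, 0.0]  # an untrained model that scores every class equally
    for lam in (-1.0, 0.0, 1.0):
        factors = [round(f, 3) for f in power_prior(p, lam)]
        probs = [round(q, 3) for q in adjusted_softmax(logits, p, lam)]
        head_loss = adjusted_ce_loss(logits, 0, p, lam)
        print(f"lam={lam:+.0f}  p^lam={factors}  softmax={probs}  head-sample loss={head_loss:.3f}")
```

With all-zero logits the adjusted softmax is just p^λ normalized, so the printout shows the reshaped frequency vector directly. For λ = -1 a head-class sample incurs the largest loss, so training has to raise the raw head logits the most, which matches the head-focused behaviour described for λ < 0.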

fistyee / MDCS, issue #3