Mentioned in the paper: "Through this scaling strategy, the absolute value of logits for old categories is reduced, and the absolute value of logits for new ones is enlarged, thus forcing the model Ft to produce larger logits for old categories and smaller logits for new categories."

Why does reducing the absolute value of the old-class logits force the model Ft to produce larger logits for the old classes❓
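One way to see the compensation effect is with a minimal sketch. It assumes that the scale factors are applied to Ft's logits inside the training loss, so the loss only ever sees the scaled value s_i * z_i. The quoted passage does not give the exact loss, so this is an assumption. Under that assumption, whatever value the loss drives s_i * z_i toward, the raw logit z_i must reach target / s_i. A scale below 1 for old classes therefore inflates the raw old-class logits, and a scale above 1 for new classes shrinks the raw new-class logits. The squared-error loss, the function name `fit_raw_logits`, the targets, and the scale values are all hypothetical illustrations.

```python
def fit_raw_logits(targets, scales, lr=0.1, steps=2000):
    """Hypothetical toy: gradient descent on raw logits z for the loss
    L(z) = 0.5 * sum_i (s_i * z_i - t_i)^2, where the loss only sees the
    scaled logits s_i * z_i (assumed stand-in for the paper's objective)."""
    z = [0.0] * len(targets)
    for _ in range(steps):
        for i, (t, s) in enumerate(zip(targets, scales)):
            # dL/dz_i = s_i * (s_i * z_i - t_i)
            grad = s * (s * z[i] - t)
            z[i] -= lr * grad
    return z


# Classes 0-1 are "old" (scale 0.5), classes 2-3 are "new" (scale 2.0).
targets = [4.0, 3.0, 4.0, 3.0]
scales = [0.5, 0.5, 2.0, 2.0]
z = fit_raw_logits(targets, scales)
print([round(v, 6) for v in z])  # prints [8.0, 6.0, 2.0, 1.5]
```

The scaled logits end up at the same targets for every class, but the raw logits the model actually produces are doubled for old classes and halved for new ones. If the scaling is then dropped when the model is used (again an assumption), those raw logits favor the old classes.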