The mismatch between your LDAM implementation and the original one

frank-xwang / RIDE-LongTailRecognition

[ICLR 2021 Spotlight] Code release for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

MIT License

261 stars 26 forks

Thanks for the question.

I noticed that LDAM multiplies the value by s. We apply the same multiplication here: https://github.com/frank-xwang/RIDE-LongTailRecognition/blob/main/model/ldam_drw_resnets/ride_resnet_cifar.py#L175.

Where the multiplication happens is only an implementation detail. Both versions lead to the same computation, so our LDAM loss matches the one in the LDAM codebase you linked.
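To illustrate the point, here is a minimal sketch in plain Python of why the placement of s does not matter. It is not the code from either repository: the function names and the numbers are made up for illustration, and it assumes the margin is scaled by s whenever the logits are scaled before they reach the loss. Under that assumption, s * (z - m) and s * z - s * m are the same quantity.

```python
import math


def cross_entropy(logits, target):
    # Numerically stable log-sum-exp minus the target logit.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[target]


def ldam_scale_in_loss(outputs, target, margins, s):
    # Variant A (hypothetical): subtract the margin first,
    # then multiply every logit by s inside the loss.
    adjusted = list(outputs)
    adjusted[target] -= margins[target]
    return cross_entropy([s * v for v in adjusted], target)


def ldam_scale_in_model(scaled_outputs, target, margins, s):
    # Variant B (hypothetical): the model already multiplied its
    # outputs by s, so the loss subtracts a margin scaled by s.
    adjusted = list(scaled_outputs)
    adjusted[target] -= s * margins[target]
    return cross_entropy(adjusted, target)


# Made-up example values, chosen only for illustration.
outputs = [0.8, 0.1, -0.3]   # unscaled classifier outputs
margins = [0.1, 0.3, 0.5]    # per-class margins
s = 30.0
target = 1

loss_a = ldam_scale_in_loss(outputs, target, margins, s)
loss_b = ldam_scale_in_model([s * v for v in outputs], target, margins, s)
print(math.isclose(loss_a, loss_b, rel_tol=1e-9))
```

The two losses agree up to floating-point rounding, which is all that "the same computation" means here.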

I believe LDAM multiplies the value by s to compensate for the effect of normalization. However, this is not our focus: we use LDAM as a base loss, so ensuring that our implementation matches the original is enough for us. I expect the LDAM authors can give you a better answer.
