frank-xwang / RIDE-LongTailRecognition

[ICLR 2021 Spotlight] Code release for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."
MIT License

[Conceptual question] Question about self-distillation #3

Closed · seekingup closed this issue 3 years ago

seekingup commented 3 years ago

Thanks for your excellent code! It's very easy to get started!

What is the meaning of --distill_checkpoint path_to_checkpoint? Do I need to pre-train another model and use it for distillation?

I have simply trained a ResNet50 (2 experts) on ImageNet-LT without self-distillation, and the top-1 accuracy is 53.264%, which is about 1% lower than that reported in the paper. Would it help if I used self-distillation?

frank-xwang commented 3 years ago

Thank you for your interest in our work.

  1. If you point the distill_checkpoint argument to a checkpoint (a RIDE model with 6 experts), RIDE will be optimized end-to-end with a distillation loss, which usually brings about a 0.4%-1% improvement (the exact gain varies across experiments); see the sketch after this list for the general form of such a loss. For now, we have not provided any pre-trained checkpoints, so you may have to pre-train a RIDE model with 6 experts yourself to serve as the teacher. We will update the model zoo soon; you can download the teacher model there once it is available.

  2. Yes, you may need to enable distillation to reproduce the results reported in the paper. We also provide results for RIDE without distillation, using ResNeXt50 as the backbone, in the model zoo. We tried this locally before releasing the codebase and were able to reproduce the model-zoo results with this reorganized code.
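
For reference, the distillation term mentioned in point 1 generally takes the form below. This is a minimal sketch of a standard knowledge-distillation loss (temperature-softened KL divergence between teacher and student logits), not necessarily the exact loss implemented in this codebase; the temperature, the alpha weighting, and the training_step helper are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KD term: KL divergence between softened teacher and student outputs.

    The temperature value here is an illustrative choice, not taken from the paper.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so the gradient magnitude stays comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def training_step(student, teacher, images, labels, alpha=0.5):
    """Hypothetical end-to-end step: classification loss plus a weighted KD term.

    `teacher` stands in for the frozen 6-expert model loaded from --distill_checkpoint;
    `alpha` is an assumed weighting factor.
    """
    student_logits = student(images)
    with torch.no_grad():          # the teacher is kept frozen during distillation
        teacher_logits = teacher(images)
    ce = F.cross_entropy(student_logits, labels)
    kd = distillation_loss(student_logits, teacher_logits)
    return ce + alpha * kd
```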

Please let us know if you have any further questions about reproducing the results.