ShadeAlsha / LTR-weight-balancing

CVPR 2022 - official implementation for "Long-Tailed Recognition via Weight Balancing" https://arxiv.org/abs/2203.14197
MIT License

Some questions about the experiments. #3

Open Z-ZHHH opened 2 years ago

Z-ZHHH commented 2 years ago
  1. In the code, you use the ResNet34 model for CIFAR100-LT, but in the paper, you use the ResNet32 model.

| dataset | stage | loss | base LR | scheduler | batch | epochs | WD | model | result (all) |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR100-100 | stage 1 | CE | 0.01 | Coslr | 64 | 320 | 0.005 | ResNet32 | 40.1 |
| CIFAR100-100 | stage 1 | CE | 0.01 | Coslr | 64 | 320 | 0.005 | ResNet34 | 47.3 |

I used these settings to train the model and got a bad result (7% lower than the open-source Colab experiment). Could you please point out my problem? A rough sketch of my training setup follows this list.

  2. Could you provide more details on how to choose a proper weight decay value for long-tailed recognition? It would help a lot.
  3. I have experimented with several methods, including MiSLAS and BAMLS. I find that 5e-4 is good enough and that tuning the weight decay only improves performance slightly. Maybe tuning the weight decay is not the core point of imbalanced learning?
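
For reference, here is a rough sketch of my stage-1 training loop (the model builder, data loading, and the momentum value of 0.9 are my own choices/placeholders, not code from this repo):

```python
# Rough sketch of my stage-1 setup: cross-entropy, SGD with cosine LR,
# batch size 64 in the loader, and the weight decay under discussion.
# `model` and `train_loader` are placeholders I build elsewhere.
import torch
import torch.nn as nn

def train_stage1(model, train_loader, device="cuda", epochs=320,
                 base_lr=0.01, weight_decay=5e-3):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    # momentum 0.9 is my assumption, not taken from the paper
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```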
ShadeAlsha commented 2 years ago
  1. It is hard to see the difference without seeing your full implementation, but can you run my notebook to see if it reproduces the results? One notable difference is that my code uses ResNet34 while you used ResNet32. I apologize for the inconsistency between the paper and the code.
  2. We use Bayesian Optimization to search for the weight decay, as explained in Section 3.3. You might find this implementation useful; a minimal sketch of this kind of search follows this list.
  3. While tuning WD on top of MiSLAS/BAMLS does not improve much, this seems to suggest that those methods regularize training in a way similar to WD. Therefore, it is hard to say whether or not WD is the core point of imbalanced learning.
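
For illustration, a minimal sketch of such a weight-decay search with scikit-optimize (not our exact script; `val_accuracy_for_wd` is a placeholder that should train a model with the given weight decay and return its validation accuracy):

```python
# Bayesian optimization over the weight decay with scikit-optimize.
# `val_accuracy_for_wd` is a placeholder: train with the given weight decay
# and return validation accuracy.
from skopt import gp_minimize
from skopt.space import Real

def val_accuracy_for_wd(weight_decay):
    raise NotImplementedError  # train + evaluate a model with this weight decay

def objective(params):
    (weight_decay,) = params
    return -val_accuracy_for_wd(weight_decay)  # gp_minimize minimizes

result = gp_minimize(
    objective,
    dimensions=[Real(1e-5, 1e-1, prior="log-uniform", name="weight_decay")],
    n_calls=20,        # number of full trainings the search budget allows
    random_state=0,
)
print("best weight decay:", result.x[0])
```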
rahulvigneswaran commented 2 years ago

Also, the number of epochs is given as 200 in the paper, while the implementation uses 320. Can you confirm which one is correct for replicating the results?

ShadeAlsha commented 2 years ago

Yes, we used 200 epochs only for the first stage of training to produce the results in the paper.

Z-ZHHH commented 2 years ago
I trained ResNet34 with the identical settings, and the result matches the Colab experiment. However, a weight decay of 0.005 is not the optimal value for ResNet32 (the ResNet32 feature dim is 64, while the ResNet34 feature dim is 512).

| dataset | stage | loss | base LR | scheduler | batch | epochs | WD | model | result (all) |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR100-100 | stage 1 | CE | 0.01 | Coslr | 64 | 320 | 0.005 | ResNet34 | 47.28 |
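
To illustrate the feature-dimension gap (using torchvision's ResNet34 purely for illustration; a CIFAR-style ResNet32 ends in a 64-dimensional feature):

```python
# The penultimate feature dimension differs between the two backbones:
# torchvision's ResNet34 ends in 512 features, while a CIFAR-style ResNet32
# (stage widths 16/32/64) ends in 64, so a weight decay tuned for one
# need not transfer to the other.
import torchvision

resnet34 = torchvision.models.resnet34(num_classes=100)
print(resnet34.fc.in_features)  # 512
```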

The work is really great. It would be better and easier to compare with other long-tailed methods if the Colab experiment were trained with the ResNet32 model.

By the way, what is the proper weight decay value for ResNet32 on the CIFAR100-100 dataset in your experiments? In Fig. 4 of the paper, I think the experiment is trained with ResNet32, and it seems that 0.005 is the proper weight decay value for ResNet32 on CIFAR100-100. [screenshot of Fig. 4]