CN-Wenbo closed this issue 9 months ago
As stated in Section 3, delta is the hyperparameter that controls the annealing to avoid remaining gradient bias. The comparison is based on the total saving ratio (with annealing taken into account).
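The role of delta described above can be sketched as a simple epoch-fraction schedule: prune during the first `delta` fraction of training, then anneal back to the full dataset. This is a minimal illustrative sketch based on the discussion here, not the repository's implementation; the function name `use_pruned_dataset` and its arguments are hypothetical.

```python
def use_pruned_dataset(epoch: int, num_epochs: int, delta: float = 0.875) -> bool:
    """Return True if soft pruning is active for this epoch.

    Hypothetical sketch: during the first `delta` fraction of training,
    low-loss samples are pruned; in the remaining (1 - delta) fraction
    of epochs, training anneals back to the full dataset so that the
    residual bias from pruning is removed before convergence.
    """
    return epoch < int(num_epochs * delta)

# With delta = 0.875 and 100 total epochs, pruning is active for
# epochs 0..86 and the full dataset is used for the last 13 epochs.
pruned_epochs = sum(use_pruned_dataset(e, 100) for e in range(100))
print(pruned_epochs)  # → 87
```

Under this reading, a larger delta saves more compute but leaves fewer full-dataset epochs for the annealing to correct the bias.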
In most cases, InfoBatch does not require tuning this parameter. This is the only case that uses a different lr, as stated, and we suspect it is because ResNet-18's loss surface is not as smooth as ResNet-50's.
We noticed that delta is set to 0.875 in the example code. Is this hyperparameter used in the paper's experiments? If so, the full dataset would be used in the final epochs. Besides, Appendix A says "due to reduced steps, InfoBatch uses a learning rate of 0.05 in this setting"; it seems unfair to increase the lr.