Closed by sugar-fly 10 months ago
I think it does give a slight improvement; stepping later gives a little boost. The baseline in the paper already includes that improvement over DKM. I think it's about 0.5 points or so.
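To make "stepping later" concrete, here is a minimal sketch of a step learning-rate schedule in plain Python. Everything in it is an assumption for illustration: the helper `step_lr`, the base rate `1e-4`, the decay factor `0.1`, the 100-step run, and the milestones at 50 and 90 are hypothetical and are not taken from the RoMa or DKM training code.

```python
def step_lr(base_lr, step, milestones, gamma=0.1):
    """Return the learning rate at `step` for a step-decay schedule.

    The rate is multiplied by `gamma` once for every milestone
    that has already been reached.
    """
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** passed


# Hypothetical run length and milestones, chosen only for illustration.
TOTAL_STEPS = 100
early = [step_lr(1e-4, s, milestones=[50]) for s in range(TOTAL_STEPS)]
late = [step_lr(1e-4, s, milestones=[90]) for s in range(TOTAL_STEPS)]

# With the later milestone, more of the run is spent at the full rate.
steps_at_full_rate_early = sum(1 for lr in early if lr == 1e-4)
steps_at_full_rate_late = sum(1 for lr in late if lr == 1e-4)
```

Both schedules end at the same decayed rate; the only difference is how long training stays at the base rate before the drop.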
Thank you for your answer!
Hi Johan,
I really appreciate your great work on RoMa. I noticed that RoMa's training strategy (i.e., the scheduler) is different from DKM's. Does the different training strategy help with performance?
Thank you so much for your help!