Open Ta-SIR opened 3 years ago
That is the point: the r1 loss is there for sparsity. This work first trains without the r1 loss in main, then adds it in sss. If we trained with the r1 loss from the beginning, we probably could not get better accuracy. It's just like warm-up in other algorithms.
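To make the two-stage idea concrete, here is a rough sketch of a loss whose sparsity term is switched off during a warm-up stage and switched on afterwards. This is not the repo's actual code: `sparsity_weight`, `total_loss`, `warmup_epochs`, and `gamma` are hypothetical names, and the sum of absolute values is only an assumed stand-in for the r1 loss.

```python
def sparsity_weight(epoch, warmup_epochs, gamma):
    """Return 0 during the warm-up stage, then the full penalty weight."""
    return 0.0 if epoch < warmup_epochs else gamma


def total_loss(task_loss, scales, epoch, warmup_epochs=10, gamma=0.01):
    """Task loss plus a sparsity penalty that is enabled only after warm-up.

    `scales` is a plain list of floats standing in for the scaling factors
    being sparsified (an assumption made for this illustration).
    """
    # Assumed stand-in penalty: sum of absolute values of the scaling factors.
    penalty = sum(abs(s) for s in scales)
    return task_loss + sparsity_weight(epoch, warmup_epochs, gamma) * penalty
```

With these defaults, any epoch below 10 returns the task loss unchanged (the "main" stage), and from epoch 10 onward the penalty is added (the "sss" stage).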