When pruned at low pruning rates (e.g., 0.2), both dense-to-sparse training and sparse-to-sparse training can easily recover from pruning. Conversely, if too many parameters are removed at once, almost all models suffer accuracy drops.
When pruning happens during the training phase with large learning rates, models can easily recover from pruning (up to a certain level). However, pruning plasticity drops significantly after the second learning rate decay, leaving the pruned networks unable to recover with continued training.
GraNet starts from a denser yet still sparse model and gradually prunes the sparse model to the desired sparsity.
https://arxiv.org/abs/2106.10404
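A minimal sketch of the gradual pruning idea described above: start from an initial sparsity greater than zero (a "denser yet still sparse" model) and anneal toward the target sparsity with a cubic schedule, pruning by weight magnitude at each update. This is not the authors' code; the function names, hyperparameters, and the use of the cubic schedule here are illustrative assumptions.

```python
import torch

def current_sparsity(step, s_init=0.5, s_final=0.9, t_start=0, t_end=10_000):
    """Cubic schedule: sparsity is s_init at t_start and s_final at t_end.
    (Illustrative; schedule shape and endpoints are assumptions.)"""
    if step <= t_start:
        return s_init
    if step >= t_end:
        return s_final
    frac = (step - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3

def magnitude_prune_mask(weight, sparsity):
    """Return a 0/1 mask zeroing the smallest-magnitude fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# Example: the target sparsity grows smoothly from 50% to 90% over training,
# so each pruning step removes only a small slice of parameters at a time.
w = torch.randn(256, 256)
for step in [0, 2_500, 5_000, 10_000]:
    s = current_sparsity(step)
    mask = magnitude_prune_mask(w, s)
    print(f"step={step:>6}  target sparsity={s:.3f}  "
          f"actual={(mask == 0).float().mean().item():.3f}")
```

Starting from an already-sparse model and shrinking it gradually keeps each individual pruning step small, which matches the observation above that models recover easily from low pruning rates but not from removing many parameters at once.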