Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

why training ...? #63

Open dreamyou070 opened 1 month ago

dreamyou070 commented 1 month ago

In the paper, you identify the unimportant SD blocks/layers. In that case, you may not need to retrain the model (because if you remove the unimportant blocks/layers, the performance is almost preserved).

Can you explain why you retrain the model after removing the unimportant blocks?

Thanks!

bokyeong1015 commented 1 month ago

Hi,

For low pruning ratios (which remove a small number of blocks), retraining may not be necessary, or light retraining (such as LoRA) would be enough. Refer to the example of mid-block removal without retraining in our paper.
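As a rough illustration (this is not our released code), mid-block removal without retraining can be sketched with diffusers as below, assuming the mid-block's output shape matches its input so an identity module can stand in for it:

```python
import torch
from diffusers import StableDiffusionPipeline


class SkipMidBlock(torch.nn.Module):
    """Identity stand-in: returns the latent unchanged, ignoring time/text inputs."""

    def forward(self, hidden_states, *args, **kwargs):
        return hidden_states


pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Remove the mid-block; all other pretrained weights stay as-is (no retraining).
pipe.unet.mid_block = SkipMidBlock()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("midblock_removed.png")
```

Because only a small part of the network is removed here, the remaining pretrained weights still carry most of the capacity, which is why retraining can be skipped or kept light.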

For high pruning ratios (which remove a large number of blocks, including outer blocks), retraining is essential to compensate for the loss of information and to achieve satisfactory results.

Specifically, for structured pruning, we think that severe compression to achieve significant efficiency gains often necessitates heavy retraining.
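To give a rough idea of what such retraining looks like (an illustrative sketch, not our actual training script), the pruned student U-Net is typically supervised with the usual denoising loss plus distillation from the original uncompressed teacher U-Net:

```python
import torch
import torch.nn.functional as F


def retrain_step(student_unet, teacher_unet, noisy_latents, timesteps, text_emb, noise):
    """One illustrative retraining step: denoising loss + output-level distillation."""
    pred_student = student_unet(
        noisy_latents, timesteps, encoder_hidden_states=text_emb
    ).sample
    with torch.no_grad():  # the uncompressed teacher is frozen
        pred_teacher = teacher_unet(
            noisy_latents, timesteps, encoder_hidden_states=text_emb
        ).sample

    loss_denoise = F.mse_loss(pred_student, noise)         # ordinary diffusion objective
    loss_distill = F.mse_loss(pred_student, pred_teacher)  # match the teacher's prediction
    return loss_denoise + loss_distill                     # loss weights omitted for brevity
```

The heavier the pruning, the more of this kind of supervision (and training compute) is needed to recover the lost quality.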


These observations are further supported in our subsequent work, Shortened LLaMA.