Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

why training ...? #63

Open dreamyou070 opened 1 month ago

dreamyou070 commented 1 month ago

In the paper, you identify the unimportant SD blocks/layers. In that case, you may not need to retrain the model (because if you remove the unimportant blocks/layers, the performance is almost preserved).

Can you explain why you retrain the model after removing the unimportant blocks?

Thanks!

bokyeong1015 commented 1 month ago

Hi,

For low pruning ratios (which remove a small number of blocks), retraining may not be necessary, or light retraining (such as LoRA) would be enough. Refer to the example of mid-block removal without retraining in our paper.
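As a rough illustration (this is not our released code), mid-block removal without retraining can be sketched with diffusers as below, assuming the mid-block's output shape matches its input so an identity module can stand in for it:

```python
import torch
from diffusers import StableDiffusionPipeline


class SkipMidBlock(torch.nn.Module):
    """Identity stand-in: returns the latent unchanged, ignoring time/text inputs."""

    def forward(self, hidden_states, *args, **kwargs):
        return hidden_states


pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Remove the mid-block; all other pretrained weights stay as-is (no retraining).
pipe.unet.mid_block = SkipMidBlock()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("midblock_removed.png")
```

Because only a small part of the network is removed here, the remaining pretrained weights still carry most of the capacity, which is why retraining can be skipped or kept light.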

For high pruning ratios (which remove a large number of blocks, including outer blocks), retraining is essential to compensate for the loss of information and to achieve satisfactory results.

Specifically, for structured pruning, we think that severe compression to achieve significant efficiency gains often necessitates heavy retraining.
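To give a rough idea of what such retraining looks like (an illustrative sketch, not our actual training script), the pruned student U-Net is typically supervised with the usual denoising loss plus distillation from the original uncompressed teacher U-Net:

```python
import torch
import torch.nn.functional as F


def retrain_step(student_unet, teacher_unet, noisy_latents, timesteps, text_emb, noise):
    """One illustrative retraining step: denoising loss + output-level distillation."""
    pred_student = student_unet(
        noisy_latents, timesteps, encoder_hidden_states=text_emb
    ).sample
    with torch.no_grad():  # the uncompressed teacher is frozen
        pred_teacher = teacher_unet(
            noisy_latents, timesteps, encoder_hidden_states=text_emb
        ).sample

    loss_denoise = F.mse_loss(pred_student, noise)         # ordinary diffusion objective
    loss_distill = F.mse_loss(pred_student, pred_teacher)  # match the teacher's prediction
    return loss_denoise + loss_distill                     # loss weights omitted for brevity
```

The heavier the pruning, the more of this kind of supervision (and training compute) is needed to recover the lost quality.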


These observations are further supported in our subsequent work, Shortened LLaMA.