VainF / Torch-Pruning

[CVPR 2023] DepGraph: Towards Any Structural Pruning
https://arxiv.org/abs/2301.12900
MIT License

Some confusion about the high-level pruners #267

Open aidevmin opened 1 year ago

aidevmin commented 1 year ago

Thanks @VainF for amazing repo.

I read your paper and looked at the pruner APIs, but a few things confuse me.

  1. In this link, you said that BNScalePruner and GroupNormPruner support sparse training. Does that mean we need to train the pretrained model again for at least one epoch, which changes the pretrained parameters? Is that right?

  2. In this benchmark table https://github.com/VainF/Torch-Pruning/tree/master/benchmarks, I saw several methods implemented by you, such as Group-L1, Group-BN, Group-GReg, Ours w/o SL, and Ours. As I understand it, all of these methods estimate the importance of parameters:

    • All of these methods are group-level and differ from each other only in the importance criterion? Is that right?
  3. Are all the pruners in your repo group-level? I am confused because, reading the code, the group-level L1 method uses tp.pruner.MagnitudePruner and the group-level BN method uses tp.pruner.BNScalePruner; neither API name contains "Group". But the group-level Group method uses tp.pruner.GroupNormPruner, which does have "Group" in the name. Please correct me.

  4. Your contributions are DepGraph and the new pruning method GroupPruner with sparse learning (based on the L2 norm)? Is that right? If so, is GroupPruner without sparse learning the same as tp.pruner.MagnitudePruner with L2 importance?

  5. As I understand it, tp.pruner.MagnitudePruner is group-level for Conv layers, tp.pruner.BNScalePruner is group-level for BN layers, and tp.pruner.GroupNormPruner covers Conv, BN, and Linear layers. Is that right?

Sorry for my poor English.

VainF commented 1 year ago
  1. In this link, you said that BNScalePruner and GroupNormPruner support sparse training. Does that mean we need to train the pretrained model again for at least one epoch, which changes the pretrained parameters? Is that right?

Yes, it forces some unimportant parameters to be 0.
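
For reference, a minimal sketch of that sparse-training step, following the `regularize()` interface shown in the README (the `ch_sparsity` argument is renamed `pruning_ratio` in newer releases, and the dummy batch stands in for a real dataloader):

```python
import torch
import torch.nn.functional as F
import torch_pruning as tp
from torchvision.models import resnet18

model = resnet18()  # load your pretrained weights here
example_inputs = torch.randn(1, 3, 224, 224)

# L2 group importance; exact class/argument names may vary by version.
imp = tp.importance.MagnitudeImportance(p=2)
pruner = tp.pruner.GroupNormPruner(
    model,
    example_inputs,
    importance=imp,
    ch_sparsity=0.5,  # prune 50% of the channels in the end
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch standing in for a real dataloader.
data = torch.randn(8, 3, 224, 224)
target = torch.randint(0, 1000, (8,))

for step in range(10):  # sparse training (normally at least one epoch)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(data), target)
    loss.backward()
    pruner.regularize(model)  # add the sparsity gradient before the update
    optimizer.step()

pruner.step()  # then remove the (now near-zero) groups in one shot
```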

  2. In this benchmark table https://github.com/VainF/Torch-Pruning/tree/master/benchmarks, I saw several methods implemented by you, such as Group-L1, Group-BN, Group-GReg, Ours w/o SL, and Ours. As I understand it, all of these methods estimate the importance of parameters:

Yes.
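
In other words, each row of that table plugs a different importance criterion into the same group-level pruner; the class names below are my reading of `tp.importance` and may differ across versions:

```python
import torch_pruning as tp

# Each benchmark row swaps only the importance criterion:
imp_group_l1 = tp.importance.MagnitudeImportance(p=1)  # "Group-L1"
imp_group_bn = tp.importance.BNScaleImportance()       # "Group-BN"
imp_ours     = tp.importance.MagnitudeImportance(p=2)  # "Ours" (the full method adds sparse learning)
```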

  3. Are all the pruners in your repo group-level? I am confused because, reading the code, the group-level L1 method uses tp.pruner.MagnitudePruner and the group-level BN method uses tp.pruner.BNScalePruner; neither API name contains "Group". But the group-level Group method uses tp.pruner.GroupNormPruner, which does have "Group" in the name. Please correct me.

Yes, all pruners estimate group importance and remove grouped parameters by default.
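
This is what DepGraph does under the hood; a minimal example adapted from the README (torchvision's resnet18 assumed):

```python
import torch
import torch_pruning as tp
from torchvision.models import resnet18

model = resnet18()
example_inputs = torch.randn(1, 3, 224, 224)

# Build the dependency graph, then ask for everything coupled to
# channel 0 of conv1: the paired BN layer, downstream convs, etc.
DG = tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)
group = DG.get_pruning_group(model.conv1, tp.prune_conv_out_channels, idxs=[0])
print(group)  # every (layer, pruning_fn) pair coupled to that channel

if DG.check_pruning_group(group):  # avoid removing a layer entirely
    group.prune()  # prune the whole group consistently
```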

  4. Your contributions are DepGraph and the new pruning method GroupPruner with sparse learning (based on the L2 norm)? Is that right? If so, is GroupPruner without sparse learning the same as tp.pruner.MagnitudePruner with L2 importance?

Right. Both GroupNormPruner and MagnitudePruner inherit from tp.pruner.MetaPruner. The only difference is that GroupNormPruner provides an interface for sparse training.
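
A sketch of that equivalence (constructor arguments assumed from MetaPruner; check your version): if `regularize()` is never called during training, the two pruners behave the same.

```python
import torch
import torch_pruning as tp
from torchvision.models import resnet18

example_inputs = torch.randn(1, 3, 224, 224)
imp = tp.importance.MagnitudeImportance(p=2)  # L2 norm over each group

# Without any regularize() calls during training, these behave alike:
pruner_a = tp.pruner.MagnitudePruner(
    resnet18(), example_inputs, importance=imp, ch_sparsity=0.5
)
pruner_b = tp.pruner.GroupNormPruner(
    resnet18(), example_inputs, importance=imp, ch_sparsity=0.5
)
pruner_a.step()  # both reduce to MetaPruner's one-shot group pruning
pruner_b.step()
```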

  5. As I understand it, tp.pruner.MagnitudePruner is group-level for Conv layers, tp.pruner.BNScalePruner is group-level for BN layers, and tp.pruner.GroupNormPruner covers Conv, BN, and Linear layers. Is that right?

Yes.

aidevmin commented 1 year ago

@VainF Thank you so much for the quick response. I got it.

aidevmin commented 1 year ago

@VainF Do you have a recommended number of epochs for sparse training? If it is large, normal training + sparse training before pruning will take a lot of time.