DingXiaoH / GSM-SGD

Global Sparse Momentum SGD for pruning very deep neural networks
MIT License

About Pruning and Computation #1

Closed tigereatsheep closed 4 years ago

tigereatsheep commented 4 years ago

First, thank you for your awesome work. I have two questions:

  1. The pruning method in your paper operates per weight; could it be configured for per-kernel pruning?
  2. Have any computation (FLOPs) comparison experiments been carried out?
DingXiaoH commented 4 years ago

Thank you for being interested in our work! Yes, it can be used for per-kernel pruning if you compute and apply the mask matrix by kernel-wise aggregation and broadcasting. Connection pruning (i.e., sparsification) is focused on reducing the number of non-zero params, not the number of FLOPs, unless we run the model on some specialized software and hardware platforms.
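To illustrate the kernel-wise aggregation and broadcasting mentioned above, here is a minimal NumPy sketch. It is not the repository's implementation: the importance metric (sum of |w| per kernel), the `keep_ratio` parameter, and the function name `kernel_wise_mask` are all illustrative assumptions; GSM itself derives its mask from momentum-based gradient statistics.

```python
import numpy as np

def kernel_wise_mask(weights, keep_ratio=0.5):
    """Build a per-kernel pruning mask for a conv weight tensor of
    shape (out_channels, in_channels, kH, kW).

    A per-weight importance metric (here |w|, an illustrative choice)
    is aggregated over each kernel's spatial dims, and the resulting
    kernel-level keep/prune decision is broadcast back to every
    weight in that kernel.
    """
    out_c, in_c, kh, kw = weights.shape
    # Aggregate importance kernel-wise: one score per (out, in) kernel.
    scores = np.abs(weights).sum(axis=(2, 3))              # (out_c, in_c)
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    threshold = np.sort(scores, axis=None)[-n_keep]
    kernel_mask = (scores >= threshold).astype(weights.dtype)
    # Broadcast the kernel-level mask over the spatial dimensions.
    return kernel_mask[:, :, None, None] * np.ones((1, 1, kh, kw),
                                                   dtype=weights.dtype)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))
mask = kernel_wise_mask(w, keep_ratio=0.5)
pruned = w * mask   # every 3x3 kernel is kept whole or zeroed whole
```

Because the mask is constant within each kernel, pruning removes entire 3x3 kernels rather than scattered individual weights, which is the structural difference from the per-weight scheme in the paper.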

tigereatsheep commented 4 years ago

Thanks!