WayneDW / Bayesian-Sparse-Deep-Learning

Code for An Adaptive Empirical Bayesian Method for Sparse Deep Learning (NeurIPS'19)
MIT License

Question about sparsity and pruning of the model #4

Open Meteor-Stars opened 3 years ago

Meteor-Stars commented 3 years ago
Hello, I have some points of confusion after reading your paper and hope you can clarify them. Thank you very much!

1. The "pruning" operation in the code seems to only set weight parameters to zero without actually deleting the network neurons. So if I save a model with 90% sparsity and 27K remaining parameters, the saved model is the same size as the non-sparse model, right? In that case, there is no way to obtain a smaller model containing only those 27K parameters.
2. If we only set weight parameters to zero and do not delete the corresponding neurons, those neurons still exist at the next parameter update, so won't the "zeroed" neurons regain non-zero values?

The puzzle is: if the neurons are not deleted during pruning, will the zeroed neurons receive new parameter values at the next training step and become active again? For example, when the weights of some neurons are set to zero, what we hope is that these neurons lose their function and are not updated in further training, right? What puzzles me is that these zeroed neurons have not been deleted, so won't they continue to take effect once the model update assigns them new weight values?

I hope you will forgive any unintended offense. I look forward to hearing from you. Thank you very much again!
WayneDW commented 3 years ago

Hi, Meteor,

I am very glad that you are interested in this paper. Your questions are very representative!

For the first question, yes. There are two ways of pruning: pruning neurons or pruning weights. The first leads to more structural acceleration, while the second admits a larger sparsity. Setting weights to 0 alone cannot make the model smaller; we need to re-store the model in a format that exploits the sparse structure. For example, a sparse matrix still occupies the same size as a dense one, but if you convert it to compressed row storage, the size is greatly reduced.
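To make the storage point concrete, here is a minimal sketch (not code from this repository) comparing a dense weight matrix whose entries were merely zeroed with the same matrix re-encoded in compressed sparse row (CSR) form; the matrix size and sparsity level are made-up examples.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)

# "Prune" 90% of the entries by magnitude: they become zero but stay stored.
threshold = np.quantile(np.abs(W), 0.9)
W[np.abs(W) < threshold] = 0.0
print("dense bytes:", W.nbytes)  # unchanged by zeroing the entries

# Re-encode in compressed sparse row (CSR) form: only the nonzeros are stored.
W_csr = csr_matrix(W)
csr_bytes = W_csr.data.nbytes + W_csr.indices.nbytes + W_csr.indptr.nbytes
print("CSR bytes:", csr_bytes)  # far smaller than the dense storage
```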

Second, we prune the weights during training, but a pruned weight can become active (non-zero) again as the parameters are updated, so at the final iteration we prune once more to guarantee the target sparsity.
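As an illustration of that final prune, here is a hedged sketch of a magnitude-based pruning pass applied after the last training iteration; the function name `magnitude_prune_` and the tiny model are illustrative assumptions, not the repository's actual API.

```python
import torch

def magnitude_prune_(model, sparsity=0.9):
    """Zero the smallest-magnitude weights in place to enforce the target sparsity."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:          # skip biases and other 1-D parameters
                continue
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > threshold).float())

model = torch.nn.Sequential(torch.nn.Linear(784, 300), torch.nn.ReLU(),
                            torch.nn.Linear(300, 10))
# ... training loop here: previously zeroed weights may drift back to non-zero ...
magnitude_prune_(model, sparsity=0.9)  # final prune guarantees the sparsity level
```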

Third, you didn't ask this, but I will answer it as a way to extend this work. Pruning based on probability is more appealing and had not been achieved by the time the NeurIPS'19 paper was submitted. You can achieve it by dropping the inverse-gamma and beta priors used in the updates of \sigma^2 and \delta in Eq. (14-15) of the paper.
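For completeness, a very small sketch of what probability-based pruning could look like: keep a weight only when its estimated inclusion probability exceeds a cutoff. The names `rho` and `prune_by_inclusion_prob` are assumptions for illustration and are not taken from the paper or the repository.

```python
import torch

def prune_by_inclusion_prob(weight, rho, cutoff=0.5):
    """Zero weights whose estimated inclusion probability `rho` falls below `cutoff`.

    `rho` is assumed to be a tensor of the same shape as `weight`, e.g. the
    per-weight probability of belonging to the non-zero (slab) component.
    """
    return weight * (rho > cutoff).float()

# Toy usage with made-up numbers: keeps the 1st and 3rd weights, zeros the 2nd.
w = torch.tensor([0.80, -0.02, 0.30])
rho = torch.tensor([0.95, 0.10, 0.60])
print(prune_by_inclusion_prob(w, rho))
```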

Hope these answers are helpful.

Meteor-Stars commented 3 years ago

Hi, thank you very much for your reply; it resolved my confusion and gave me great inspiration. The work in this paper is great. If I have other questions about the paper, I will come back. Thank you very much again! Best wishes!