AndreydeAguiarSalvi / yolo_compression

GNU General Public License v3.0

Why does the model size remain the same? After the IMP algorithm, how to calculate the model's FPS? #1

Closed haodonga closed 3 years ago

haodonga commented 3 years ago

I trained on my own COCO-format dataset following your method and then ran IMP. After 13 rounds of 20% IMP, the number of parameters decreased, but the model size did not change (240 MB = 240 MB). Why is this? Also, how should I measure the model's FPS? Thank you for your reply.

AndreydeAguiarSalvi commented 3 years ago

Hi haodonga, I am working on this "problem". I have three things to say to you:

  1. This happens because Iterative Magnitude Pruning (the Lottery Ticket Hypothesis), as well as other algorithms in the literature such as SuperMask or SynapticFlow, are unstructured pruning methods. That is, they produce sparse matrices, unlike structured pruning, which removes entire convolutional channels, for example. These sparse matrices are obtained by creating a mask of zeros and ones that multiplies the parameters and zeroes out the pruned ones, which answers your question: the model size doesn't change because, at this step, you only have a theoretical reduction, with many parameters set to zero. They are still stored, just with values equal to zero (see the masking sketch after this list).

  2. I work with PyTorch and, at least currently, I haven't found an official PyTorch way to solve this problem. We can store sparse matrices efficiently, saving the parameter values as a list and the indices as a list of tensors. We can even perform matrix multiplication between a sparse tensor (the pruned parameters) and a dense tensor (the input) in an efficient manner that reduces the number of operations. However, we can't perform convolutional inference without calling to_dense() on the sparse parameters, which inflates the tensor back to its original shape and fills the missing values with zeros. This reduces the storage size, but it can't reduce the inference cost (see the sparse-storage sketch below).

  3. That is why I created the class FullSparseYOLO. This class receives an already pruned Darknet or SoftDarknet and creates a new model, replacing the convolutional layers with my M2MSparseConv. The class M2MSparseConv receives a pruned conv layer, reshapes its parameters into a 2D tensor, and stores them as a sparse tensor. In the forward function, it reshapes the input into a 2D tensor so that the convolution can be computed as a matrix multiplication between a sparse and a dense tensor (see the sparse matmul sketch below). In this way, we also save computational effort at inference time. In the current code, you can use computing_flops.py (sorry for the script name hehe) to compute the MACs of the FullSparseYOLO, and with storage.py you can save this model. Unfortunately, this sparse-tensor storage trick only works with a plain Tensor, not with a Parameter. Consequently, I cannot use FullSparseYOLO in graphs.py to save the model to TensorBoard, nor save the model the traditional PyTorch way, which is why I created storage.py. If you discover some way to create/store a sparse Parameter, please contact me.
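
To make point 1 concrete, here is a minimal sketch (not code from this repo, just a generic PyTorch illustration) of how an unstructured-pruning mask zeroes weights without shrinking the tensor, so the checkpoint stays the same size:

```python
import torch
import torch.nn as nn

# Hypothetical example: unstructured pruning keeps a 0/1 mask per layer
# and multiplies it into the weights.
conv = nn.Conv2d(64, 128, kernel_size=3)
k = int(0.2 * conv.weight.numel())                      # prune the smallest 20%
threshold = conv.weight.abs().flatten().kthvalue(k).values
mask = (conv.weight.abs() > threshold).float()          # 1 = keep, 0 = pruned

with torch.no_grad():
    conv.weight.mul_(mask)                              # zero the pruned weights

# The tensor shape (and thus the checkpoint size) is unchanged:
# the pruned entries are still stored, just as zeros.
print(conv.weight.shape)                                # torch.Size([128, 64, 3, 3])
print((conv.weight == 0).float().mean())                # ~0.20 of the entries are zero
```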
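For point 2, a minimal sketch (using standard torch.sparse COO tensors, not the repo's code) of why sparse storage helps on disk but conv inference still needs to_dense():

```python
import torch
import torch.nn.functional as F

# Hypothetical example: store pruned conv weights as a COO sparse tensor.
dense_w = torch.randn(128, 64, 3, 3)
dense_w[dense_w.abs() < 0.8] = 0.0        # pretend these entries were pruned

sparse_w = dense_w.to_sparse()            # stores only non-zero values + indices
x = torch.randn(1, 64, 32, 32)

# F.conv2d has no sparse kernel, so the zeros must be re-inflated first:
out = F.conv2d(x, sparse_w.to_dense(), padding=1)
print(out.shape)                          # torch.Size([1, 128, 32, 32])
```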
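And for point 3, a rough sketch of the underlying idea only (a generic unfold/im2col formulation, not the actual M2MSparseConv implementation): the convolution expressed as a sparse-dense matrix multiplication.

```python
import torch
import torch.nn.functional as F

# Hypothetical example: flatten pruned conv weights into a 2D sparse matrix
# and run the convolution as sparse.mm over the unfolded (im2col) input.
C_in, C_out, k, H, W = 64, 128, 3, 32, 32
w = torch.randn(C_out, C_in, k, k)
w[w.abs() < 0.8] = 0.0                              # pretend these were pruned

w2d = w.view(C_out, C_in * k * k).to_sparse()       # sparse (C_out, C_in*k*k)
x = torch.randn(1, C_in, H, W)

cols = F.unfold(x, kernel_size=k, padding=1)        # (1, C_in*k*k, H*W)
out = torch.sparse.mm(w2d, cols[0])                 # (C_out, H*W)
out = out.view(1, C_out, H, W)

# Matches the dense convolution up to float rounding:
ref = F.conv2d(x, w, padding=1)
print(torch.allclose(out, ref, atol=1e-4))          # should print True
```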