AndreydeAguiarSalvi / yolo_compression

GNU General Public License v3.0

Why does the model size remain the same? After the IMP algorithm, how to calculate the model's FPS? #1

Closed haodonga closed 3 years ago

haodonga commented 3 years ago

I trained on my own COCO-format dataset following your method and then ran IMP. After 13 rounds of 20% IMP, the number of parameters decreased, but the model size did not change (240 MB = 240 MB). Why is this? Also, how should I measure the model's FPS? Thank you for your reply.

AndreydeAguiarSalvi commented 3 years ago

Hi haodonga, I am working on this "problem". I have three things to say to you:

  1. This happens because Iterative Magnitude Pruning (the Lottery Ticket Hypothesis), as well as other algorithms in the literature such as SuperMask or SynapticFlow, are unstructured pruning methods. That is, they produce sparse matrices, unlike structured pruning, which removes entire convolutional channels, for example. These sparse matrices are obtained by creating a mask of zeros and ones that multiplies the parameters and zeroes out the pruned ones, which answers your question: the model size doesn't change because, at this step, you only have a theoretical reduction, with many parameters set to zero. They are still stored, just with values equal to zero (see the masking sketch after this list).

  2. I work with PyTorch and, at least currently, I haven't found an official PyTorch way to solve this problem. We can store sparse matrices efficiently, saving the parameter values as a list and the indices as a list of tensors. We can even perform matrix multiplication between a sparse tensor (the pruned parameters) and a dense tensor (the input) in an efficient manner that reduces the number of operations. However, we can't perform convolutional inference without calling to_dense() on the sparse parameters, which inflates the tensor back to its original shape and fills the missing values with zeros. This reduces the storage size, but it can't reduce the inference cost (see the sparse-storage sketch below).

  3. That is why I created the class FullSparseYOLO. This class receives an already pruned Darknet or SoftDarknet and creates a new model, replacing the convolutional layers with my M2MSparseConv. The class M2MSparseConv receives a pruned conv layer, reshapes its parameters into a 2D tensor, and stores them as a sparse tensor. In the forward function, it reshapes the input into a 2D tensor so that the convolution can be computed as a matrix multiplication between a sparse and a dense tensor (see the sparse matmul sketch below). In this way, we also save computational effort at inference time. In the current code, you can use computing_flops.py (sorry for the script name hehe) to compute the MACs of the FullSparseYOLO, and with storage.py you can save this model. Unfortunately, this sparse-tensor storage trick only works with a plain Tensor, not with a Parameter. Consequently, I cannot use FullSparseYOLO in graphs.py to save the model to TensorBoard, nor save the model the traditional PyTorch way, which is why I created storage.py. If you discover some way to create/store a sparse Parameter, please contact me.
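
To make point 1 concrete, here is a minimal sketch (not code from this repo, just a generic PyTorch illustration) of how an unstructured-pruning mask zeroes weights without shrinking the tensor, so the checkpoint stays the same size:

```python
import torch
import torch.nn as nn

# Hypothetical example: unstructured pruning keeps a 0/1 mask per layer
# and multiplies it into the weights.
conv = nn.Conv2d(64, 128, kernel_size=3)
k = int(0.2 * conv.weight.numel())                      # prune the smallest 20%
threshold = conv.weight.abs().flatten().kthvalue(k).values
mask = (conv.weight.abs() > threshold).float()          # 1 = keep, 0 = pruned

with torch.no_grad():
    conv.weight.mul_(mask)                              # zero the pruned weights

# The tensor shape (and thus the checkpoint size) is unchanged:
# the pruned entries are still stored, just as zeros.
print(conv.weight.shape)                                # torch.Size([128, 64, 3, 3])
print((conv.weight == 0).float().mean())                # ~0.20 of the entries are zero
```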
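For point 2, a minimal sketch (using standard torch.sparse COO tensors, not the repo's code) of why sparse storage helps on disk but conv inference still needs to_dense():

```python
import torch
import torch.nn.functional as F

# Hypothetical example: store pruned conv weights as a COO sparse tensor.
dense_w = torch.randn(128, 64, 3, 3)
dense_w[dense_w.abs() < 0.8] = 0.0        # pretend these entries were pruned

sparse_w = dense_w.to_sparse()            # stores only non-zero values + indices
x = torch.randn(1, 64, 32, 32)

# F.conv2d has no sparse kernel, so the zeros must be re-inflated first:
out = F.conv2d(x, sparse_w.to_dense(), padding=1)
print(out.shape)                          # torch.Size([1, 128, 32, 32])
```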
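And for point 3, a rough sketch of the underlying idea only (a generic unfold/im2col formulation, not the actual M2MSparseConv implementation): the convolution expressed as a sparse-dense matrix multiplication.

```python
import torch
import torch.nn.functional as F

# Hypothetical example: flatten pruned conv weights into a 2D sparse matrix
# and run the convolution as sparse.mm over the unfolded (im2col) input.
C_in, C_out, k, H, W = 64, 128, 3, 32, 32
w = torch.randn(C_out, C_in, k, k)
w[w.abs() < 0.8] = 0.0                              # pretend these were pruned

w2d = w.view(C_out, C_in * k * k).to_sparse()       # sparse (C_out, C_in*k*k)
x = torch.randn(1, C_in, H, W)

cols = F.unfold(x, kernel_size=k, padding=1)        # (1, C_in*k*k, H*W)
out = torch.sparse.mm(w2d, cols[0])                 # (C_out, H*W)
out = out.view(1, C_out, H, W)

# Matches the dense convolution up to float rounding:
ref = F.conv2d(x, w, padding=1)
print(torch.allclose(out, ref, atol=1e-4))          # should print True
```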