Hey Shawn,
Welcome to the compression world and thanks for reaching out!
Regarding the memory footprint, as detailed in our paper (Section 4.1, paragraph "Metrics"), it is calculated as the indexing cost (number of bits per weight) plus the overhead of storing the centroids in float16. As an example, quantizing a layer of size 128 × 128 × 3 × 3 with k = 256 centroids (1 byte per subvector index) and a block size of d = 9 leads to an indexing cost of 16 KB for the m = 16,384 blocks, plus 4.5 KB for storing the centroids. Applying this to every layer of ResNet-50 should give you the numbers we report in the paper.
Regarding the flops needed, it depends on the implementation. Ours is rather a proof of concept: we re-instantiate the non-compressed layer at inference time, so a flop counter will see the same cost as the regular network. Future work includes implementing a smarter inference path, for example multiplying the input activations by every centroid once and then summing the partial results according to the assignments. This would reduce the number of flops, and any contribution is more than welcome!
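To make that suggestion concrete, here is a minimal PyTorch sketch of the centroid-first computation for a quantized fully-connected layer (this is not our implementation; the shapes, names, and random codebook are purely illustrative):

```python
import torch

B, C_in, C_out, d, k = 4, 512, 1024, 8, 256  # batch, in/out features, block size, codebook size
n_blocks = C_in // d

centroids = torch.randn(k, d)                      # codebook (stored in float16 in the paper)
assignments = torch.randint(k, (C_out, n_blocks))  # 1-byte index per subvector

x = torch.randn(B, C_in)

# 1) multiply the activations by every centroid once: one small matmul
partial = x.view(B, n_blocks, d) @ centroids.t()   # (B, n_blocks, k)

# 2) for each output unit, pick the partial product of its assigned
#    centroid in each block and sum over the blocks
idx = assignments.t().expand(B, n_blocks, C_out)   # (B, n_blocks, C_out)
y = partial.gather(2, idx).sum(dim=1)              # (B, C_out)

# sanity check against the dense (re-instantiated) layer
W = centroids[assignments].reshape(C_out, C_in)
assert torch.allclose(y, x @ W.t(), atol=1e-4)
```

The centroid matmul costs B·m·k·d multiply-adds instead of B·C_out·C_in for the dense layer, so the savings appear when C_out is larger than k.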
Pierre
Hey,
Thanks for sharing this amazing work! I'm new to network compression and would like to know how to compute the flops and the number of parameters of your quantized model. I tried a flop-counter toolkit, but it reports the same number of parameters and flops as the regular ResNet-50 (when I run it on the pre-trained compressed ResNet-50).
Thx, Shawn