Hey Shawn,
Welcome to the compression world and thanks for reaching out!
Regarding the memory footprint, as detailed in our paper (Section 4.1, paragraph "Metrics"), it is calculated as the indexing cost (number of bits per weight) plus the overhead of storing the centroids in float16. As an example, quantizing a layer of size 128 × 128 × 3 × 3 with k = 256 centroids (1 byte per subvector index) and a block size of d = 9 leads to an indexing cost of 16 KB for the m = 16,384 blocks, plus 4.5 KB for storing the centroids. Applying this to every layer of ResNet-50 should give you the numbers we report in the paper.
Regarding the flops needed, it depends on the implementation. Ours is rather a proof of concept: we re-instantiate the non-compressed layer at inference time, so a flop counter will see the same cost as the regular network. Future work includes implementing a smarter inference path, for example multiplying the input activations by every centroid once and then summing the partial results according to the assignments. This would reduce the number of flops, and any contribution is more than welcome!
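To make that suggestion concrete, here is a minimal PyTorch sketch of the centroid-first computation for a quantized fully-connected layer (this is not our implementation; the shapes, names, and random codebook are purely illustrative):

```python
import torch

B, C_in, C_out, d, k = 4, 512, 1024, 8, 256  # batch, in/out features, block size, codebook size
n_blocks = C_in // d

centroids = torch.randn(k, d)                      # codebook (stored in float16 in the paper)
assignments = torch.randint(k, (C_out, n_blocks))  # 1-byte index per subvector

x = torch.randn(B, C_in)

# 1) multiply the activations by every centroid once: one small matmul
partial = x.view(B, n_blocks, d) @ centroids.t()   # (B, n_blocks, k)

# 2) for each output unit, pick the partial product of its assigned
#    centroid in each block and sum over the blocks
idx = assignments.t().expand(B, n_blocks, C_out)   # (B, n_blocks, C_out)
y = partial.gather(2, idx).sum(dim=1)              # (B, C_out)

# sanity check against the dense (re-instantiated) layer
W = centroids[assignments].reshape(C_out, C_in)
assert torch.allclose(y, x @ W.t(), atol=1e-4)
```

The centroid matmul costs B·m·k·d multiply-adds instead of B·C_out·C_in for the dense layer, so the savings appear when C_out is larger than k.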
Pierre
Hey,
Thanks for sharing this amazing work! I'm new to network compression and would like to know how to compute the flops and the number of parameters of your quantized model. I tried a flop-counter toolkit, but it reports the same number of parameters and flops as the regular ResNet-50 (when I run it on the pre-trained compressed ResNet-50).
Thx, Shawn