Closed: una-dinosauria closed this issue 4 years ago
Hi Julieta,
Thanks a lot for your thorough analysis, which is very helpful. Let me answer your questions below:
We store 2 fp32 arrays per BatchNorm layer as described here, instead of 4 arrays (running mean and variance, alpha and beta). The non-compressed layers are stored in fp32. In our next paper, we use int8 to store the non-quantized layers, and store the centroids in int8, but the important thing is to compare apples to apples. Unfortunately, I must have forgotten to count some biases for both the semi-supervised and Mask R-CNN networks, and I agree with your calculations. I will correct the paper (together with the previous issue #38).
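For concreteness, here is a minimal sketch of the folding that makes 2 fp32 arrays per BatchNorm layer sufficient; the function and its argument names are only illustrative, not code from the repository:

```python
import torch

def fold_batchnorm(running_mean, running_var, alpha, beta, eps=1e-5):
    # At inference time a BatchNorm layer computes
    #   y = alpha * (x - running_mean) / sqrt(running_var + eps) + beta,
    # which is an affine map y = scale * x + bias. Storing only `scale`
    # and `bias` therefore requires 2 arrays instead of the original 4.
    scale = alpha / torch.sqrt(running_var + eps)
    bias = beta - running_mean * scale
    return scale, bias
```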
Thank you again for your interest and your work; I'm looking forward to seeing your future research!
Pierre
Thanks again for your fast response, Pierre.
I have created a small PR to change the README to match the updated size of the semi-supervised ResNet-50 over here: https://github.com/facebookresearch/kill-the-bits/pull/40
Regarding the Mask-RCNN part, would you mind keeping this issue open until the paper is updated?
Cheers,
Thanks Julieta. Of course I can keep the issue open until the paper is updated and I will ping you when it's done.
Wishing you good luck with the incoming ICLR reviews (if any) and with all your future papers!
Pierre
Hi @pierrestock,
Sorry to bother you again; there is one last aspect we've had trouble reproducing from the paper -- the memory taken by the Mask R-CNN model, which is reported as 6.51 MB, with a 26x compression factor. We also have a smaller discrepancy with the unsupervised ResNet-50 model, reported as 5.15 MB in the README.
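As a quick sanity check using only the two figures quoted above (nothing from the repository), the reported size and compression factor imply roughly the following uncompressed size:

```python
# Back-of-the-envelope check based only on the numbers quoted above.
compressed_mb = 6.51       # reported compressed size of Mask R-CNN
compression_factor = 26    # reported compression factor
print(compressed_mb * compression_factor)  # about 169 MB uncompressed
```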
We have followed the paper and counted all the codebooks as being stored in float16 format. However, the paper does not say which encoding is used by the bnorm layers or other layers ignored for the purpose of compression. To reproduce the results on ImageNet, we have used float32 encoding for these two cases. This gives us the following results; there are, however, two small discrepancies:
The unsupervised ResNet-50 is reported as 5.15 MB while we get 5.20 MB, and Mask R-CNN is reported as 6.51 MB while we get 6.65 MB -- this matches the number that we report in our paper. We have also tried counting the uncompressed and bnorm layers as float16, but that also gives results different from those reported in the paper, with similar results for the semi-supervised ResNet-50.
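To make the accounting above concrete, here is a rough sketch of the kind of per-layer computation we mean; the dictionary fields and the toy layer list are made up for illustration, and the actual numbers come from the gist linked below:

```python
import math

def estimate_size_mb(layers, codebook_bytes=2, other_bytes=4):
    """Size estimate with float16 codebooks and float32 for everything else."""
    total_bits = 0
    for layer in layers:
        if layer.get("quantized", False):
            # Codebook: k centroids of dimension d, stored in float16.
            total_bits += layer["k"] * layer["d"] * codebook_bytes * 8
            # Assignments: one index of ceil(log2(k)) bits per block.
            total_bits += layer["n_blocks"] * math.ceil(math.log2(layer["k"]))
        else:
            # Layers left uncompressed (biases, BatchNorm, ...) in float32.
            total_bits += layer["n_params"] * other_bytes * 8
    return total_bits / 8 / 1e6

# Toy example: one quantized conv layer plus one BatchNorm layer.
layers = [
    {"quantized": True, "k": 256, "d": 9, "n_blocks": 16384},
    {"quantized": False, "n_params": 2 * 64},
]
print(estimate_size_mb(layers), "MB")
```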
So, my question is: could you please explain how you obtained the model sizes for Mask R-CNN and the unsupervised ResNet-50?
I have put together a gist that makes it easy to see how we computed our numbers: https://gist.github.com/una-dinosauria/e528b91de3ca9ab108cbf00aba3d9c2a. Please do make sure to run this on the mask_r_cnn branch of the codebase, as the mask_r_cnn.pth model is missing all the compressed biases on master.
Thank you in advance,
Julieta