Closed: tuanho27 closed this issue 5 years ago
Hi tuanho27,
Yes, this issue is due to (1) the fact that we sample activations, so one EM iteration can literally kick a centroid out of reach, and (2) the fact that the weights in depthwise convolutions are sometimes not well balanced.
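To make the failure mode concrete: after the E-step of k-means, a centroid can end up with zero assigned blocks and is then "dead". One common remedy (and roughly what is discussed later in this thread) is to re-seed the empty centroid next to the most populated one, perturbed by a small epsilon, so the donor cluster is split on the next iteration. The sketch below is hypothetical numpy code illustrating that idea, not the repository's actual implementation (which operates on PyTorch tensors); the function name and epsilon value are my own.

```python
import numpy as np

def reassign_empty_clusters(points, centroids, eps=1e-4, seed=0):
    """One E-step of k-means; empty centroids are re-seeded next to
    the most populated centroid, perturbed by a small epsilon."""
    rng = np.random.default_rng(seed)
    # E-step: assign each point (weight block) to its nearest centroid
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    counts = np.bincount(assign, minlength=len(centroids))
    # Re-seed every empty centroid near the most populated one
    for k in np.flatnonzero(counts == 0):
        donor = counts.argmax()
        centroids[k] = centroids[donor] + eps * rng.standard_normal(centroids.shape[1])
    return centroids, assign
```

With unbalanced depthwise weights, many blocks collapse onto few centroids, so this re-seeding can fire repeatedly without ever converging, which matches the behaviour reported here.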
You can try the following to circumvent the issue:
Please reach out with any other questions.
Thank you @pierrestock for your suggestion,
I see. I already tested the first three approaches above and they haven't worked so far; it seems we need to tune these factors a lot when moving to another network, notably one with depthwise convolutions. I will test several more cases and hope to overcome it; otherwise, the only way is to skip those layers.
By the way, regarding the Mask R-CNN experiment, the paper states that you use a constant number of centroids k = 256 for all layers; could you explain this setting a bit more? If the code will be released in the future, that would be great.
Best Regards,
I also plan to test those cases in the near future; maybe another way of re-assigning empty clusters would work.
For Mask R-CNN, I am still cleaning the code. The number of centroids per layer is stored in the compressed model. To be more precise, all the layers are quantized with k = 256 centroids, except for the `bbox_pred`, `cls_logits` and `mask_fcn_logits` layers that are not quantized (see `quantize.py` in the `mask_r_cnn` branch). The block size is 4 for pointwise convolutions and 9 for standard convolutions. Some large FC layers are quantized with a larger block size (16 if I recall correctly).
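The block sizes above correspond to how the weight tensors are cut into subvectors before k-means: a block size of 9 means one block per 3x3 spatial kernel, and a block size of 4 groups pointwise (1x1) weights four at a time. A minimal numpy sketch of this reshaping, assuming a simple row-major flattening (the actual block layout in the released code may differ):

```python
import numpy as np

def to_blocks(weight, block_size):
    """Reshape a weight tensor into (n_blocks, block_size) subvectors,
    which are then quantized jointly with k-means (product quantization)."""
    w = weight.reshape(-1)
    assert w.size % block_size == 0, "weight size must divide evenly into blocks"
    return w.reshape(-1, block_size)

# e.g. a standard 3x3 conv layer: block size 9 gives one block per spatial kernel
conv_w = np.zeros((64, 32, 3, 3))   # (out_channels, in_channels, kh, kw)
blocks = to_blocks(conv_w, 9)       # shape (64 * 32, 9)
```

The storage cost is then one centroid index per block, plus the shared codebook, which is why larger block sizes (e.g. 16 for big FC layers) compress more aggressively.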
Thank you for the quick reply. There are not many ways to re-assign empty clusters; re-seeding them next to the most populated cluster with an epsilon perturbation, as you already do, is probably the best option, but let me see if I can find something else. About Mask R-CNN, I saw the code and it's clear; I just wonder why you chose 256 centroids rather than 128, 512, or another value. Is it because it gives the best mAP after quantization?
Using k = 512 centroids would indeed give a better mAP. However, with k = 256 centroids we can store each cluster assignment in one byte (values 0 to 255, see PyTorch's byte format), so the scheme is more hardware-friendly if we were to design fast inference functions.
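The storage argument is easy to verify: with k = 256, every index fits exactly in an unsigned 8-bit integer (`torch.uint8` in PyTorch), whereas k = 512 would force a wider integer type for the indices. A small numpy illustration (the array size of 1000 blocks is arbitrary):

```python
import numpy as np

k = 256  # number of centroids per layer
# Hypothetical cluster assignments for 1000 weight blocks
assignments = np.random.default_rng(0).integers(0, k, size=1000)

# With k <= 256, each assignment fits in one unsigned byte
compact = assignments.astype(np.uint8)
# compact.nbytes == 1000, versus 8000 bytes for the default int64 indices;
# with k = 512 you would need at least uint16, doubling the index storage
```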
Hi there, I've tried to reproduce this project on EfficientNet, but it seems that the advanced method of re-initializing centroids to avoid empty clusters does not help in this case (depthwise convolutions), even when I set the number of iterations to 1 million and decrease the number of centroids.
Has your team met this issue before, and do you have any ideas for it?
Thank you,