Closed: tuanho27 closed this issue 5 years ago
Hi tuanho27,
Yes, this issue is due to (1) the fact that we sample activations, so one EM iteration can literally kick a centroid out of reach, and (2) the fact that the weights in depthwise convolutions are sometimes not well balanced.
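To make the failure mode concrete: after the E-step of k-means, a centroid can end up with zero assigned blocks and is then "dead". One common remedy (and roughly what is discussed later in this thread) is to re-seed the empty centroid next to the most populated one, perturbed by a small epsilon, so the donor cluster is split on the next iteration. The sketch below is hypothetical numpy code illustrating that idea, not the repository's actual implementation (which operates on PyTorch tensors); the function name and epsilon value are my own.

```python
import numpy as np

def reassign_empty_clusters(points, centroids, eps=1e-4, seed=0):
    """One E-step of k-means; empty centroids are re-seeded next to
    the most populated centroid, perturbed by a small epsilon."""
    rng = np.random.default_rng(seed)
    # E-step: assign each point (weight block) to its nearest centroid
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    counts = np.bincount(assign, minlength=len(centroids))
    # Re-seed every empty centroid near the most populated one
    for k in np.flatnonzero(counts == 0):
        donor = counts.argmax()
        centroids[k] = centroids[donor] + eps * rng.standard_normal(centroids.shape[1])
    return centroids, assign
```

With unbalanced depthwise weights, many blocks collapse onto few centroids, so this re-seeding can fire repeatedly without ever converging, which matches the behaviour reported here.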
You can try the following to circumvent the issue:
Please reach out with any other questions.
Thank you @pierrestock for your suggestion,
I see. I already tested the first three approaches above and they haven't worked so far; it seems we need to tune these factors a lot when moving to another network, notably one with depthwise convolutions. I will test several more cases and hope to overcome it; otherwise, the only way is to skip those layers.
By the way, regarding the Mask R-CNN experiment, the paper states that you use a constant number of centroids k = 256 for all layers; could you explain this setting a bit more? If the code will be released in the future, that would be great.
Best Regards,
I also plan to test those cases in the near future; maybe another way of re-assigning empty clusters would work.
For Mask R-CNN, I am still cleaning the code. The number of centroids per layer is stored in the compressed model. To be more precise, all the layers are quantized with k = 256 centroids, except for the `bbox_pred`, `cls_logits` and `mask_fcn_logits` layers that are not quantized (see `quantize.py` in the `mask_r_cnn` branch). The block size is 4 for pointwise convolutions and 9 for standard convolutions. Some large FC layers are quantized with a larger block size (16 if I recall correctly).
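The block sizes above correspond to how the weight tensors are cut into subvectors before k-means: a block size of 9 means one block per 3x3 spatial kernel, and a block size of 4 groups pointwise (1x1) weights four at a time. A minimal numpy sketch of this reshaping, assuming a simple row-major flattening (the actual block layout in the released code may differ):

```python
import numpy as np

def to_blocks(weight, block_size):
    """Reshape a weight tensor into (n_blocks, block_size) subvectors,
    which are then quantized jointly with k-means (product quantization)."""
    w = weight.reshape(-1)
    assert w.size % block_size == 0, "weight size must divide evenly into blocks"
    return w.reshape(-1, block_size)

# e.g. a standard 3x3 conv layer: block size 9 gives one block per spatial kernel
conv_w = np.zeros((64, 32, 3, 3))   # (out_channels, in_channels, kh, kw)
blocks = to_blocks(conv_w, 9)       # shape (64 * 32, 9)
```

The storage cost is then one centroid index per block, plus the shared codebook, which is why larger block sizes (e.g. 16 for big FC layers) compress more aggressively.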
Thank you for the quick reply. There are not many ways to re-assign empty clusters; re-seeding them next to the most populated cluster with an epsilon perturbation, as you already do, is probably the best option, but let me see if I can find something else. About Mask R-CNN, I saw the code and it's clear; I just wonder why you chose 256 centroids rather than 128, 512, or another value. Is it because it gives the best mAP after quantization?
Using k = 512 centroids would indeed give a better mAP. However, with k = 256 centroids we can store each cluster assignment in one byte (values 0 to 255, see PyTorch's byte format), so the scheme is more hardware-friendly if we were to design fast inference functions.
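The storage argument is easy to verify: with k = 256, every index fits exactly in an unsigned 8-bit integer (`torch.uint8` in PyTorch), whereas k = 512 would force a wider integer type for the indices. A small numpy illustration (the array size of 1000 blocks is arbitrary):

```python
import numpy as np

k = 256  # number of centroids per layer
# Hypothetical cluster assignments for 1000 weight blocks
assignments = np.random.default_rng(0).integers(0, k, size=1000)

# With k <= 256, each assignment fits in one unsigned byte
compact = assignments.astype(np.uint8)
# compact.nbytes == 1000, versus 8000 bytes for the default int64 indices;
# with k = 512 you would need at least uint16, doubling the index storage
```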
Hi there, I've tried to reproduce this project on EfficientNet, but it seems that the advanced method of re-initializing centroids to avoid empty clusters does not help in this case (depthwise convolutions), even when I set the number of iterations to 1 million and decrease the number of centroids.
Has your team met this issue before, and do you have any ideas for it?
Thank you,