Thanks so much for the well-written paper.
I want to share these ideas with you.
(1) I see that you compress all the layers by the same factor. Based on other compression techniques I have used, such as pruning (thresholding), Bayesian pruning, and matrix factorization, compressing the last layers much more than the first layers might give much better accuracy.
(2) Research question:
We currently design a very complex architecture and then design even more complex compression techniques to reduce its size. Is it possible to interpret these compression techniques in depth and extract patterns that could help us design a good architecture from the beginning?
Thanks to comment
DeeperDeeper
Thanks so much for your feedback! Regarding your ideas:
Adapting the compression ratio (i.e. the number of bits per weight) per layer is a natural and interesting follow-up. It can be done either by hand (as you mention, compressing the last layers more) or using an RL framework.
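As a rough illustration of the hand-set variant, here is a minimal sketch that assigns fewer bits per weight to later layers and compares the overall compression ratio with a uniform setting. The layer sizes, the bit budgets, the linear schedule, and the function names are all hypothetical choices for illustration, not values or methods from the paper. The sketch also ignores codebook storage and says nothing about the resulting accuracy.

```python
def linear_bit_schedule(num_layers, first_bits, last_bits):
    """Bits per weight for each layer, interpolated linearly from the
    first layer to the last (a hypothetical hand-set schedule)."""
    if num_layers == 1:
        return [float(first_bits)]
    step = (last_bits - first_bits) / (num_layers - 1)
    return [first_bits + i * step for i in range(num_layers)]


def compression_ratio(layer_sizes, bits_per_weight, baseline_bits=32):
    """Size of the uncompressed weights (baseline_bits each) divided by
    the size of the compressed weights. Codebook overhead is ignored."""
    original = sum(layer_sizes) * baseline_bits
    compressed = sum(n * b for n, b in zip(layer_sizes, bits_per_weight))
    return original / compressed


# Hypothetical network whose later layers hold most of the weights.
layer_sizes = [1_000, 10_000, 100_000, 1_000_000]

uniform = [1.0] * len(layer_sizes)                          # same factor everywhere
adaptive = linear_bit_schedule(len(layer_sizes), 2.0, 0.5)  # compress last layers more

print("uniform ratio: ", compression_ratio(layer_sizes, uniform))
print("adaptive ratio:", compression_ratio(layer_sizes, adaptive))
```

In this toy setting the adaptive schedule spends more bits on the small early layers and still reaches a higher overall ratio, because the last layers dominate the parameter count. Whether accuracy holds up under such a schedule has to be checked empirically.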
The research question is broader, and obviously I don't have the answer to this one! From my limited understanding, it is still helpful to train complex/deep/large architectures, as they behave more gently towards SGD (SGD has "more chance" of reaching a good minimum when the network is large), and then compress them. Of course, with other optimization techniques one could start from much simpler or already quantized networks.