Open GuideWsp opened 5 years ago
Or is it maybe just a quantizer method? Judging from the code, I think of it as a good quantizer method.
In the released code I just used F.conv2d(input, self.weight ...)
so I could compare against the methods I had previously.
If you change that line to F.conv2d(input, self.intweight * self.alpha ...)
you should get the same result.
If you want to do the multiplications in int and only then move to FP32, you can do the conv using input and self.intweight and then multiply the result by self.alpha.
When I have time I will try to add a demo of that.
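In the meantime, the equivalence can be seen with a tiny pure-Python sketch (all numbers made up; alpha is treated as a single scalar here, whereas the full method uses one scale per block): a scalar scale factors out of the convolution, so convolving with `intweight * alpha` and scaling the output of an integer convolution by `alpha` give the same result.

```python
# Sketch with made-up numbers: a scalar alpha factors out of the convolution,
# so conv(x, intweight * alpha) equals alpha * conv(x, intweight).
def conv1d(x, w):
    """Valid-mode 1-D cross-correlation (same convention as F.conv2d)."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0]   # toy activation
intweight = [2, -1, 3]     # integer VQK values (made up)
alpha = 0.125              # FP32 scale (made up; a power of two, so exact)

out_scaled_weight = conv1d(x, [w * alpha for w in intweight])
out_scaled_after = [alpha * y for y in conv1d(x, intweight)]
assert out_scaled_weight == out_scaled_after  # identical results
```

The second path (integer conv, then one FP32 multiply per output) is the one that maps onto fixed-point hardware.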
I also have several doubts: 1) Are all layers (including the first and last layers) quantized in your experiments? 2) Are the activations also quantized in your experiments? 3) Why is there no experiment comparing the performance of your method against model-quantization methods? There are numerous papers on quantization that can also reduce memory a lot, and your method also uses quantization. 4) Could you please provide the training code to reproduce the performance results reported in your paper? @MarceloGennari
Hi Brisker,
Thanks for your interest in the method.
Regarding 1: I have quantized all conv layers so far (not the FC layer at the end, though I intend to do that soon).
Regarding 4: I will release the code used to run the experiments mentioned in the paper soon (hopefully by tomorrow). Regarding 2: as currently reported in the paper, the activations are handled in PyTorch using FP32. Since the idea was to run DSConv on an FPGA, the plan was to use the mantissa of the FP32 activation as a fixed-point representation to be multiplied by the quantized convolution kernels, so that the multiplications themselves are fixed-point. This is not yet explained in detail in the paper, and I understand it has caused confusion, but I hope to update it soon as well. I will also try to release the code that updates the VQK, since I heard from my colleagues that there is a straightforward way of doing that, which touches a little on the training part.
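The mantissa-as-fixed-point idea can be sketched in plain Python. This is an illustration only: the 8-bit width and the helper name `to_fixed_point` are my assumptions, not values from the paper or the FPGA implementation.

```python
import math

# Illustration only of the mantissa idea: split an FP32 value into mantissa
# and exponent, and reuse the mantissa as a fixed-point integer so the
# multiply with the integer kernel stays fixed-point.
# The 8-bit width and the helper name are assumptions, not the paper's values.
def to_fixed_point(value, bits=8):
    m, e = math.frexp(value)     # value == m * 2**e, with 0.5 <= |m| < 1
    q = round(m * (1 << bits))   # mantissa scaled to an integer
    return q, e - bits           # value ≈ q * 2**(e - bits)

q, shift = to_fixed_point(3.14159)
approx = q * 2.0 ** shift
assert abs(approx - 3.14159) < 2.0 ** shift  # off by less than one LSB
```

The integer `q` is what would be multiplied by the quantized kernel values on the FPGA; the exponent part only shifts the result.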
Yes, some of my colleagues raised the issue that I haven't compared much against other recent quantization methods. Do you have any suggestions about which methods I should compare my model against?
@MarceloGennari you can compare with:
[1] Zhang D, Yang J, Ye D, et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks. Proceedings of the European Conference on Computer Vision (ECCV), 2018: 365-382.
[2] Choi J, Wang Z, Venkataramani S, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks. arXiv preprint arXiv:1805.06085, 2018.
Besides, if you compare with other methods, I am not sure whether the comparison would be fair: quantizing a conv layer of shape (k, k, cin, cout) basically introduces cout float parameters (one scale per output channel), but your method seems to introduce more than cout parameters.
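This concern can be made concrete with some back-of-envelope arithmetic. The layer shape and the block size of 32 below are made-up illustration values, and the per-block scale structure is my reading of why there are more than cout parameters:

```python
import math

# Back-of-envelope count of extra FP32 parameters (hypothetical sizes).
# Per-output-channel quantization stores one scale per output channel,
# while a block-wise scheme stores one scale per block of the input-channel
# dimension. The block size of 32 is an assumption for illustration.
k, cin, cout, block = 3, 256, 256, 32

int_weights = k * k * cin * cout                  # quantized weight count
per_channel_scales = cout                         # one scale per output channel
per_block_scales = cout * math.ceil(cin / block)  # one scale per cin-block
assert per_block_scales > per_channel_scales
```

Either way, the FP32 scales are a small fraction of the integer weight count; the question is how the two overheads compare.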
@MarceloGennari what about the transform_activation function here: https://github.com/ActiveVisionLab/DSConv/blob/master/complete_module/resnet.py#L80 ?

Hi @brisker
After your comment asking me to compare the method with the papers you pointed out, I am taking some time to review the training procedure for both activations and weights. As you can see in the code, I was experimenting with quantizing the activations (that is the difference between the complete_module directory and the modules directory), as you pointed out. It was a bit rushed, which is why it is a bit messy. Some other colleagues pointed out that I should use other training approaches that should achieve better accuracy and be cleaner, and that I should review the paper and include some things in it, including the method used in the code for quantizing activations (the results in the paper are FP32).
Thanks for your input here! I am now taking some time to update everything (hopefully the paper as well, with all the new training and testing) and will update the code here too.
Hi Marcelo! I read your paper and it is very interesting! But I still have no idea how to test it, so could you provide a training and test demo? Thanks a lot!
I read your paper yesterday. In the paper you say it is a variation of the conv layer and can replace a standard conv, and as Fig. 3 shows, I think it can. But in the code on GitHub I can't find the proper forward demo; it only has the weight-conversion demo. So far I have no idea how to test it, so will you provide a full test demo that shows readers how to use intweight, alpha, KDSb, CDSw, CDSb, etc.?
Hoping for your reply.