ActiveVisionLab / DSConv

Some doubts about DSConv. #1

Open GuideWsp opened 5 years ago

GuideWsp commented 5 years ago

I read your paper yesterday. In the paper you say that DSConv is a variation of the standard Conv layer and can replace it, and Fig. 3 suggests that it does. But in the code on GitHub I can't find a proper forward demo; there is only a weight-conversion demo. So far I have no idea how to test it. Could you provide a full test demo so readers know how to use intweight, alpha, KDSb, CDSw, CDSb, etc.? Hoping for your reply.

GuideWsp commented 5 years ago

Or maybe it is just a quantization method? Judging from the code alone, it looks like a good quantization method.

MarceloGennari commented 5 years ago

In the released code I just used F.conv2d(input, self.weight, ...) so I could compare against the methods I had tried previously.

If you change that line to F.conv2d(input, self.intweight * self.alpha, ...) you should get the same result.

If you want to do the multiplications in int and only then move to FP32, you can run the convolution with self.intweight and multiply the result by self.alpha.

When I have time I will try to add a demo of that.
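
For illustration, here is a minimal sketch (not the repository's code) of the equivalence described above, assuming a DSConv-style layer that exposes `weight`, `intweight` and `alpha` such that `weight == intweight * alpha` after the conversion step:

```python
import torch
import torch.nn.functional as F

def check_dsconv_equivalence(layer, x, stride=1, padding=1):
    """Sketch only: compare the FP32 forward path against the
    intweight * alpha path for a DSConv-style layer.

    Assumes `layer.weight`, `layer.intweight` and `layer.alpha` broadcast so
    that layer.weight == layer.intweight * layer.alpha holds after conversion.
    """
    # 1) What the released code does: plain FP32 convolution.
    out_fp32 = F.conv2d(x, layer.weight, stride=stride, padding=padding)
    # 2) Rebuild the kernel from the quantized VQK and its scales.
    out_vqk = F.conv2d(x, layer.intweight * layer.alpha,
                       stride=stride, padding=padding)
    # The two outputs should agree up to floating-point round-off.
    return torch.allclose(out_fp32, out_vqk, atol=1e-5)
```

Doing the convolution in integer arithmetic first and applying `alpha` afterwards, as suggested above, follows the same idea; how finely the integer convolution has to be split before the scales are applied depends on the granularity at which `alpha` is defined.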

brisker commented 5 years ago

I also have several doubts:

1. Are all layers (including the first and last layers) quantized in your experiments?
2. Are the activations also quantized in your experiments?
3. Why is there no experiment comparing your method with existing model-quantization methods? There are many quantization papers that also reduce memory substantially, and your method uses quantization as well.
4. Could you please provide the training code to reproduce the results reported in your paper?

@MarceloGennari

MarceloGennari commented 5 years ago

Hi Brisker,

Thanks for your interest in the method.

  1. So far I have quantized all conv layers (not the FC layer at the end, though I intend to do that soon).

  2. and 4. I will soon release (hopefully by tomorrow) the code used to run the experiments mentioned in the paper. As currently reported in the paper, the activations are kept in FP32 in PyTorch. Since the idea is for DSConv to run on an FPGA, the plan is to use the mantissa of the FP32 activation as a fixed-point representation that is multiplied by the quantized convolution kernels, so the multiplications themselves are fixed point (see the sketch after this list). This is not yet explained in detail in the paper and I understand it has caused confusion; I hope to update it soon. I will also try to release the code that updates the VQK, since my colleagues tell me there is a straightforward way of doing that, which touches on the training part.

  3. Yes, some of my colleagues raised the issue that I have not compared much against other recent quantization models. Do you have any suggestions for which methods I should compare my model against?
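
(For context, a toy sketch of the activation handling described in item 2, under the assumption that "using the mantissa as fixed point" amounts to a shared-exponent / block-floating-point format; this is an illustration only, not the repository's implementation:)

```python
import torch

def shared_exponent_quantize(x, mantissa_bits=8):
    """Toy shared-exponent ("block floating point") quantization of a tensor.

    All values keep only `mantissa_bits` bits of mantissa relative to one
    shared exponent taken from the largest magnitude. In fixed-point hardware
    the integer mantissas would be multiplied with the quantized kernels;
    here we just return the dequantized FP32 view for inspection.
    """
    max_val = x.abs().max()
    if max_val == 0:
        return x.clone()
    exp = torch.ceil(torch.log2(max_val))           # shared exponent
    scale = 2.0 ** (exp - (mantissa_bits - 1))      # value of one mantissa LSB
    qmin, qmax = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissa = torch.clamp(torch.round(x / scale), qmin, qmax)
    return mantissa * scale

# Example: quantize a batch of activations and inspect the error introduced.
acts = torch.randn(1, 64, 56, 56)
err = (acts - shared_exponent_quantize(acts)).abs().max()
```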

brisker commented 5 years ago

@MarceloGennari you can compare with:

[1] Zhang, D., Yang, J., Ye, D., et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks. In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 365-382.
[2] Choi, J., Wang, Z., Venkataramani, S., et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks. arXiv preprint arXiv:1805.06085, 2018.

Besides, if you compare with other methods, I am not sure the comparison is fair: quantizing a conv layer of shape (k, k, c_in, c_out) typically introduces only c_out float parameters (one scale per output channel), but your method seems to introduce more than c_out parameters.
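
(For a sense of scale, with hypothetical numbers not taken from the paper: a 3x3 layer with c_in = c_out = 256 gains 256 extra FP32 scales under per-output-channel quantization, whereas a scheme with one FP32 scale per depth-wise block of size B = 32 would gain c_out * c_in / B = 256 * 256 / 32 = 2048 scales, or more if the blocks are also split spatially.)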

brisker commented 5 years ago

@MarceloGennari

  1. When will you release more experiment code (the training code)?
  2. To put it plainly, can I say that the activations in your paper are NOT quantized? If they are not quantized, why is there a transform_activation function here: https://github.com/ActiveVisionLab/DSConv/blob/master/complete_module/resnet.py#L80 ?

MarceloGennari commented 5 years ago

Hi @brisker

After your comment asking me to compare the method with the papers you pointed out, I am taking some time to review the training procedure for both activations and weights. As you noticed, I was experimenting with quantizing the activations (that is the difference between the complete_module directory and the modules directory). It was a bit rushed, which is why it is a bit messy. Some colleagues also pointed out that I should use other training approaches that should achieve better accuracy and be cleaner, and that I should add some things to the paper, including the method used in the code for quantizing activations (the results in the paper use FP32 activations).

Thanks for your input here! I am now taking some time to update everything (hopefully the paper as well, with all the new training and testing) and will update the code here too.

joey001-art commented 1 year ago

Hi Marcelo! I read your paper and it is very interesting! But I still have no idea how to test it, so could you provide a training and test demo? Thanks a lot!