SeoLabCornell / torch2chip

Torch2Chip (MLSys, 2024)

Clarifications on QAT and _QBase code #1

Open vignesh99 opened 3 months ago

vignesh99 commented 3 months ago

Hello, thank you for this library. I have a couple of questions about the uploaded code:

  1. Your paper mentions that both QAT and PTQ are possible and shows results for both. However, the code does not provide documentation or trainers (like PTQViT) for QAT. In the vit.py file you take a quantized model and perform training (which is the QAT procedure), but your comments and print statements refer to it as PTQ. Please clarify how to do QAT for a ViT using your code. https://github.com/SeoLabCornell/torch2chip/blob/61b3ef32fdd68f3d576c00bca1a4fc977ea1a18d/imagenet/vit.py#L127-L133
  2. In torch2chip/src/module/base.py, the _QBase class has the method q() shown below. This function is an identity and does not perform any quantization. However, q() is crucial, since it is called in trainFunc, which is in turn called in the forward method of _QBase, and _QBase is the foundation for all the other attention and MLP quantizations. So it appears quantization is not happening as expected. Please let me know if I'm wrong. https://github.com/SeoLabCornell/torch2chip/blob/61b3ef32fdd68f3d576c00bca1a4fc977ea1a18d/src/module/base.py#L85-L89

Thanks!

mengjian0502 commented 3 months ago

Hi there,

Thank you for your question. _QBase serves as the base class for the subsequent custom quantizers. If you check src/quantization/, you will find that all the quantizers (implementing different methods) are built on top of _QBase.
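
To make the relationship concrete, here is a minimal sketch of that pattern; the class names and internals below are illustrative, not copied from the torch2chip source. The base class ships an identity q(), and each concrete quantizer overrides it.

```python
import torch
import torch.nn as nn

# Minimal sketch of the base-quantizer pattern described above; _QBaseSketch and
# MinMaxQuantizerSketch are illustrative stand-ins, not torch2chip classes.
class _QBaseSketch(nn.Module):
    def __init__(self, nbit: int = 8):
        super().__init__()
        self.nbit = nbit

    def q(self, x: torch.Tensor) -> torch.Tensor:
        # identity placeholder; concrete quantizers override this
        return x

    def trainFunc(self, x: torch.Tensor) -> torch.Tensor:
        return self.q(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.trainFunc(x)

class MinMaxQuantizerSketch(_QBaseSketch):
    def q(self, x: torch.Tensor) -> torch.Tensor:
        # hypothetical symmetric uniform quantizer: scale taken from the observed max
        qmax = 2 ** (self.nbit - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
```

Once a subclass like this is assigned to a layer, its q() replaces the identity version, which is the behavior the question above is pointing at.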

When you apply PTQ, the quantizers are assigned directly for either calibration or training.

The overall flow will be: Pretrained model -> Convert the vanilla layers to placeholder layers (Vanilla2Compress (link)) -> Assign quantizer -> Quantize -> Save.
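
As a rough, generic illustration of that flow (the layer and quantizer classes below are made up for the example and are not the actual Vanilla2Compress / trainer API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic "convert -> assign quantizer -> quantize -> save" sketch; QPlaceholderLinear
# and FakeQuant8 are illustrative stand-ins, not torch2chip classes.
class FakeQuant8(nn.Module):
    def forward(self, w):
        scale = w.abs().max().clamp(min=1e-8) / 127
        return torch.round(w / scale).clamp(-128, 127) * scale

class QPlaceholderLinear(nn.Linear):
    """Placeholder layer: behaves like nn.Linear until a weight quantizer is attached."""
    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias)
        self.wq = nn.Identity()          # quantizer slot, assigned later

    def forward(self, x):
        return F.linear(x, self.wq(self.weight), self.bias)

def convert(model: nn.Module) -> nn.Module:
    """Swap every nn.Linear for the placeholder layer (the 'convert' step)."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            q = QPlaceholderLinear(child.in_features, child.out_features, child.bias is not None)
            q.load_state_dict(child.state_dict(), strict=False)
            setattr(model, name, q)
        else:
            convert(child)
    return model

pretrained = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # stand-in model
model = convert(pretrained)                  # convert vanilla layers -> placeholder layers
for m in model.modules():
    if isinstance(m, QPlaceholderLinear):
        m.wq = FakeQuant8()                  # assign quantizer
torch.save(model.state_dict(), "quantized_sketch.pth")  # save; quantization runs in forward
```

This only mirrors the shape of the flow; in torch2chip the layer conversion itself is handled by Vanilla2Compress.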

Regarding QAT, we are in the process of adding QAT support and checkpoints across different precisions. For your own customization, you can also follow the workflow above and create your quantizer on top of it for QAT (we will have a tutorial for that shortly).

Sorry for the confusion here; the complete documentation, examples, and tutorials will be ready by mid-June.

vignesh99 commented 3 months ago

Thanks for the swift clarifications. I see the custom quantizer's q() function takes the place of _QBase's q() function.

Regarding QAT: isn't your code already doing QAT? You are training a quantized model (and I assume fake quantization is happening during training), whereas PTQ does not involve any training. So I'm still confused.

These are my steps for doing QAT with your code: I take the quantized model as shown below, then use your PTQViT or SmoothQuantPTQViT to train the model. This should give me a QAT model, right? If not, please let me know the steps. https://github.com/SeoLabCornell/torch2chip/blob/main/imagenet/vit.py#L113-L114
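
For reference, the generic QAT recipe I have in mind is roughly the toy sketch below (not your trainer, just fake quantization in the forward pass with a straight-through estimator so gradients reach the float weights):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy QAT sketch: fake-quantize the weights in the forward pass and use a
# straight-through estimator so gradients update the full-precision weights.
class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, nbit):
        qmax = 2 ** (nbit - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None            # straight-through: pass the gradient unchanged

class QATLinear(nn.Linear):
    def forward(self, x):
        return F.linear(x, FakeQuantSTE.apply(self.weight, 8), self.bias)

model = nn.Sequential(QATLinear(16, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(10):                      # toy training loop on random data
    x, y = torch.randn(32, 16), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```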

Thanks

Z-KN commented 3 months ago

> When you apply PTQ, the quantizers are assigned directly for either calibration or training.

Sorry, I'm a bit confused. Are you referring to static PTQ or dynamic PTQ? Static PTQ needs calibration (https://pytorch.org/docs/stable/quantization.html#:~:text=possible.%20It%20requires-,calibration,-with%20a%20representative), while dynamic PTQ does not. In any case, PTQ should not go through a training process anymore, right? Is there anything special about PTQ in your context?
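
For concreteness, this is what I mean by dynamic PTQ using PyTorch's stock API (nothing torch2chip-specific); there is no calibration data and no training pass, only a one-shot conversion:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Stock PyTorch dynamic PTQ: weights are quantized ahead of time, activations are
# quantized on the fly at inference; no calibration run and no training.
float_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
dq_model = quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)
print(dq_model)

# A static PTQ flow would instead attach observers (torch.ao.quantization.prepare),
# run a few calibration batches, and then call torch.ao.quantization.convert.
```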

mengjian0502 commented 3 months ago

Hi there,

In terms of PTQ, we offer both calibration options and learning options. For instance, some recent works (e.g., QDrop) introduce learnable scaling factors and optimize them on sampled calibration data. But the weights remain frozen in this process, and they still call themselves PTQ, which I think is only "partially true".

I had the same question as you, so I decided to label all quantization methods with "frozen weights" as PTQ, even though the quantization parameters (e.g., scaling factor and bias) are sometimes also learnable.

To make a fair comparison, for all the PTQ methods with learnable quantization parameters, I only train the quantization parameters for 1 epoch, which is still pretty fast. In some other repos, however, the total number of training steps can reach 20K just for quantization-parameter training.
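
Roughly, that setup looks like the sketch below (illustrative only, not the exact torch2chip implementation): the weights are frozen, only the scaling factor is a trainable parameter, and it is fit against the layer's full-precision output over a short pass of calibration data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of PTQ with a learnable scaling factor and frozen weights; the class name
# and the reconstruction loss are illustrative, not torch2chip's implementation.
def round_ste(x):
    return (x.round() - x).detach() + x      # straight-through rounding

class LearnedScaleLinear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, nbit=8):
        super().__init__(in_features, out_features, bias)
        self.qmax = 2 ** (nbit - 1) - 1
        self.scale = nn.Parameter(self.weight.detach().abs().max() / self.qmax)

    def forward(self, x):
        wq = round_ste(self.weight / self.scale).clamp(-self.qmax - 1, self.qmax) * self.scale
        return F.linear(x, wq, self.bias)

layer = LearnedScaleLinear(64, 32)
layer.weight.requires_grad_(False)               # weights stay frozen ("PTQ")
if layer.bias is not None:
    layer.bias.requires_grad_(False)
opt = torch.optim.Adam([layer.scale], lr=1e-3)   # only the scale is optimized
for _ in range(100):                             # short pass over sampled calibration data
    x = torch.randn(256, 64)                     # stand-in calibration batch
    target = F.linear(x, layer.weight, layer.bias)   # full-precision reference output
    loss = F.mse_loss(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```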

Z-KN commented 3 months ago

Thanks for your clarification. In this case, what if I only want to do dynamic PTQ? https://pytorch.org/docs/stable/quantization.html#:~:text=weights%20are%20quantized%20ahead%20of%20time%20but%20the%20activations%20are%20dynamically%20quantized%20during%20inference For quantizing the weights ahead of time, we need to find the zero points and scaling factors of the weights. Are you referring to this analysis step as "calibration"? It obviously needs no training or learning.
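
Concretely, I mean something like the following: just standard min/max analysis of the weight tensor (asymmetric uint8 shown as an example), with no training or learning involved.

```python
import torch

# Derive scale and zero point for a weight tensor from its min/max; this is the
# "analysis" step in question, and it requires no training and no calibration data.
def weight_qparams(w: torch.Tensor, nbit: int = 8):
    qmin, qmax = 0, 2 ** nbit - 1
    wmin, wmax = w.min().item(), w.max().item()
    scale = (wmax - wmin) / (qmax - qmin) if wmax > wmin else 1.0
    zero_point = int(round(qmin - wmin / scale))
    return scale, zero_point

w = torch.randn(64, 128)                          # stand-in weight tensor
scale, zp = weight_qparams(w)
w_int = torch.clamp(torch.round(w / scale) + zp, 0, 255).to(torch.uint8)
w_deq = (w_int.float() - zp) * scale              # what inference would dequantize back to
print(scale, zp, (w - w_deq).abs().max().item())
```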

vignesh99 commented 3 months ago

@mengjian0502 Could you please also address the clarification I asked for in my comment above?