IST-DASLab / QUIK

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference
Apache License 2.0

[Question] How to get act_scales for a custom llama-like model? How many calibration data items do we need? Are act_zeros needed as well? #10

Open hanrui1sensetime opened 9 months ago

hanrui1sensetime commented 9 months ago

We want to try QUIK on our self-implemented llama-like model weights. We could not find a script for generating the act_scales .pt files. Should we first run calibration data through the model to quantize the activations and save the result? How many calibration items should we use, and do we need act_zeros as well?

I look forward to your reply. Thanks.

hanrui1sensetime commented 9 months ago

I have found the script in SmoothQuant repo. So I will close the issue.

VityaVitalich commented 8 months ago

> I have found the script in SmoothQuant repo. So I will close the issue.

Could you please share how the problem was solved?

VityaVitalich commented 8 months ago

Found it quite easily. For anyone else interested, the code for generating the scales is in the repository linked below. All the necessary information is in its README.

https://github.com/mit-han-lab/smoothquant/tree/main
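For readers who want a feel for what such a script does before opening the repo, here is a minimal sketch of the general idea: run calibration batches through the model and record, for each linear layer, the per-input-channel maximum absolute activation. This is an illustration under assumptions, not the SmoothQuant or QUIK code. The function name `collect_act_scales`, the toy model, and the random calibration batches are all hypothetical stand-ins; with a real llama-like model the batches would be tokenized text, and the exact dictionary keys and file layout QUIK expects should be checked against the repos.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def collect_act_scales(model, calibration_batches):
    """Return {layer_name: per-input-channel max |activation|} for each nn.Linear.

    Hypothetical helper for illustration; not part of QUIK or SmoothQuant.
    """
    act_scales = {}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0]
            # Flatten leading dims: each row is one token, each column one channel.
            channel_max = x.reshape(-1, x.shape[-1]).abs().amax(dim=0).float().cpu()
            if name in act_scales:
                # Keep a running maximum across all calibration batches.
                act_scales[name] = torch.maximum(act_scales[name], channel_max)
            else:
                act_scales[name] = channel_max
        return hook

    # Observe the input of every linear layer via forward hooks.
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    for batch in calibration_batches:
        model(batch)

    # Remove the hooks so the model is left unchanged.
    for h in hooks:
        h.remove()
    return act_scales


# Toy stand-in for a real model and real calibration data (assumptions).
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
calibration_batches = [torch.randn(2, 5, 8) for _ in range(4)]

scales = collect_act_scales(toy_model, calibration_batches)
# The result could then be written out with, e.g., torch.save(scales, "act_scales.pt")
```

Because the statistic is a running maximum, adding more calibration samples can only raise a scale, never lower it. How many samples are enough depends on the model and data, so the defaults in the linked README are a reasonable starting point.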