Closed: AboveParadise closed this issue 4 months ago.
Add `--real_quant` to your command to achieve real quantization, and add `--save_dir SAVE_PATH` to save the quantized models.
You can also see https://github.com/OpenGVLab/OmniQuant/blob/main/runing_falcon180b_on_single_a100_80g.ipynb or https://github.com/OpenGVLab/OmniQuant/blob/main/runing_quantized_mixtral_7bx8.ipynb for more details about running the real-quantized models.
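For example, a real-quantization run might look like the sketch below. Only `--real_quant` and `--save_dir` come from the reply above; the script name `main.py`, the model path, and the bit-width flags (`--wbits`, `--abits`, `--lwc`) are assumptions based on typical OmniQuant usage and may differ in your checkout.

```shell
# Hypothetical invocation (script name and quantization flags are assumptions):
# --real_quant packs weights into low-bit storage instead of fake quantization,
# --save_dir stores the resulting quantized checkpoint for later loading.
python main.py \
  --model /PATH/TO/Llama-2-7b \
  --wbits 4 --abits 16 --lwc \
  --real_quant \
  --save_dir ./quantized-llama2-7b-w4
```

With `--real_quant` the saved model should occupy roughly a quarter of the FP16 size at 4-bit weights, which is what makes the memory savings discussed below possible.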
Thanks for your reply! So your work can actually reduce the GPU memory usage, right?
Yes, with `--real_quant`, OmniQuant can actually reduce the memory footprint.
I have already installed AutoGPTQ, what is the next step?