VinAIResearch / PhoGPT

PhoGPT: Generative Pre-training for Vietnamese (2023)
Apache License 2.0

Will a quantized version be supported in the future? #11

Closed phatjkk closed 7 months ago

phatjkk commented 9 months ago

I hope that PhoGPT will get an AWQ or GPTQ version for running on low-VRAM GPUs. Do you have any plans for quantization? It would make this LLM more accessible to students and individual researchers with limited computing resources. Thank you for your hard work!

datquocnguyen commented 8 months ago

Is there any problem with employing https://huggingface.co/docs/transformers/quantization#quantization and/or https://github.com/casper-hansen/AutoAWQ on limited computing resources?
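
For context, the Transformers route mentioned above supports on-the-fly 4-bit loading via bitsandbytes. Below is a minimal sketch, assuming the vinai/PhoGPT-7B5-Instruct checkpoint on the Hub, a CUDA GPU, and the bitsandbytes and accelerate packages installed; it is not an official recipe from this repository:

```python
# Minimal sketch: load PhoGPT-7B5-Instruct with 4-bit weight quantization via bitsandbytes.
# Assumes a CUDA GPU and that `bitsandbytes` and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "vinai/PhoGPT-7B5-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",       # let accelerate place layers on the available GPU(s)
    trust_remote_code=True,  # PhoGPT ships a custom MPT-style model class
)

# Prompt format follows the "### Câu hỏi: ...\n### Trả lời:" template from the PhoGPT model card.
prompt = "### Câu hỏi: Viết một đoạn văn ngắn về Hà Nội.\n### Trả lời:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```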

phatjkk commented 8 months ago

Yes, I used AutoAWQ version 0.1.7 (which supports Bloom) to quantize PhoGPT-7B5-Instruct to 4-bit GEMM with group size 128 and hit this error:

(screenshot of the error; not reproduced here)

AutoAWQ works fine with vietcuna-7b-v3 (which is also Bloom-based).

Here is the code I used: https://github.com/phatjkk/test_llm/blob/main/FINALL_GENERATOR_Quantization.ipynb
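
For reference, the quantization step described above boils down to roughly the following sketch against the AutoAWQ 0.1.x-era API; the model id, output path, and calibration defaults are assumptions, not taken from the linked notebook:

```python
# Sketch: 4-bit GEMM quantization with group size 128 using AutoAWQ (0.1.x-era API).
# Model id and output path are placeholders; AutoAWQ uses its default calibration set here.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "vinai/PhoGPT-7B5-Instruct"
quant_path = "PhoGPT-7B5-Instruct-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model and its tokenizer (PhoGPT needs trust_remote_code for its custom model class).
model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and weight quantization.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized checkpoint next to its tokenizer files.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```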

I don't know what the problem is. I also tried some previous versions of transformers, but that didn't work either. Have you tried it yet? Thank you!

datquocnguyen commented 8 months ago

Not sure why Bloom is mentioned here, since PhoGPT is an MPT-type model rather than a Bloom-based one. This issue was fixed for MPT-type models (including PhoGPT) in https://github.com/casper-hansen/AutoAWQ/issues/155. I will spend time looking into it.
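
Assuming an AutoAWQ release that includes that MPT fix, loading the resulting quantized checkpoint for low-VRAM inference should look roughly like the sketch below; the quantized-model path is a placeholder, not a published artifact:

```python
# Sketch: run inference from an AWQ-quantized PhoGPT checkpoint (path is a placeholder).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "PhoGPT-7B5-Instruct-AWQ"

# fuse_layers is left off since fused-kernel support varies by architecture.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

prompt = "### Câu hỏi: PhoGPT là gì?\n### Trả lời:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```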