amd / RyzenAI-SW

MIT License

How to apply INT8 quantization on Llama2 model? #102

Closed AshimaBisla closed 4 months ago

AshimaBisla commented 4 months ago

Hello,

I have tried running the Llama2 model with 3-bit and 4-bit quantization. Is there a way to apply INT8 quantization and run the resulting Llama2 model on AMD?

Regards, Ashima

uday610 commented 4 months ago

We have not published an INT8 script for Llama2, but INT8 quantization can be applied the same way as in the OPT example, where we published a W8A8 script.