AshimaBisla closed this issue 4 months ago.
Hello,
I have run the Llama2 model with 3-bit and 4-bit quantization. Is there also a way to apply INT8 quantization and run an INT8-quantized Llama2 model on AMD?
Regards, Ashima
We have not published an INT8 script for Llama2, but INT8 quantization can be applied to it the same way as in the OPT example, for which we published a W8A8 (8-bit weights, 8-bit activations) script.
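The sketch below shows what W8A8 means for a single linear layer. It assumes symmetric INT8 quantization, with one scale per output row for the weights and one scale for the whole activation vector. The helper names `quantize_symmetric` and `w8a8_linear` are made up for this example. This is a generic illustration only, not the published OPT script and not an AMD API.

```python
# Minimal W8A8 sketch (hypothetical helpers, not the published OPT script):
# both weights and activations are quantized to signed 8-bit integers.

def quantize_symmetric(values, num_bits=8):
    """Quantize a flat list of floats to signed ints sharing one scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w8a8_linear(x, weight):
    """Approximate y = weight @ x using INT8 weights and INT8 activations.

    x: input vector (list of floats, length in_features)
    weight: list of rows (out_features x in_features), floats
    Weights get one scale per output row (per-channel); the activation
    vector gets a single scale (per-tensor). Python ints stand in for the
    INT32 accumulators a real kernel would use.
    """
    qx, sx = quantize_symmetric(x)
    out = []
    for row in weight:
        qw, sw = quantize_symmetric(row)
        acc = sum(a * b for a, b in zip(qx, qw))  # integer dot product
        out.append(acc * sx * sw)  # dequantize the result back to float
    return out


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    w = [[0.1, 0.2, -0.3], [1.0, -0.5, 0.25]]
    reference = [sum(a * b for a, b in zip(x, row)) for row in w]
    print("float:", reference)
    print("w8a8 :", w8a8_linear(x, w))
```

With these inputs, the W8A8 output lands close to the float reference of roughly [-0.75, 1.5]. A real pipeline would usually calibrate activation scales on sample data ahead of time instead of computing them on the fly as this sketch does.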