Closed: AshimaBisla closed this issue 3 months ago
Hello,
I am trying to run the Llama2 model on AMD hardware targeting the CPU, but it keeps running for more than an hour and gets stuck at the "warmup" stage. On AIE it finishes within a couple of minutes. Is there any way to speed up execution on the CPU, or has anyone run the model entirely on CPU who can say how many hours it actually took?
Thanks, Ashima
If you run Llama2 the same way you run it on the NPU, changing only the --target flag, it should work.
NPU run
python run_awq.py --task decode --target aie --w_bit 4
CPU run
python run_awq.py --task decode --target cpu --w_bit 4
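To get a concrete number for how long the CPU run takes, one option is to wrap the command with a timer (a minimal sketch, assuming a POSIX-style shell; on Windows, PowerShell's Measure-Command plays the same role):
time python run_awq.py --task decode --target cpu --w_bit 4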
Thanks for the reply @uday610. I am running the exact same command for the CPU, but I am not able to generate any output even after running it for a couple of hours. Can you tell me how long it usually takes to run the Llama2 model with 4-bit AWQ quantization on CPU?
This 1.1 flow is obsolete now that the 1.2 flow is available, hence closing this issue.