Closed: AshimaBisla closed this issue 3 months ago
Hello,
I am trying to run the Llama2 model on AMD hardware targeting the CPU, but it keeps running for more than an hour and gets stuck at the "warmup" stage. On AIE it finishes within a couple of minutes. Is there any way to speed up execution on the CPU, or has anyone run the model entirely on CPU who can say how many hours it actually took?
Thanks, Ashima
If you run Llama2 the same way you run it on the NPU, changing only the --target flag, it should work.
NPU run
python run_awq.py --task decode --target aie --w_bit 4
CPU run
python run_awq.py --task decode --target cpu --w_bit 4
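To get a concrete number for how long the CPU run takes, one option is to wrap the command with a timer (a minimal sketch, assuming a POSIX-style shell; on Windows, PowerShell's Measure-Command plays the same role):
time python run_awq.py --task decode --target cpu --w_bit 4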
Thanks for the reply @uday610. I am running the exact same command for the CPU, but I am not able to generate any output even after running it for a couple of hours. Can you tell me how long it usually takes to run the Llama2 model with 4-bit AWQ quantization on CPU?
This 1.1 flow is obsolete now that the 1.2 flow is available, hence closing this issue.