Closed AshimaBisla closed 4 months ago
The Llama2 result on PHX is between 3 and 4 tokens per sec. However, the result can vary depending on factors like CPU, BIOS, etc. Next-generation architectures such as Strix Point, coming soon, will offer higher performance.
Hello, while running Llama2 on AMD's IPU, I got the following results in the CSV file log_awq_7B_chat_profile.csv. In the file, tokens/sec is approximately 2.5 for all the inputs. I want to understand whether such results are optimal for the IPU, because I think they are quite a bit lower than expected. Can you help clarify this doubt? Also, does anyone have any benchmark results I could compare mine against?
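For reference, here is a minimal sketch of how a tokens/sec figure can be derived from a profiling CSV like the one above. The column names (`new_tokens`, `generate_time_s`) are assumptions for illustration, not the actual headers of log_awq_7B_chat_profile.csv:

```python
import csv
import io

def avg_tokens_per_sec(csv_text, tokens_col="new_tokens", secs_col="generate_time_s"):
    """Compute the average tokens/sec across rows of a profiling CSV.

    The column names are hypothetical -- substitute the actual headers
    from your log file.
    """
    rates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = float(row[tokens_col])
        secs = float(row[secs_col])
        if secs > 0:
            rates.append(tokens / secs)
    return sum(rates) / len(rates) if rates else 0.0

# Example with made-up numbers in the ~2.5 tokens/sec range reported above.
sample = "new_tokens,generate_time_s\n60,24.0\n50,20.0\n"
print(round(avg_tokens_per_sec(sample), 2))  # 2.5
```

Averaging per-row rates (rather than total tokens over total time) matches a per-input report like the one in the CSV; either convention is fine as long as it is applied consistently when comparing against a benchmark.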
Thanks, Ashima