Closed AshimaBisla closed 4 months ago
The Llama2 result on PHX is between 3 and 4 tokens per sec. However, the result can vary depending on factors like CPU, BIOS, etc. Next-generation architectures such as Strix Point, coming soon, will offer higher performance.
Hello, while running Llama2 on AMD's IPU, I got the following results in the CSV file log_awq_7B_chat_profile.csv. In the file, tokens/sec is approximately 2.5 for all the inputs. I want to understand whether such results are optimal for the IPU, because I think they are quite a bit lower than expected. Can you help clarify this doubt? Also, does anyone have any benchmark results I could compare mine against?
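For reference, here is a minimal sketch of how a tokens/sec figure can be derived from a profiling CSV like the one above. The column names (`new_tokens`, `generate_time_s`) are assumptions for illustration, not the actual headers of log_awq_7B_chat_profile.csv:

```python
import csv
import io

def avg_tokens_per_sec(csv_text, tokens_col="new_tokens", secs_col="generate_time_s"):
    """Compute the average tokens/sec across rows of a profiling CSV.

    The column names are hypothetical -- substitute the actual headers
    from your log file.
    """
    rates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = float(row[tokens_col])
        secs = float(row[secs_col])
        if secs > 0:
            rates.append(tokens / secs)
    return sum(rates) / len(rates) if rates else 0.0

# Example with made-up numbers in the ~2.5 tokens/sec range reported above.
sample = "new_tokens,generate_time_s\n60,24.0\n50,20.0\n"
print(round(avg_tokens_per_sec(sample), 2))  # 2.5
```

Averaging per-row rates (rather than total tokens over total time) matches a per-input report like the one in the CSV; either convention is fine as long as it is applied consistently when comparing against a benchmark.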
Thanks, Ashima