Is the performance bottleneck of rknn-llm in the CPU? #29

Caical commented 2 months ago

On the Firefly board, the CPU's default governor is interactive, with a frequency of 408000, and the NPU's default governor is rknpu_ondemand, with a frequency of 1000000000. Default performance is approximately 12 tokens/s. When I changed only the NPU governor to userspace, the Q&A speed did not improve. But when I changed the CPU governor to userspace and raised its clock frequency, performance improved to 21 tokens/s. Why is the performance bottleneck of rknn-llm on the CPU?
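For anyone trying to reproduce the CPU change described here, the following is a minimal sketch and not the exact commands used in this report. It assumes the standard Linux cpufreq sysfs layout under `/sys/devices/system/cpu/cpufreq/policy*`, a kernel with the `userspace` governor enabled, and a driver that exposes `scaling_available_frequencies`; cpufreq values are in kHz. The helper names `read_policy`, `pin_to_max`, and `pin_all` are hypothetical.

```python
from pathlib import Path

# Assumed standard cpufreq sysfs root; pass a different root to test off-device.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def read_policy(policy_dir: Path) -> dict:
    """Return the current governor and frequency (kHz) of one cpufreq policy."""
    return {
        "governor": (policy_dir / "scaling_governor").read_text().strip(),
        "cur_freq_khz": int((policy_dir / "scaling_cur_freq").read_text()),
    }


def pin_to_max(policy_dir: Path) -> int:
    """Switch one policy to the userspace governor and pin it to the highest
    frequency it advertises. Returns the chosen frequency in kHz."""
    raw = (policy_dir / "scaling_available_frequencies").read_text()
    target = max(int(f) for f in raw.split())
    (policy_dir / "scaling_governor").write_text("userspace")
    # scaling_setspeed only accepts writes while the userspace governor is active.
    (policy_dir / "scaling_setspeed").write_text(str(target))
    return target


def pin_all(root: Path = CPUFREQ_ROOT) -> dict:
    """Pin every policy (CPU cluster) found under root; needs root privileges
    when run against the real sysfs tree."""
    return {p.name: pin_to_max(p) for p in sorted(root.glob("policy*"))}
```

Calling `pin_all()` as root before a benchmark run and comparing `read_policy` output before and after should show whether the governor change actually took effect. The NPU is controlled through a separate devfreq node whose path depends on the board, so it is left out of this sketch.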

zhoujing07 commented 2 months ago

Hi, would you mind sharing which model you used?

Caical commented 2 months ago

RK3588, Qwen-1.8B

Pelochus commented 2 months ago

How do you measure tokens/s?

fydeos-alex commented 2 months ago

See this https://github.com/airockchip/rknn-llm/issues/27#issuecomment-2081778067 😊
