On the Firefly board:
By default, the CPU governor is interactive, with a frequency of 408000, and the NPU governor is rknpu_ondemand, with a frequency of 1000000000. With these defaults, performance is approximately 12 tokens/s.
When I changed only the NPU governor to userspace, the Q&A speed did not improve.
But when I changed the CPU governor to userspace and raised its clock frequency, performance improved to 21 tokens/s.
Why is the CPU the performance bottleneck for rknn llm?
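For anyone trying to reproduce the CPU change described above, here is a minimal Python sketch of how the governor and frequency could be read and pinned through the standard Linux cpufreq sysfs files (scaling_governor, scaling_cur_freq, scaling_setspeed). The helper names read_cpufreq_policies and pin_cpu_frequency are hypothetical, and the set of policy directories present on the Firefly board is an assumption; cpufreq reports frequencies in kHz, and writing requires root.

```python
from pathlib import Path

# Standard Linux cpufreq location; which policy* directories exist on a
# given board is an assumption and should be checked on the device.
CPUFREQ_ROOT = "/sys/devices/system/cpu/cpufreq"


def read_cpufreq_policies(cpufreq_root=CPUFREQ_ROOT):
    """Return {policy_name: {"governor": str, "cur_khz": int}} for each policy."""
    result = {}
    for policy in sorted(Path(cpufreq_root).glob("policy*")):
        governor = (policy / "scaling_governor").read_text().strip()
        cur_khz = int((policy / "scaling_cur_freq").read_text().strip())
        result[policy.name] = {"governor": governor, "cur_khz": cur_khz}
    return result


def pin_cpu_frequency(policy_dir, khz):
    """Select the userspace governor, then request a fixed frequency in kHz.

    Needs root on a real system. The requested value should be one the
    platform supports; this sketch does not validate it.
    """
    policy = Path(policy_dir)
    (policy / "scaling_governor").write_text("userspace")
    (policy / "scaling_setspeed").write_text(str(khz))


if __name__ == "__main__":
    # Print the current governor and frequency for every CPU policy.
    for name, info in read_cpufreq_policies().items():
        print(f"{name}: governor={info['governor']} freq={info['cur_khz']} kHz")
```

Running the script before and after a benchmark shows whether the CPU clusters were actually at the expected frequency while tokens were being generated.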