Closed Rahn80643 closed 1 year ago
6010440 clock cycles / 0.038 s implies 158169473 clocks/sec, i.e. roughly 158 MHz. Is that the correct clock speed of the NPU in your system?
Hmm, from the kernel messages, 1829.964914 - 1829.957455 = 0.007459 s, i.e. about 7.5 ms, is the actual length of the inference on the NPU. And 6010440 clocks / 0.0075 s = 801,392,000 clocks/sec, or roughly 800 MHz, which might sound more reasonable.
Also from the kernel messages, 17 ms (1829.957455 - 1829.940628) was spent resetting the NPU (could this be due to waking it up from a low-power state?).
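The arithmetic above can be reproduced with a small sketch. The helper names (`ImpliedClockHz`, `ElapsedSeconds`, `PrintNpuTimingSummary`) are hypothetical and not part of ArmNN; the cycle count and timestamps are the ones quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Implied clock frequency in Hz, given a cycle count and the time (in seconds)
// over which those cycles are assumed to have elapsed.
double ImpliedClockHz(std::uint64_t cycles, double seconds)
{
    assert(seconds > 0.0);
    return static_cast<double>(cycles) / seconds;
}

// Elapsed time in seconds between two kernel log timestamps.
double ElapsedSeconds(double start, double end)
{
    return end - start;
}

// Prints the figures discussed in this thread (hypothetical helper).
void PrintNpuTimingSummary()
{
    const std::uint64_t cycles = 6010440;     // cycle count quoted above
    const double wallClock = 0.038;           // host-side measurement, seconds
    const double resetStart = 1829.940628;    // kernel timestamps quoted above
    const double inferenceStart = 1829.957455;
    const double inferenceEnd = 1829.964914;

    const double reset = ElapsedSeconds(resetStart, inferenceStart);
    const double inference = ElapsedSeconds(inferenceStart, inferenceEnd);

    std::printf("reset:     %.3f ms\n", reset * 1e3);
    std::printf("inference: %.3f ms\n", inference * 1e3);
    std::printf("clock from wall time:   %.1f MHz\n", ImpliedClockHz(cycles, wallClock) / 1e6);
    std::printf("clock from kernel time: %.1f MHz\n", ImpliedClockHz(cycles, inference) / 1e6);
}
```

Using the unrounded 7.459 ms instead of 7.5 ms gives roughly 806 MHz rather than 801 MHz, which does not change the conclusion.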
I would suggest running several inferences in quick succession as there should be less time spent "waking things up" after the first inference.
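A minimal sketch of that suggestion, assuming the inference call can be wrapped in a callable. `TimeRepeatedRuns` and `MeanAfterWarmup` are hypothetical helpers, not ArmNN functions; they time each back-to-back run separately so the first (possibly slower) run can be compared with the rest.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `inference` `runs` times back to back and returns the wall-clock
// duration of each individual run in milliseconds.
template <typename Fn>
std::vector<double> TimeRepeatedRuns(Fn&& inference, std::size_t runs)
{
    using Clock = std::chrono::steady_clock; // monotonic, suited to intervals
    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i)
    {
        const auto start = Clock::now();
        inference();
        const auto stop = Clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Mean duration over all runs except the first `warmup` ones.
// Returns 0.0 if there are no runs left after skipping the warm-up.
double MeanAfterWarmup(const std::vector<double>& ms, std::size_t warmup)
{
    if (ms.size() <= warmup)
    {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t i = warmup; i < ms.size(); ++i)
    {
        sum += ms[i];
    }
    return sum / static_cast<double>(ms.size() - warmup);
}
```

The callable would wrap whatever call currently runs the inference (in ArmNN this is typically `IRuntime::EnqueueWorkload`). Comparing the first entry with `MeanAfterWarmup(times, 1)` gives a rough idea of how much of the 38 ms is one-off wake-up cost.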
The ArmNN "event" profiler just measures elapsed time, the same as std::chrono. @eleanorbonnici-arm, are you able to help further?
Thank you for your question. The recommended frequency for the NPU hardware is 1 GHz, so 800 MHz looks close to real hardware performance. Depending on the platform you're running the inference on, this might be a meaningful number.
Let us know if that helps.
Hi @Rahn80643, do you require further assistance? Otherwise I will close this ticket. Thank you
Hi,
I'm trying to evaluate the execution speed of ArmNN inference on an NPU. I added a profilerManager code snippet, following ProfilerTests.cpp. For comparison, I also added std::chrono::high_resolution_clock::now() to measure the execution time, but the times from the ArmNN profiler and std::chrono are close:
from profiler: 38.12 ms
from chrono: 38.576 ms
As far as I understand, std::chrono measures time on the CPU and might not capture the execution time on the NPU, whereas the ArmNN profiler could be used to measure the execution time on the NPU or GPU.
I'd like to ask whether the times measured above are reasonable for an NPU. Are there other functions that could be used to evaluate NPU performance?
Best Regards, Rahn