Closed: srikris-sridhar closed this issue 1 year ago
Hi @srikris-sridhar,
Thank you for getting in touch. It's hard to know what might cause this. The Arm Compute CL backend (GPU) can take longer on the first iteration due to a warm-up that it does. How many iterations are you running? If you set it to 10, for example, do you see a speed improvement on the second and subsequent iterations?
Also, would you be able to supply us profiling data on your hardware with multiple iterations, so we can take a look at the specific kernels that are being run? If so, how are you running this model, so I can help with this further? Here are some general tips.
To enable event-based profiling, add -e or --event-based-profiling to your command. There is also an enable-internal-profiling option; see armnn/delegate/include/DelegateOptions.hpp for more details.
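As a sketch of how that option might be passed when loading the Arm NN delegate from Python through the TFLite external delegate interface (the library filename and model path here are assumptions for your setup; the option key comes from DelegateOptions.hpp):

```python
# Sketch: passing Arm NN delegate options from Python.
# "libarmnnDelegate.so" and "model.tflite" are placeholders for your build.
delegate_path = "libarmnnDelegate.so"

options = {
    "backends": "GpuAcc",                 # run on the GPU backend
    "enable-internal-profiling": "true",  # option named in DelegateOptions.hpp
}

try:
    import tflite_runtime.interpreter as tflite
    delegate = tflite.load_delegate(delegate_path, options)
    interpreter = tflite.Interpreter(model_path="model.tflite",
                                     experimental_delegates=[delegate])
except (ImportError, OSError, ValueError):
    # tflite_runtime or the delegate library may not be present on this
    # machine; the options dictionary above is the relevant part.
    pass

print(options["enable-internal-profiling"])
```

With internal profiling enabled, the delegate reports per-kernel timings, which is the data that would let us see which kernels dominate on your GPU.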
Thanks again!
Kind regards,
Matthew
@matthewsloyanARM I've attached the models in the issue (see models.zip) so you should be able to reproduce it entirely. It's just a single 3x3 convolution. Are you able to reproduce this on your end?
I do run multiple iterations (~100) and take the minimum time, so it's likely not related to warm-up.
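The measurement approach described above (many iterations, report the minimum) can be sketched as follows; the iteration and warm-up counts are assumptions, and the workload is a stand-in for the actual model invocation:

```python
import time

def benchmark(run_once, iters=100, warmup=5):
    """Time a callable, discarding warm-up runs and reporting the minimum.

    Taking the minimum over many iterations filters out both one-off
    warm-up costs (e.g. GPU kernel compilation) and scheduler noise.
    """
    for _ in range(warmup):
        run_once()                     # warm-up runs, not timed
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    return min(times) * 1e3            # best-case latency in milliseconds

# Usage with a stand-in workload; replace with interpreter.invoke() or similar.
best_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"min latency: {best_ms:.3f} ms")
```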
Hi @srikris-sridhar,
This issue is more likely related to Arm Compute Library than to Arm NN: https://github.com/ARM-software/ComputeLibrary/issues. Please open an issue on their side; I think they will be able to help you better than we can.
Kind Regards
I've filed an issue with ARM Compute Library.
I've tried running a really simple convolution (attached 2 models, one in FP32 and one in INT8). Here is what I see on a Samsung A33 with a Mali-G68 GPU. I see a good speed-up from INT8 on the CPU but not on the GPU. Is this expected?
CPU (INT8): 2.5ms
CPU (FP32): 6.3ms
GPU (INT8): 7.2ms
GPU (FP32): 8.4ms
models.zip
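For reference, the attached models each contain a single 3x3 convolution; a minimal NumPy sketch of that operation (assuming one input channel, stride 1, no padding, and a stand-in kernel, since the actual model weights are in models.zip):

```python
import numpy as np

def conv3x3(x, k):
    """Direct 3x3 convolution (single channel, stride 1, 'valid' padding)."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2), dtype=x.dtype)
    for i in range(h - 2):
        for j in range(w - 2):
            # Each output element is the dot product of a 3x3 window with k.
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

x = np.arange(25, dtype=np.float32).reshape(5, 5)
k = np.ones((3, 3), dtype=np.float32)  # box filter as a stand-in kernel
y = conv3x3(x, k)
print(y.shape)  # → (3, 3)
```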