Closed: srikris-sridhar closed this issue 1 year ago
Hi @srikris-sridhar,
Thank you for getting in touch. It's hard to know what might cause this. The Arm Compute CL backend (GPU) can take longer on the first iteration due to a warm-up that it does. How many iterations are you running? If you set it to 10, for example, do you see a speed improvement on the second and subsequent iterations?
Also, would you be able to supply us profiling data on your hardware with multiple iterations, so we can take a look at the specific kernels that are being run? If so, how are you running this model, so I can help with this further? Here are some general tips.
To enable event-based profiling, add -e or --event-based-profiling to your command. There is also an enable-internal-profiling option; see armnn/delegate/include/DelegateOptions.hpp for more details.
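As a sketch of how that option might be passed when loading the Arm NN delegate from Python through the TFLite external delegate interface (the library filename and model path here are assumptions for your setup; the option key comes from DelegateOptions.hpp):

```python
# Sketch: passing Arm NN delegate options from Python.
# "libarmnnDelegate.so" and "model.tflite" are placeholders for your build.
delegate_path = "libarmnnDelegate.so"

options = {
    "backends": "GpuAcc",                 # run on the GPU backend
    "enable-internal-profiling": "true",  # option named in DelegateOptions.hpp
}

try:
    import tflite_runtime.interpreter as tflite
    delegate = tflite.load_delegate(delegate_path, options)
    interpreter = tflite.Interpreter(model_path="model.tflite",
                                     experimental_delegates=[delegate])
except (ImportError, OSError, ValueError):
    # tflite_runtime or the delegate library may not be present on this
    # machine; the options dictionary above is the relevant part.
    pass

print(options["enable-internal-profiling"])
```

With internal profiling enabled, the delegate reports per-kernel timings, which is the data that would let us see which kernels dominate on your GPU.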
Thanks again!
Kind regards,
Matthew
@matthewsloyanARM I've attached the models in the issue (see models.zip) so you should be able to reproduce it entirely. It's just a single 3x3 convolution. Are you able to reproduce this on your end?
I do run multiple iterations (~100) and take the minimum time, so it's likely not related to warm-up.
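The measurement approach described above (many iterations, report the minimum) can be sketched as follows; the iteration and warm-up counts are assumptions, and the workload is a stand-in for the actual model invocation:

```python
import time

def benchmark(run_once, iters=100, warmup=5):
    """Time a callable, discarding warm-up runs and reporting the minimum.

    Taking the minimum over many iterations filters out both one-off
    warm-up costs (e.g. GPU kernel compilation) and scheduler noise.
    """
    for _ in range(warmup):
        run_once()                     # warm-up runs, not timed
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    return min(times) * 1e3            # best-case latency in milliseconds

# Usage with a stand-in workload; replace with interpreter.invoke() or similar.
best_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"min latency: {best_ms:.3f} ms")
```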
Hi @srikris-sridhar,
This issue is more likely related to Arm Compute Library than to Arm NN: https://github.com/ARM-software/ComputeLibrary/issues. Please open an issue on their side; I think they will be able to help you better than we can.
Kind Regards
I've filed an issue with ARM Compute Library.
I've tried running a really simple convolution (attached 2 models, one in FP32 and one in INT8). Here is what I see on a Samsung A33 with a Mali-G68 GPU. I see a good speed-up from INT8 on the CPU but not on the GPU. Is this expected?
CPU (INT8): 2.5ms
CPU (FP32): 6.3ms
GPU (INT8): 7.2ms
GPU (FP32): 8.4ms
models.zip
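For reference, the attached models each contain a single 3x3 convolution; a minimal NumPy sketch of that operation (assuming one input channel, stride 1, no padding, and a stand-in kernel, since the actual model weights are in models.zip):

```python
import numpy as np

def conv3x3(x, k):
    """Direct 3x3 convolution (single channel, stride 1, 'valid' padding)."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2), dtype=x.dtype)
    for i in range(h - 2):
        for j in range(w - 2):
            # Each output element is the dot product of a 3x3 window with k.
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

x = np.arange(25, dtype=np.float32).reshape(5, 5)
k = np.ones((3, 3), dtype=np.float32)  # box filter as a stand-in kernel
y = conv3x3(x, k)
print(y.shape)  # → (3, 3)
```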