ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Poor Thread Scaling Behaviour of SGEMM #1118

Closed FabianSchuetze closed 1 month ago

FabianSchuetze commented 3 months ago

It seems to me that the SGEMM kernel scales poorly as the number of threads increases.

I am using the CPP scheduler for the example/neon_sgemm.cpp code. There seem to be a few issues:

What scaling behavior should I expect in general? Is there anything I can do to debug the issue?
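To make "scaling behaviour" concrete, a minimal sketch of how speedup and parallel efficiency could be computed from wall-clock timings of the same SGEMM at different thread counts. The names `Sample`, `Scaling` and `scaling_table` are hypothetical helpers for illustration, not part of the Compute Library, and the timings have to come from your own runs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One measurement: wall-clock seconds for a fixed-size SGEMM at a given thread count.
struct Sample {
    unsigned threads;
    double seconds;
};

// Derived scaling figures for one thread count.
struct Scaling {
    unsigned threads;
    double speedup;     // t(1 thread) / t(n threads)
    double efficiency;  // speedup / n; 1.0 would be ideal linear scaling
};

// Hypothetical helper (not an ACL API). Assumes samples[0] is the
// single-thread baseline and all timings are for the same problem size.
std::vector<Scaling> scaling_table(const std::vector<Sample>& samples) {
    std::vector<Scaling> out;
    if (samples.empty()) {
        return out;
    }
    assert(samples.front().threads == 1 && samples.front().seconds > 0.0);
    const double base = samples.front().seconds;
    for (const Sample& s : samples) {
        assert(s.threads > 0 && s.seconds > 0.0);
        const double speedup = base / s.seconds;
        out.push_back({s.threads, speedup, speedup / s.threads});
    }
    return out;
}
```

An efficiency that drops sharply from one thread count to the next shows where adding threads stops paying off, which narrows down where to look.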

morgolock commented 2 months ago

Hi @FabianSchuetze

The OpenMP scheduler will scale better on both Android and Linux. Could you please try building ACL with openmp=1 cppthreads=0?

You should not have to set the number of threads; ACL will use the best number for your system.

Hope this helps,