ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

Poor Thread Scaling Behaviour of SGEMM #1118

Closed FabianSchuetze closed 3 months ago

FabianSchuetze commented 4 months ago

It seems to me that the SGEMM kernel scales poorly as the number of threads increases.

I am using the CPP scheduler for the example/neon_sgemm.cpp code. There seem to be a few issues:

What scaling behavior should I expect in general? Is there anything I can do to debug the issue?

morgolock commented 4 months ago

Hi @FabianSchuetze

The OpenMP scheduler will scale better on both Android and Linux. Could you please try building ACL with openmp=1 cppthreads=0?

You should not have to set the number of threads; ACL will use the best number for your system.

Hope this helps,