Closed FabianSchuetze closed 3 months ago
Hi @FabianSchuetze
The openmp scheduler will scale better on both Android and Linux. Could you please try building ACL with openmp=1 cppthreads=0?
You should not have to set the number of threads; ACL will use the best number for your system.
Hope this helps,
It seems to me that the SGEMM kernel scales poorly as the number of threads increases.
I am using the CPP scheduler for the example/neon_sgemm.cpp code. There seem to be a few issues:
- scheduler.num_threads_hint() reports 1
- Increasing the number of threads manually does not offer good performance improvements, see the table below:
The relevant output from the logs is:
Using the openmp scheduler yields worse performance. My device consists of X3, A715 and A510 processors (Samsung S9 Tablet).
What scaling behavior should I expect in general? Is there anything I can do to debug the issue?