ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Threading in ACL #996

Closed. allnes closed this issue 1 year ago

allnes commented 1 year ago

Hello, I have some questions:

morgolock commented 1 year ago

Hi @allnes

ACL implements multithreading at the kernel level in CPPScheduler; see https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/CPP/CPPScheduler.cpp and https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/IScheduler.cpp#L146 . At runtime, the scheduler creates multiple threads and executes the same kernel on different parts of the data. Internally, the library uses the concept of executing a kernel over a window.

All of this happens automatically when you run a function such as NEGEMM: the function configures its kernels and calls the CPPScheduler to execute them on multiple threads. See https://arm-software.github.io/ComputeLibrary/latest/implementation_topic.xhtml#implementation_topic_multithreading
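
To make the idea of "same kernel, different parts of the data" concrete, here is a minimal standalone sketch in plain C++. It is not ACL code: Window1D, split_window, schedule and scale_in_parallel are hypothetical names made up for illustration, the window is reduced to a one-dimensional range, and one std::thread is started per sub-window, which is an assumption of this sketch and may not match how CPPScheduler manages its threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for an execution window: a half-open 1D range [start, end).
struct Window1D
{
    std::size_t start;
    std::size_t end;
};

// Split a window into at most num_splits contiguous, non-overlapping sub-windows.
std::vector<Window1D> split_window(const Window1D &full, std::size_t num_splits)
{
    assert(num_splits > 0);
    const std::size_t total = full.end - full.start;
    const std::size_t chunk = (total + num_splits - 1) / num_splits; // ceiling division

    std::vector<Window1D> parts;
    for(std::size_t s = full.start; s < full.end; s += chunk)
    {
        parts.push_back({ s, std::min(s + chunk, full.end) });
    }
    return parts;
}

// Run the same kernel on every sub-window, one thread per sub-window,
// and wait for all of them to finish.
void schedule(const std::function<void(const Window1D &)> &kernel,
              const Window1D &full, std::size_t num_threads)
{
    const std::vector<Window1D> parts = split_window(full, num_threads);

    std::vector<std::thread> workers;
    workers.reserve(parts.size());
    for(const Window1D &part : parts)
    {
        // Each thread gets its own copy of the sub-window.
        workers.emplace_back([&kernel, part] { kernel(part); });
    }
    for(std::thread &t : workers)
    {
        t.join();
    }
}

// Example "kernel": multiply every element by a factor. Sub-windows never
// overlap, so the threads write to disjoint elements without locking.
std::vector<float> scale_in_parallel(std::vector<float> data, float factor,
                                     std::size_t num_threads)
{
    float *ptr = data.data();
    schedule([ptr, factor](const Window1D &w) {
        for(std::size_t i = w.start; i < w.end; ++i)
        {
            ptr[i] *= factor;
        }
    }, Window1D{ 0, data.size() }, num_threads);
    return data;
}
```

Calling scale_in_parallel(values, 2.0f, 4) would process the buffer in up to four independent chunks. The point of the sketch is only the split-then-dispatch pattern; the kernel itself does not need to know how many threads are running.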

ACL does not pin the threads to cores.

Hope this helps.

allnes commented 1 year ago

Thank you for your answer!