ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License
2.76k stars 767 forks source link

Use of `std::thread::hardware_concurrency()` does not take into account it may return `0` as the number of threads. #988

Closed matt-gretton-dann closed 1 year ago

matt-gretton-dann commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version':

arm_compute_version=22.05 Build options: {'arch': 'armv8.2-a', 'debug': '0', 'examples': '1', 'opencl': '0', 'mali': '0', 'neon': '1', 'cppthreads': '1', 'benchmark_examples': 'true', 'validate_examples': 'true'} Git hash=b'a175e887d64450decf80ea47d4049832c5805565'

Platform: VM under macOS M1

Operating System:

Ubuntu 22.04

Problem description:

Looking through Compute Library's code base to understand how it decides how many threads to use I noticed that it doesn't guard against std::thread::hardware_concurrency() returning 0 - which is valid according to the C++ standard (reference C++17 thread.thread.static - https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency).

Could not provide a use case as the systems I am on don't trigger the error as they use other methods and not the fallback, but this is a potential issue as far as I can see.

matt-gretton-dann commented 1 year ago

Looking at https://github.com/llvm/llvm-project/blob/main/libcxx/src/thread.cpp this may affect any platform which uses libc++ except for Windows or POSIX.

morgolock commented 1 year ago

Hi @matt-gretton-dann

This should not be a problem as we guard against 0 in https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/CPP/CPPScheduler.cpp#L473

The scheduler won't run the workload when num_threads < 1

465 void CPPScheduler::run_workloads(std::vector<IScheduler::Workload> &workloads)
466 {
467     // Mutex to ensure other threads won't interfere with the setup of the current thread's workloads
468     // Other thread's workloads will be scheduled after the current thread's workloads have finished
469     // This is not great because different threads workloads won't run in parallel but at least they
470     // won't interfere each other and deadlock.
471     arm_compute::lock_guard<std::mutex> lock(_impl->_run_workloads_mutex);
472     const unsigned int                  num_threads_to_use = std::min(_impl->num_threads(), static_cast<unsigned int>(workloads.size()));
473     if(num_threads_to_use < 1)
474     {
475         return;
476     }

If on your system std::thread::hardware_concurrency() returns 0 you can override this value by calling void CPPScheduler::set_num_threads(unsigned int num_threads)

https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/CPP/CPPScheduler.cpp#L445

Hope this helps.