arbor-sim / arbor

The Arbor multi-compartment neural network simulation library.
https://arbor-sim.org
BSD 3-Clause "New" or "Revised" License

Default environment support in arbor-env #988

Closed halfflat closed 3 years ago

halfflat commented 4 years ago

Goal: allow simple environment-based determination of resources for an execution context from arborenv.

The motivations are: reduce boilerplate for querying environment variables and the threading environment; provide consistent environment variables for user code that takes advantage of this functionality; ease unit testing in multithreaded and GPU contexts.

Proposal:

In the future we can add a facility for marking unit tests that are specific to GPU or multithreaded functionality, so we can filter for them at invocation time.

Related: #982, #983

brenthuisman commented 3 years ago

  1. Python builds seem to ignore ARB_NUM_THREADS. Is that a bug?
  2. Does point 4 mean a change from 1-thread by default to std::thread::hardware_concurrency() by default?

bcumming commented 3 years ago

I would like to remove support for OMP_NUM_THREADS, having pushed for it multiple times in the past and been shot down.

Using environment variables to set GPU and thread count should be opt-in, like it is in the proposal, and accessed via something like arbenv::default_gpu(), arbenv::default_concurrency, etc.

We originally set the default thread count according to the environment/OS, and later removed that behaviour.

brenthuisman commented 3 years ago

What was the reason for moving back to a default thread count of 1?

std::thread::hardware_concurrency() is something we can trust, outside of MPI contexts.

brenthuisman commented 3 years ago

Strongly held opinion: I expect context() to default to the number of available threads (on a local machine). Maybe context(all_cores=True)?

halfflat commented 3 years ago

To answer the question regarding std::thread::hardware_concurrency(), it's not as trustworthy as it looks, sadly.

A default value of one for the thread count is safe: it makes it much easier to compare performance across different systems, as automatic determination of available threads is always a gamble; and while in some contexts a user might want to use all available hardware threads, in others they may want to use only a subset (e.g. multiple MPI ranks per node, or to avoid SMT), or even oversubscribe. It also avoids hard-to-debug and hard-to-reproduce system-specific issues with thread count determination.

When a user wishes to use more than one thread, it should be an active choice.
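
A rough sketch of the policy argued for above (one thread unless the user actively asks for more) might look like the following. It is an illustration only, not Arbor's or arborenv's implementation: `parse_thread_count`, `thread_count_from_env` and `hardware_threads_or` are hypothetical names made up for this example, and only the standard library calls are real. Note that `std::thread::hardware_concurrency()` is only a hint and may return 0 when the value cannot be determined, which is one reason not to rely on it silently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <thread>

// Hypothetical helper: parse a strictly positive decimal thread count.
// Returns 0 when the text is missing, malformed, non-positive or too large.
unsigned parse_thread_count(const char* text) {
    if (text == nullptr || *text == '\0') return 0;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    // Reject range errors, empty conversions and trailing characters.
    if (errno != 0 || end == text || *end != '\0') return 0;
    if (value < 1) return 0;
    if (static_cast<unsigned long>(value) > std::numeric_limits<unsigned>::max()) return 0;
    return static_cast<unsigned>(value);
}

// Hypothetical opt-in lookup: the caller names the environment variable
// explicitly; if it is unset or invalid, fall back to a safe default of 1.
unsigned thread_count_from_env(const char* variable, unsigned fallback = 1) {
    assert(fallback >= 1);
    const unsigned parsed = parse_thread_count(std::getenv(variable));
    return parsed != 0 ? parsed : fallback;
}

// Hypothetical wrapper for users who explicitly want "all cores":
// hardware_concurrency() may report 0, so a fallback is still required.
unsigned hardware_threads_or(unsigned fallback) {
    const unsigned hint = std::thread::hardware_concurrency();
    return hint != 0 ? hint : fallback;
}
```

With these helpers, `thread_count_from_env("ARB_NUM_THREADS")` yields 1 unless the variable holds a positive integer, while `hardware_threads_or(1)` is something a user would have to call deliberately. Keeping the two paths separate keeps the multi-threaded case an active choice rather than a side effect of the machine the code happens to run on.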