Default environment support in arbor-env

halfflat commented 4 years ago

Goal: allow simple environment-based determination of resources for an execution context from arborenv.

The motivations are: reduce boilerplate for querying environment variables and the threading environment; provide consistent environment variables for user code that takes advantage of this functionality; and ease unit testing in multithreaded and GPU contexts.

Proposal:

1. Keep the current arbenv resource functions.
2. Remove OMP_NUM_THREADS from the environment check.
3. Rename ARB_NUM_THREADS to ARBENV_NUM_THREADS, to make it clear that it is functionality provided by the arbenv library.
4. Add arbenv::default_concurrency, which wraps the environment-check-or-else-thread-count-from-system code.
5. Add support for an environment variable ARBENV_GPU_ID; if it is set, and we have GPU support, and the value is within the device count, we use that value in arbenv::default_gpu(). A value <0 would effectively mean: do not use the GPU.
6. Use this in the unit tests and maybe the examples. As we fix up our GPU test coverage, we can work towards tests that are GPU-agnostic but try to use the GPU if one is available, and run the tests multiple times in CI with multithreading and/or the GPU enabled via the environment.
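
As a rough illustration of the default_concurrency idea in the list above, here is a minimal sketch, not the arbenv implementation. The namespace `sketch`, the helper `read_positive_env`, and the choice to throw on a malformed value are all assumptions made for this example; only the variable name ARBENV_NUM_THREADS and the fallback to the system thread count come from the proposal.

```cpp
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical sketch; names and error handling are assumptions, not arbenv's API.
namespace sketch {

// Read a strictly positive integer from the named environment variable.
// Returns std::nullopt if the variable is unset or empty.
// Throws std::runtime_error if it is set but is not a positive integer.
inline std::optional<unsigned long> read_positive_env(const char* name) {
    const char* value = std::getenv(name);
    if (!value || !*value) return std::nullopt;

    const std::string text(value);
    std::size_t pos = 0;
    long n = 0;
    try {
        n = std::stol(text, &pos);
    }
    catch (const std::exception&) {
        throw std::runtime_error(std::string(name) + ": not a valid integer");
    }
    if (pos != text.size() || n <= 0) {
        throw std::runtime_error(std::string(name) + ": expected a positive integer");
    }
    return static_cast<unsigned long>(n);
}

// Environment check first, otherwise the thread count reported by the system.
// std::thread::hardware_concurrency() may return 0 when the value cannot be
// determined, so fall back to a single thread in that case.
inline unsigned long default_concurrency() {
    if (auto n = read_positive_env("ARBENV_NUM_THREADS")) return *n;
    const unsigned hw = std::thread::hardware_concurrency();
    return hw ? hw : 1;
}

} // namespace sketch
```

With ARBENV_NUM_THREADS=3 in the environment, `sketch::default_concurrency()` returns 3; with the variable unset it returns whatever the system reports, and never less than 1.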

In the future we can add a facility for marking unit tests that are specific to GPU or multithreaded functionality, so we can filter for them at invocation time.

Related: #982, #983

brenthuisman commented 3 years ago

Python builds seem to ignore ARB_NUM_THREADS. Is that a bug?
Does point 4 mean a change from 1-thread by default to std::thread::hardware_concurrency() by default?

bcumming commented 3 years ago

I would like to remove support for OMP_NUM_THREADS, having pushed for its removal multiple times in the past and been shot down.

Using environment variables to set GPU and thread count should be opt-in, like it is in the proposal, and accessed via something like arbenv::default_gpu(), arbenv::default_concurrency, etc.
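
For concreteness, the GPU side of the proposal could look roughly like the sketch below. It is only a sketch under stated assumptions: `sketch::choose_gpu` and the `device_count` parameter are hypothetical, and the fallback behaviour (device 0 if any device is visible, otherwise -1) for an unset, malformed, or out-of-range ARBENV_GPU_ID is an assumption that the proposal does not spell out. The device count is passed in so the logic can be exercised without a GPU.

```cpp
#include <cstdlib>

// Hypothetical sketch; not the arbenv implementation.
namespace sketch {

// Pick a GPU id from the text of ARBENV_GPU_ID.
//   env_value:    contents of the variable, or nullptr if it is unset.
//   device_count: number of visible devices (0 when there is no GPU support).
// Returns the device id to use, or -1 meaning "do not use a GPU".
inline int choose_gpu(const char* env_value, int device_count) {
    // Assumed fallback: first device if there is one, otherwise no GPU.
    const int fallback = device_count > 0 ? 0 : -1;
    if (!env_value || !*env_value) return fallback;

    char* end = nullptr;
    const long id = std::strtol(env_value, &end, 10);
    if (*end != '\0') return fallback;        // not an integer: ignored (assumption)
    if (id < 0) return -1;                    // negative value: explicitly no GPU
    if (id >= device_count) return fallback;  // outside the device count: not used
    return static_cast<int>(id);
}

// Opt-in accessor: only code that calls this consults the environment.
inline int default_gpu(int device_count) {
    return choose_gpu(std::getenv("ARBENV_GPU_ID"), device_count);
}

} // namespace sketch
```

User code that never calls `default_gpu` is unaffected by the variable, which is what keeps the mechanism opt-in.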

We originally set the default thread count according to the environment/OS, and later removed that behaviour.

brenthuisman commented 3 years ago

What was the reason for moving back to a default thread count of 1?

std::thread::hardware_concurrency() is something we can trust, outside of MPI contexts.

brenthuisman commented 3 years ago

Strongly held opinion: I expect context() to default to the number of available threads (on a local machine). Maybe context(all_cores=True)?

halfflat commented 3 years ago

To answer the question regarding std::thread::hardware_concurrency(), it's not as trustworthy as it looks, sadly.

A default value of one for the thread count is safe: it makes it much easier to compare performance across different systems, as automatic determination of available threads is always a gamble; and while in some contexts a user might want to use all available hardware threads, in others they may want to use only a subset (e.g. multiple MPI ranks per node, or to avoid SMT), or even oversubscribe. It also avoids hard-to-debug and hard-to-reproduce system-specific issues with thread count determination.

When a user wishes to use more than one thread, it should be an active choice.
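
To illustrate why a single automatic value rarely fits, here is a small sketch of the kind of calculation a user might make explicitly. The function `sketch::threads_per_rank` and its parameters are hypothetical and not part of arbor or arbenv; it simply divides the hardware threads on a node among the MPI ranks sharing it, optionally counting only one thread per core to avoid SMT.

```cpp
#include <algorithm>

// Hypothetical helper for illustration only.
namespace sketch {

// hw_threads:     hardware threads on the node.
// ranks_per_node: MPI ranks sharing the node.
// smt_width:      hardware threads per core; 1 means every hardware thread is used.
// Always returns at least 1.
inline unsigned threads_per_rank(unsigned hw_threads,
                                 unsigned ranks_per_node,
                                 unsigned smt_width = 1) {
    if (ranks_per_node == 0) ranks_per_node = 1;
    if (smt_width == 0) smt_width = 1;
    const unsigned cores = hw_threads / smt_width;   // one thread per core
    return std::max(1u, cores / ranks_per_node);
}

} // namespace sketch
```

For example, a node with 64 hardware threads, 2-way SMT and 4 ranks gives `threads_per_rank(64, 4, 2) == 8`, whereas a naive hardware thread count would have given each rank 64.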

arbor-sim / arbor

Default environment support in arbor-env #988