Support for multiple compute devices per MPI rank

Status quo

The original CUDA support was limited to one device per rank.
#418 relaxed this constraint: each rank is still assigned at most one device, but more than one device may now be visible to each rank. This allows TiledArray to coexist with components that require such support, e.g. the emerging device support in https://github.com/TESSEorg/TTG.

For completeness, we need to be able to drive multiple devices from a single rank. For performance reasons, however, many algorithms may still benefit from a one-device-per-rank mapping, which improves data locality and reuse.

ValeevGroup/tiledarray issue #422
