LLNL / RAJA

RAJA Performance Portability Layer (C++)

Make a common library of GPU enabled containers and algorithms #1539

Open MrBurmark opened 1 year ago

MrBurmark commented 1 year ago

Make a new library above RAJA with common views, containers without dynamic storage, and sequential algorithms that work on GPUs, much like https://github.com/nvidia/libcudacxx. These are mainly things that exist in the standard library, like std::array, but that we can't use in device code because they are not marked host device. Another way to think of these is as things that don't take an exec policy like seq_exec/cuda_exec. Places where it would make sense to add these things are camp (https://github.com/LLNL/camp) or DESUL (https://github.com/desul/desul).
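To make the idea concrete, here is a minimal sketch of the kind of container being described: a fixed-size array in the spirit of std::array whose members carry host/device annotations so it can be used inside kernels. The HOST_DEVICE macro and the device_array name are placeholders for illustration, not an existing camp/DESUL API.

```cpp
#include <cstddef>

// Placeholder annotation macro: expands to __host__ __device__ under a
// CUDA/HIP device compiler, and to nothing in a plain host build.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Fixed-size array usable in both host and device code, no dynamic storage.
template <typename T, std::size_t N>
struct device_array {
  T data_[N];

  HOST_DEVICE constexpr T&       operator[](std::size_t i)       { return data_[i]; }
  HOST_DEVICE constexpr const T& operator[](std::size_t i) const { return data_[i]; }
  HOST_DEVICE constexpr std::size_t size() const { return N; }
  HOST_DEVICE constexpr T*       begin()       { return data_; }
  HOST_DEVICE constexpr T*       end()         { return data_ + N; }
};
```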

Things to add to this library.

  1. Stuff from the std library (see the sketch after this list):
     a. array
     b. vector?
     c. span
     d. mdspan
     e. sort
     f. scan
     g. binary search
     h. math functions (abs, min, max, sqrt, ...)
  2. Error handling from Brandon
  3. Stuff from the cuda std library (https://nvidia.github.io/libcudacxx/)
  4. ...
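As a concrete example of item 1, here is a hedged sketch of a device-callable sequential algorithm in the spirit of std::lower_bound (binary search). The name device_lower_bound is illustrative, and HOST_DEVICE is the same placeholder macro used in the earlier sketch.

```cpp
#ifndef HOST_DEVICE
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

// lower_bound-style binary search over [first, last), callable from host or
// device code; runs on a single thread, like the other sequential algorithms.
template <typename Iter, typename T>
HOST_DEVICE Iter device_lower_bound(Iter first, Iter last, const T& value)
{
  while (first < last) {
    Iter mid = first + (last - first) / 2;
    if (*mid < value) {
      first = mid + 1;
    } else {
      last = mid;
    }
  }
  return first;
}
```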

Other things to think about.

  1. Try to put host device requirements into the type system.
     a. Consider having host, host device, and device versions of stuff.
     b. This could allow some seq/par requirements to be checked at compile time in a GPU build, to some extent.
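One possible reading of point 1, sketched below under the assumption of simple tag types: a container records the execution space it supports, and callers check compatibility at compile time. The names exec_space, host_t, device_t, host_device_t, and tagged_span are made up for this illustration.

```cpp
#include <type_traits>

// Tag types describing where something is allowed to run.
struct host_t {};
struct device_t {};
struct host_device_t : host_t, device_t {};

// A container or algorithm declares the execution space it supports...
template <typename Space>
struct tagged_span {
  using exec_space = Space;
  // data members elided for brevity
};

// ...and callers check compatibility at compile time.
template <typename RequiredSpace, typename Container>
void require_space(const Container&)
{
  static_assert(std::is_base_of<RequiredSpace,
                                typename Container::exec_space>::value,
                "container is not usable in the requested execution space");
}

int main()
{
  tagged_span<host_device_t> s{};
  require_space<device_t>(s);   // OK: host_device_t is also device-capable
  // require_space<device_t>(tagged_span<host_t>{});  // would fail to compile
}
```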
MrBurmark commented 1 year ago

@trws Here's an idea to potentially reduce code duplication across projects by expanding camp to have more containers/views and algorithms that are commonly used in device code.

adayton1 commented 1 year ago

These are the things we currently use or would use. We have implementations of almost all of these in CARE.

Containers (if needed, these could be views, except for array):

Algorithms that act on scalars:

Algorithms that act on arrays (note that these are at the level of a single thread, not launching kernels, so "sequential" I guess):

Algorithms that act on arrays and do launch kernels:

There are probably other algorithms I'm missing, but this is a pretty core set.
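To illustrate the distinction between the last two categories, here is a rough sketch contrasting a thread-level (sequential) helper with a kernel-launching wrapper built on RAJA::forall. The names fill_value and fill_rows are hypothetical and invented for this example; RAJA_HOST_DEVICE, RAJA::forall, RAJA::Index_type, and RAJA::RangeSegment are existing RAJA facilities.

```cpp
#include <RAJA/RAJA.hpp>

// Thread-level "sequential" algorithm: callable from inside a kernel body,
// touches its range on the calling thread only, launches nothing.
template <typename T>
RAJA_HOST_DEVICE void fill_value(T* first, T* last, const T& v)
{
  for (; first != last; ++first) { *first = v; }
}

// Kernel-launching algorithm: takes an exec policy, iterates over rows in
// parallel, and uses the thread-level helper within each iterate.
template <typename ExecPolicy, typename T>
void fill_rows(T* data, int num_rows, int row_len, T v)
{
  RAJA::forall<ExecPolicy>(RAJA::RangeSegment(0, num_rows),
    [=] RAJA_HOST_DEVICE (RAJA::Index_type row) {
      fill_value(data + row * row_len, data + (row + 1) * row_len, v);
    });
}
```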