LLNL / RAJA

RAJA Performance Portability Layer (C++)

Make a common library of GPU enabled containers and algorithms #1539

Open MrBurmark opened 1 year ago

MrBurmark commented 1 year ago

Make a new library above RAJA with common views, containers without dynamic storage, and sequential algorithms that work on GPUs, much like https://github.com/nvidia/libcudacxx. These are mainly things that exist in the standard library, like std::array, but that we can't use in device code because they are not marked host device. Another way to think of these is as things that don't take an exec policy like seq_exec/cuda_exec. Places where it would make sense to add these things are camp (https://github.com/LLNL/camp) or DESUL (https://github.com/desul/desul).
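To make the idea concrete, here is a minimal sketch of the kind of container being described: a fixed-size array in the spirit of std::array whose members carry host/device annotations so it can be used inside kernels. The HOST_DEVICE macro and the device_array name are placeholders for illustration, not an existing camp/DESUL API.

```cpp
#include <cstddef>

// Placeholder annotation macro: expands to __host__ __device__ under a
// CUDA/HIP device compiler, and to nothing in a plain host build.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Fixed-size array usable in both host and device code, no dynamic storage.
template <typename T, std::size_t N>
struct device_array {
  T data_[N];

  HOST_DEVICE constexpr T&       operator[](std::size_t i)       { return data_[i]; }
  HOST_DEVICE constexpr const T& operator[](std::size_t i) const { return data_[i]; }
  HOST_DEVICE constexpr std::size_t size() const { return N; }
  HOST_DEVICE constexpr T*       begin()       { return data_; }
  HOST_DEVICE constexpr T*       end()         { return data_ + N; }
};
```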

Things to add to this library.

  1. Stuff from the std library (see the sketch after this list):
     a. array
     b. vector?
     c. span
     d. mdspan
     e. sort
     f. scan
     g. binary search
     h. math functions (abs, min, max, sqrt, ...)
  2. Error handling from Brandon
  3. Stuff from the cuda std library (https://nvidia.github.io/libcudacxx/)
  4. ...
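As a concrete example of item 1, here is a hedged sketch of a device-callable sequential algorithm in the spirit of std::lower_bound (binary search). The name device_lower_bound is illustrative, and HOST_DEVICE is the same placeholder macro used in the earlier sketch.

```cpp
#ifndef HOST_DEVICE
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

// lower_bound-style binary search over [first, last), callable from host or
// device code; runs on a single thread, like the other sequential algorithms.
template <typename Iter, typename T>
HOST_DEVICE Iter device_lower_bound(Iter first, Iter last, const T& value)
{
  while (first < last) {
    Iter mid = first + (last - first) / 2;
    if (*mid < value) {
      first = mid + 1;
    } else {
      last = mid;
    }
  }
  return first;
}
```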

Other things to think about.

  1. Try to put host device requirements into the type system.
     a. Consider having host, host device, and device versions of stuff.
     b. This could allow some seq/par requirements to be checked at compile time in a GPU build, to some extent.
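One possible reading of point 1, sketched below under the assumption of simple tag types: a container records the execution space it supports, and callers check compatibility at compile time. The names exec_space, host_t, device_t, host_device_t, and tagged_span are made up for this illustration.

```cpp
#include <type_traits>

// Tag types describing where something is allowed to run.
struct host_t {};
struct device_t {};
struct host_device_t : host_t, device_t {};

// A container or algorithm declares the execution space it supports...
template <typename Space>
struct tagged_span {
  using exec_space = Space;
  // data members elided for brevity
};

// ...and callers check compatibility at compile time.
template <typename RequiredSpace, typename Container>
void require_space(const Container&)
{
  static_assert(std::is_base_of<RequiredSpace,
                                typename Container::exec_space>::value,
                "container is not usable in the requested execution space");
}

int main()
{
  tagged_span<host_device_t> s{};
  require_space<device_t>(s);   // OK: host_device_t is also device-capable
  // require_space<device_t>(tagged_span<host_t>{});  // would fail to compile
}
```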
MrBurmark commented 1 year ago

@trws Here's an idea to potentially reduce code duplication across projects by expanding camp to have more containers/views and algorithms that are commonly used in device code.

adayton1 commented 1 year ago

These are the things we currently use or would use. We have implementations of almost all of these in CARE.

Containers (if needed, these could be views, except for array):

Algorithms that act on scalars:

Algorithms that act on arrays (note that these are at the level of a single thread, not launching kernels, so "sequential" I guess):

Algorithms that act on arrays and do launch kernels:

There are probably other algorithms I'm missing, but this is a pretty core set.
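To illustrate the distinction between the last two categories, here is a rough sketch contrasting a thread-level (sequential) helper with a kernel-launching wrapper built on RAJA::forall. The names fill_value and fill_rows are hypothetical and invented for this example; RAJA_HOST_DEVICE, RAJA::forall, RAJA::Index_type, and RAJA::RangeSegment are existing RAJA facilities.

```cpp
#include <RAJA/RAJA.hpp>

// Thread-level "sequential" algorithm: callable from inside a kernel body,
// touches its range on the calling thread only, launches nothing.
template <typename T>
RAJA_HOST_DEVICE void fill_value(T* first, T* last, const T& v)
{
  for (; first != last; ++first) { *first = v; }
}

// Kernel-launching algorithm: takes an exec policy, iterates over rows in
// parallel, and uses the thread-level helper within each iterate.
template <typename ExecPolicy, typename T>
void fill_rows(T* data, int num_rows, int row_len, T v)
{
  RAJA::forall<ExecPolicy>(RAJA::RangeSegment(0, num_rows),
    [=] RAJA_HOST_DEVICE (RAJA::Index_type row) {
      fill_value(data + row * row_len, data + (row + 1) * row_len, v);
    });
}
```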