StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Realm: Kokkos Processors #645

Closed lightsighter closed 4 years ago

lightsighter commented 4 years ago

It's become clear recently that we need explicit Kokkos processors for the FleCSI level one milestone next year as the majority of the GPU code for FleCSI is going to be written using Kokkos. This significantly deprioritizes #570 as VPSC is not going to be used for the FleCSI level one milestone.

streichler commented 4 years ago

We definitely need to do something here. However, I'm beginning to lean away from having so many processor types. It leads to trouble when an application mixes different programming models for its leaf tasks. For example, an app with both native OpenMP and Kokkos leaf tasks would be forced to decide up front which CPU cores do OpenMP work and which do Kokkos work, when really the same cores could do either (just not at the same precise moment in time).

lightsighter commented 4 years ago

I agree with you. I would prefer that Kokkos run unmodified on top of our existing OpenMP and GPU processors. If it were a proper header-only library like Thrust or Agency then it wouldn't have any problems. Mainly we just need to use this issue to make sure that Kokkos works and doesn't trip over itself like it has been with the restrictions that we enforce upon it.

streichler commented 4 years ago

The kokkos branch (https://gitlab.com/StanfordLegion/legion/tree/kokkos) has the beginnings of support for this. I made an attempt at writing idiomatic Kokkos code here: https://gitlab.com/StanfordLegion/legion/blob/kokkos/examples/kokkos_saxpy/kokkos_saxpy.cc

and have successfully built and run it with Kokkos' Serial, OpenMP, and Cuda devices. To get a little confidence that this matches the way real people use Kokkos, I've rigged this example to use the BLAS routines from kokkos-kernels if you build with KOKKOS_KERNELS set to that package's installation location.

The build only works with the Makefile flow so far (no cmake yet), and requires the app's Makefile to set USE_KOKKOS=1 and KOKKOS_DIR to the Kokkos install directory (no support for building Kokkos as part of Legion). There are a number of limitations with the current version:

All that said, I think this is far enough along that if folks want to try building Kokkos-using apps against this, it's worth giving it a go. Now is also the time for folks to look things over and make comments/suggestions about how the interop looks.
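Editor's note: for readers who have not seen the pattern, a saxpy kernel in this style is an index range plus a lambda that captures its data by value. The sketch below is not the code from the linked kokkos_saxpy.cc. `serial_for` is a hypothetical serial stand-in for a Kokkos-style `parallel_for`, used so the snippet builds without Kokkos installed, and `saxpy` and its signature are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Kokkos-style parallel_for: calls f(i) once for
// each i in [0, n), serially. A real backend would spread these iterations
// over OpenMP threads or GPU threads instead.
template <typename Functor>
void serial_for(std::size_t n, const Functor& f) {
  for (std::size_t i = 0; i < n; ++i) {
    f(i);
  }
}

// Computes y[i] = alpha * x[i] + y[i] for every element.
void saxpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  // Capture raw pointers by value, the way a device lambda would capture
  // its data handles rather than references to host-side containers.
  const float* xp = x.data();
  float* yp = y.data();
  serial_for(x.size(), [=](std::size_t i) { yp[i] = alpha * xp[i] + yp[i]; });
}
```

Capturing by value matters once the body runs on a GPU, since a device lambda cannot follow references back into host memory.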

lightsighter commented 4 years ago

What parts of OpenMP are not supported by Realm that Kokkos needs?

Also @ipdemes and @gshipman told me that they don't plan to use Kokkos external views with FleCSI, but instead pass in their own iterator types from FleCSI. They have tested this and apparently it works, but you will have to ask them for more details.

ipdemes commented 4 years ago

@streichler : should I use the https://gitlab.com/StanfordLegion/legion/tree/kokkos branch for our further work with Kokkos?

@lightsighter : Yes, we don't use Kokkos::Views in FleCSI and pass our own iterator types that were adapted for Kokkos.

streichler commented 4 years ago

@ipdemes yes, please give it a try and let me know how it goes... if you need these changes applied on top of some other branch (e.g. control_replication), that can be arranged

regarding flecsi's custom iterators, do you construct those directly from legion accessors, or convert the legion accessors to raw pointers and then into flecsi iterators?

lightsighter commented 4 years ago

Also, @ipdemes do custom iterators work on GPUs too?

ipdemes commented 4 years ago

@streichler : Yes, we need this to be applied to the Control-replication branch before I can try this.

> regarding flecsi's custom iterators, do you construct those directly from legion accessors, or convert the legion accessors to raw pointers and then into flecsi iterators?

We get a raw pointer from legion and use it in our custom iterators.

@lightsighter : yes, we made them work for GPU as well. At least a simple unit test that uses them on GPU passes. I am working on making the test more complex.
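Editor's note: the "raw pointer into a custom iterator" approach described above can be sketched roughly as follows. This is an illustration only, not FleCSI's actual types. `RawPtrIterator`, `RawPtrRange`, and `sum_range` are hypothetical names, and the raw pointer is assumed to have already been obtained from Legion. For GPU use, the member functions would additionally need device annotations, which are left out here so the snippet compiles as plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical forward iterator over a contiguous buffer reached through a
// raw pointer (for example, one extracted from a region accessor).
template <typename T>
class RawPtrIterator {
 public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = T;
  using difference_type = std::ptrdiff_t;
  using pointer = T*;
  using reference = T&;

  RawPtrIterator() = default;
  explicit RawPtrIterator(T* p) : p_(p) {}

  reference operator*() const { return *p_; }
  pointer operator->() const { return p_; }
  RawPtrIterator& operator++() { ++p_; return *this; }
  RawPtrIterator operator++(int) { RawPtrIterator old(*this); ++p_; return old; }
  bool operator==(const RawPtrIterator& o) const { return p_ == o.p_; }
  bool operator!=(const RawPtrIterator& o) const { return p_ != o.p_; }

 private:
  T* p_ = nullptr;
};

// Hypothetical range wrapper: a base pointer plus an element count.
template <typename T>
struct RawPtrRange {
  T* base;
  std::size_t count;
  RawPtrIterator<T> begin() const { return RawPtrIterator<T>(base); }
  RawPtrIterator<T> end() const { return RawPtrIterator<T>(base + count); }
};

// Sums the elements of a range using a standard algorithm.
double sum_range(const RawPtrRange<double>& r) {
  return std::accumulate(r.begin(), r.end(), 0.0);
}
```

Because the range exposes `begin()` and `end()`, it also works with range-based `for` loops, e.g. `for (auto& v : r) v *= 2.0;`.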

streichler commented 4 years ago

@ipdemes I've rebased the kokkos branch to sit on top of control_replication. I'll back-port things as needed to master later.

streichler commented 4 years ago

Status update: We're able to build against the latest Kokkos develop branch with any combination of OpenMP and CUDA support. Remaining work before it can be merged into master is to do a cleanup pass of the build flow. (Kokkos has completely changed their makefile-based flow, and the cmake flow has probably more warts than it needs.)

ipdemes commented 4 years ago

@streichler : thank you for the update.

streichler commented 4 years ago

Kokkos support has been merged into the master branch with commit 61cf62e. Note that it is only available through the cmake build, as Kokkos no longer supports building against an installed version via a Makefile.