Closed lightsighter closed 4 years ago
We definitely need to do something here. However, I'm beginning to lean away from having so many processor types. It leads to trouble when an application mixes different programming models for its leaf tasks. For example, an app with both native OpenMP and Kokkos leaf tasks would be forced to decide up front which CPU cores do OpenMP work and which do Kokkos work, when really the same cores could do either (just not at the same precise moment in time).
I agree with you. I would prefer that Kokkos run unmodified on top of our existing OpenMP and GPU processors. If it were a proper header-only library like Thrust or Agency then it wouldn't have any problems. Mainly we just need to use this issue to make sure that Kokkos works and doesn't trip over itself like it has been with the restrictions that we enforce upon it.
The kokkos branch (https://gitlab.com/StanfordLegion/legion/tree/kokkos) has the beginnings of support for this. I made an attempt at writing idiomatic Kokkos code here: https://gitlab.com/StanfordLegion/legion/blob/kokkos/examples/kokkos_saxpy/kokkos_saxpy.cc and have successfully built and run it with Kokkos' Serial, OpenMP, and Cuda devices. To get a little confidence that this matches the way real people use Kokkos, I've rigged this example to use the BLAS routines from kokkos-kernels if you build with KOKKOS_KERNELS set to that package's installation location.
The build only works with the Makefile flow so far (no cmake yet), and requires the app's Makefile to set USE_KOKKOS=1 and KOKKOS_DIR to the Kokkos install directory (no support for building Kokkos as part of Legion). There are a number of limitations with the current version:
-ll:cpu 1 to avoid angering the external OpenMP runtime. (On the plus side, it looks like Kokkos might actually be fine with Realm's multiple OpenMP runtime instances once all the right entry points are added to Realm.)
-ll:gpu 1. You also can't use a CUDA-enabled Kokkos without a GPU.
All that said, I think this is far enough along that if folks want to try building Kokkos-using apps against this, it's worth giving it a go. Now is also the time for folks to look things over and make comments/suggestions about how the interop looks.
What parts of OpenMP are not supported by Realm that Kokkos needs?
Also @ipdemes and @gshipman told me that they don't plan to use Kokkos external views with FleCSI, but instead pass in their own iterator types from FleCSI. They have tested this and apparently it works, but you will have to ask them for more details.
@streichler : should I use the https://gitlab.com/StanfordLegion/legion/tree/kokkos branch for our further work with Kokkos? @lightsighter : Yes, we don't use Kokkos::Views in FleCSI; instead we pass our own iterator types that were adapted for Kokkos.
@ipdemes yes, please give it a try and let me know how it goes... if you need these changes applied on top of some other branch (e.g. control_replication), that can be arranged
regarding flecsi's custom iterators, do you construct those directly from legion accessors, or convert the legion accessors to raw pointers and then into flecsi iterators?
Also, @ipdemes do custom iterators work on GPUs too?
@streichler : Yes, we need this to be applied to the Control-replication branch before I can try this.
regarding flecsi's custom iterators, do you construct those directly from legion accessors, or convert the legion accessors to raw pointers and then into flecsi iterators?
We get a raw pointer from legion and use it in our custom iterators.
@lightsighter : yes, we made them work for GPU as well. At least, a simple unit test that uses them on GPU passes. I am working on making the test more complex.
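For readers following the iterator discussion, here is a minimal sketch of the general pattern described above: take a raw pointer and wrap it in a small iterator type that a SAXPY-style loop body can index. This is not FleCSI's actual iterator and it uses no Legion or Kokkos calls. The names `RawIterator`, `saxpy`, and `run_saxpy_demo` are hypothetical, and `std::vector::data()` only stands in for a raw pointer that would come from a Legion accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterator over a raw pointer. It implements only the
// operations this sketch uses; it is not a full standard-library iterator.
template <typename T>
class RawIterator {
public:
  explicit RawIterator(T* base) : base_(base) {}

  // Indexed access, the form a parallel-for body typically uses.
  T& operator[](std::size_t i) const { return base_[i]; }
  T& operator*() const { return *base_; }
  RawIterator& operator++() { ++base_; return *this; }
  bool operator!=(const RawIterator& other) const { return base_ != other.base_; }

private:
  T* base_;  // not owned; the caller keeps the storage alive
};

// Serial SAXPY (y = alpha * x + y) over any types that support operator[].
// In a Kokkos setting the loop body would be the parallel-for functor body;
// here it is an ordinary loop so the sketch has no external dependencies.
template <typename XIt, typename YIt>
void saxpy(std::size_t n, float alpha, XIt x, YIt y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = alpha * x[i] + y[i];
  }
}

// Small driver: the vectors stand in for region data, and data() stands in
// for the raw pointer obtained from the runtime (an assumption of this sketch).
std::vector<float> run_saxpy_demo() {
  std::vector<float> x{1.0f, 2.0f, 3.0f};
  std::vector<float> y{10.0f, 20.0f, 30.0f};
  assert(x.size() == y.size());

  RawIterator<const float> x_it(x.data());  // read-only input
  RawIterator<float> y_it(y.data());        // read-write output
  saxpy(x.size(), 2.0f, x_it, y_it);
  return y;
}
```

Calling `run_saxpy_demo()` returns the updated `y` values. Because the iterator holds only a pointer, it is cheap to copy by value into a loop body, which is one plausible reason to prefer this shape over a heavier view type.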
@ipdemes I've rebased the kokkos branch to sit on top of control_replication. I'll back-port things as needed to master later.
Status update: We're able to build against the latest Kokkos develop branch with any combination of OpenMP and CUDA support. Remaining work before it can be merged into master is to do a cleanup pass of the build flow. (Kokkos has completely changed their makefile-based flow, and the cmake flow has probably more warts than it needs.)
@streichler : thank you for the update.
Kokkos support has been merged into the master branch with commit 61cf62e. Note that it is only available through the cmake build, as Kokkos no longer supports building against an installed version via a Makefile.
It's become clear recently that we need explicit Kokkos processors for the FleCSI level one milestone next year as the majority of the GPU code for FleCSI is going to be written using Kokkos. This significantly deprioritizes #570 as VPSC is not going to be used for the FleCSI level one milestone.