LLNL / apollo

Apollo: Online Machine Learning for Performance Portability

First stab at multitool implementation #18

Closed DavidPoliakoff closed 5 months ago

DavidPoliakoff commented 2 years ago

So far this is just a proposal, but we're looking to support a more ubiquitous tools model at Sandia. This is partially inspired by @ibaned, so I'll ping him here. Note that this description is lifted from the one on the equivalent Caliper PR. Giorgis, you might care because, with your awesome built-in ML library thing, this is how you eventually get Apollo baked into huge numbers of Sandia applications.

In order to support that, we're going to essentially build a huge tools bundle in which users can include Caliper, Apollo, APEX, TAU, whatever they want. A problem with this is that we might have multiple tools linked into the same library, all defining kokkosp_begin_parallel_for. Oops. Easy answer: just declare the symbol weak! Except now Caliper's kokkosp_begin_parallel_for gets replaced with Score-P's, which is, you know, suboptimal.

The solution I've come up with is that there will be a weak, unnamespaced kokkosp_begin_parallel_for, which just calls the namespaced apollo::kokkosp_begin_parallel_for (we can negotiate on the namespace name). There's also a function that returns a Kokkos EventSet (note: this does not incur a dependency on a built Kokkos) whose callbacks point at the correct apollo::kokkosp_begin_parallel_for.
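A minimal sketch of that layout, to make the symbol arrangement concrete. It is not the code in this PR: the EventSet struct is a hypothetical one-callback stand-in for the Kokkos type, seen_kernels() exists only so the example has observable behavior, and `__attribute__((weak))` assumes GCC or Clang.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the Kokkos EventSet: a plain struct of callback
// pointers. Only one callback is sketched here.
struct EventSet {
  void (*begin_parallel_for)(const char* name, uint32_t dev_id,
                             uint64_t* kernel_id) = nullptr;
};

namespace apollo {

// Illustration only: remembers the kernel names the tool has been told about.
inline std::vector<std::string>& seen_kernels() {
  static std::vector<std::string> kernels;
  return kernels;
}

// Namespaced implementation. Its mangled name is unique to this tool, so it
// cannot collide with another tool's callback of the same unqualified name.
void kokkosp_begin_parallel_for(const char* name, uint32_t /*dev_id*/,
                                uint64_t* kernel_id) {
  seen_kernels().push_back(name);
  *kernel_id = seen_kernels().size();
}

// Bundled path: hand out the namespaced callbacks directly, without going
// through the global symbol at all.
EventSet get_event_set() {
  EventSet events;
  events.begin_parallel_for = &apollo::kokkosp_begin_parallel_for;
  return events;
}

}  // namespace apollo

// Standalone path: the weak, unnamespaced C symbol that a dlopen-based loader
// looks up by name. It only forwards to the namespaced implementation.
extern "C" __attribute__((weak)) void kokkosp_begin_parallel_for(
    const char* name, uint32_t dev_id, uint64_t* kernel_id) {
  apollo::kokkosp_begin_parallel_for(name, dev_id, kernel_id);
}
```

Either entry point ends up in apollo::kokkosp_begin_parallel_for, so if another tool's weak global symbol wins at link time, the bundle can still reach this tool through get_event_set().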

Now, for our existing users, nothing really changes: they dlopen Apollo, it's the only kokkosp_begin_parallel_for in the app, and they call it, which in turn calls apollo::kokkosp_begin_parallel_for.

Meanwhile, in the bundled situation, the bundle calls cali::get_event_set (and apollo::get_event_set, once it exists, and TAU's, and, and, and...) and registers all of these tools with Kokkos. Users can then pick out a tool by name. On the user's side, they can write something like:

    auto tools_handle = BigOlKokkosToolsBundle::gimme_all_your_tools();
    Kokkos::Tools::here_are_some_tools(tools_handle);
    Kokkos::Tools::activate_tool("apollo", "--flush-rate=5000");
    Kokkos::initialize(argc, argv);

And then get the functionality they want. I'm putting this up so you can tell me if the code is just too atrocious for words, but I wouldn't merge it until you get a comment from @crtrott saying "yeah, this is the design we want," which might be after I leave. But let me know if you'd like code fixes.
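For reference, the register-then-activate-by-name flow in the bundled case could look roughly like the sketch below. Everything in it is hypothetical: ToolRegistry, ToolEventSet, and their members are illustrative names, not Kokkos or Apollo APIs, and the event set is reduced to two callbacks.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified event set: an init hook that receives the tool's
// argument string, plus a single kernel callback.
struct ToolEventSet {
  std::function<void(const std::string& args)> init;
  std::function<void(const std::string& kernel)> begin_parallel_for;
};

// Hypothetical registry: the bundle registers every tool it was built with,
// and the application activates one of them by name.
class ToolRegistry {
 public:
  void register_tool(const std::string& name, ToolEventSet events) {
    tools_[name] = std::move(events);
  }

  // Selects the named tool and passes it its argument string.
  void activate_tool(const std::string& name, const std::string& args) {
    auto it = tools_.find(name);
    if (it == tools_.end()) {
      throw std::invalid_argument("unknown tool: " + name);
    }
    active_ = &it->second;  // std::map keeps element addresses stable
    if (active_->init) active_->init(args);
  }

  // Dispatches to the active tool only; a no-op until a tool is activated.
  void begin_parallel_for(const std::string& kernel) const {
    if (active_ && active_->begin_parallel_for) {
      active_->begin_parallel_for(kernel);
    }
  }

 private:
  std::map<std::string, ToolEventSet> tools_;
  const ToolEventSet* active_ = nullptr;
};
```

Because each tool contributes its own event set, several tools can sit in the same binary without fighting over the global kokkosp_* symbols, and only the activated one receives callbacks.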

ggeorgakoudis commented 2 years ago

Hey David, the code is self-contained in the connector, so any atrocity is permitted (though it does not look bad to my eyes), since I expect the Kokkos guys will maintain it :). The solution sounds reasonable to me, and I don't see any conflicts right now with using apollo as the namespace name.

I'd be delighted if Apollo is used a lot in Sandia applications, and I would like to hear more about success stories. Please let me know when you would like me to accept the PR, and feel free to introduce me to the maintainer succeeding you.

ggeorgakoudis commented 5 months ago

Closed due to obsolescence