Open BradWhitlock opened 1 year ago
@BradWhitlock I think having everything in Axom work for serial execution without RAJA enabled would be useful. However, I also think that for any parallel execution, we should require RAJA and use it for that. Wrapping RAJA constructs in the Axom namespace would make code maintenance easier, I think, because it would eliminate most of the need for macro guards in the code; i.e., the guards could be localized to the wrappers.
Of course, others on the team should weigh in on these decisions. This would be a good discussion topic for the next Axom meeting. Agree?
Thanks @BradWhitlock -- I agree with @rhornung67
Adding a non-RAJA path for serial execution might also make it easier to debug the code in some cases.
Another point to consider: the internal implementations of things in RAJA, like reductions, are very complex, for GPU back-ends in particular. They have been highly tuned based on compiler, back-end programming model, memory usage, and other issues. It would not be wise to try to develop and maintain Axom versions of such things, IMO.
Also, RAJA has a new reduction interface (available in v2022.10.x releases) that is more flexible and offers performance benefits that the original reduction model based on lambda capture of a reducer cannot reproduce. We should consider switching over to it in Axom in the future. Here's some basic documentation on it. It will be expanded in the future, including more examples.... https://raja.readthedocs.io/en/develop/sphinx/user_guide/feature/reduction.html#experimental-reduction-interface
I might have mis-stated above but what I was proposing was a set of non-RAJA serial constructs and then RAJA for everything if it is enabled.
This is a good idea.
I just did a similar exercise in Ascent: mocking a small subset of RAJA for use in serial when RAJA isn't around.
In case it helps, here are the RAJA-based and mocked functions:
It has some extra reductions that might come up.
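As a rough illustration of what serial mocks for extra reductions could look like, here is a minimal sketch. It is not the Ascent code; the namespace `mock_sketch`, the `seq_reduce` tag, and the helper functions are hypothetical names chosen for this example. The `min`/`max`/`get` member names are modeled on RAJA's `ReduceMin` and `ReduceMax`.

```cpp
#include <algorithm>
#include <limits>
#include <memory>
#include <vector>

namespace mock_sketch  // hypothetical namespace for illustration only
{
struct seq_reduce { };  // placeholder policy tag (assumed name)

// Serial stand-in for a min reduction. Copies share one value through a
// shared_ptr, so a copy captured by value in a lambda updates the original.
template <typename ReducePolicy, typename T>
class ReduceMin
{
public:
  explicit ReduceMin(T init) : m_val(std::make_shared<T>(init)) { }
  void min(T v) const { *m_val = std::min(*m_val, v); }
  T get() const { return *m_val; }

private:
  std::shared_ptr<T> m_val;
};

// Serial stand-in for a max reduction, same sharing scheme as ReduceMin.
template <typename ReducePolicy, typename T>
class ReduceMax
{
public:
  explicit ReduceMax(T init) : m_val(std::make_shared<T>(init)) { }
  void max(T v) const { *m_val = std::max(*m_val, v); }
  T get() const { return *m_val; }

private:
  std::shared_ptr<T> m_val;
};

// Example helpers: reduce a vector through by-value lambda captures.
inline double min_of(const std::vector<double> &v)
{
  ReduceMin<seq_reduce, double> r(std::numeric_limits<double>::max());
  auto body = [=](std::size_t i) { r.min(v[i]); };
  for(std::size_t i = 0; i < v.size(); ++i) body(i);
  return r.get();
}

inline double max_of(const std::vector<double> &v)
{
  ReduceMax<seq_reduce, double> r(std::numeric_limits<double>::lowest());
  auto body = [=](std::size_t i) { r.max(v[i]); };
  for(std::size_t i = 0; i < v.size(); ++i) body(i);
  return r.get();
}
}  // namespace mock_sketch
```

The update methods are `const` so that they can be called on a copy held inside a non-mutable lambda, which is the calling pattern capture-based reducers rely on.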
Another idea would be to disable the shaping application in configurations without RAJA.
I happened to have a non-RAJA serial Axom build that I was using for debugging, and I ended up using it to run the quest_shaping_driver_ex utility to do some shaping in klee. This program gets built but it cannot run. The IntersectionShaper, for example, requires RAJA to do anything useful. I was going to change the CMake logic for quest_shaping_driver_ex so that it is not built unless RAJA is present. Then I recalled Kenny saying something to the effect that it should be possible to build Axom without RAJA. Most of the code reflects this (e.g., use of axom::for_all rather than RAJA). I took a look at IntersectionShaper and there are cases that use SEQ_EXEC as the execution policy, but they are currently guarded with RAJA ifdefs because some RAJA reductions and atomics are used.
I made some exploratory edits to change calls like RAJA::ReduceSum to axom::ReduceSum and provided some non-RAJA definitions. These produced a working quest_shaping_driver_ex program using the IntersectionShaper (albeit a really slow one). Is there any value in hardening such work and migrating some additional RAJA:: constructs to axom:: constructs? I expect these definitions could live down in axom/src/axom/core/execution/internal.
```cpp
#if !defined(AXOM_USE_RAJA)
#include <memory>

// Serial fallbacks used when RAJA is not available.
template <typename ReducePolicy, typename T>
class ReduceSum
{
public:
  ReduceSum(T value) : sum(std::make_shared<T>(value)) { }

  // const so that a by-value copy captured in a lambda can still accumulate;
  // all copies share the same underlying value.
  void operator+=(T value) const
  {
    T *ptr = sum.get();
    *ptr += value;
  }
  T get() const { return *sum; }
  operator int() const { return static_cast<int>(*sum); }
  operator double() const { return static_cast<double>(*sum); }

private:
  std::shared_ptr<T> sum;
};

// Not actually atomic; returns the previous value, like RAJA::atomicAdd.
template <typename AtomicPolicy, typename Precision>
Precision atomicAdd(Precision *acc, Precision value)
{
  Precision ret = *acc;
  *acc += value;
  return ret;
}
#else
template <typename ReducePolicy, typename T>
using ReduceSum = RAJA::ReduceSum<ReducePolicy, T>;

template <typename AtomicPolicy, typename Precision>
Precision atomicAdd(Precision *acc, Precision value)
{
  return RAJA::atomicAdd<AtomicPolicy>(acc, value);
}
#endif
```
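To show how calling code might stay free of macro guards with wrappers like these, here is a self-contained sketch of the serial path only. The `sketch` namespace, the `seq_reduce` and `seq_atomic` policy tags, the simplified `for_all`, and the two helper functions are all assumptions made for this example; they are not Axom's actual definitions.

```cpp
#include <memory>
#include <vector>

namespace sketch  // hypothetical namespace, not Axom's
{
struct seq_reduce { };  // placeholder policy tags (assumed names)
struct seq_atomic { };

// Serial sum reducer; copies share one accumulator through a shared_ptr.
template <typename ReducePolicy, typename T>
class ReduceSum
{
public:
  explicit ReduceSum(T value) : sum(std::make_shared<T>(value)) { }
  void operator+=(T value) const { *sum += value; }
  T get() const { return *sum; }

private:
  std::shared_ptr<T> sum;
};

// Serial "atomic" add: plain read-modify-write that returns the old value.
template <typename AtomicPolicy, typename T>
T atomicAdd(T *acc, T value)
{
  T old = *acc;
  *acc += value;
  return old;
}

// Simplified serial stand-in for a for_all-style loop over [0, n).
template <typename Body>
void for_all(int n, Body &&body)
{
  for(int i = 0; i < n; ++i) body(i);
}

// Calling code: no RAJA guards are needed at this level.
inline double sum_values(const std::vector<double> &v)
{
  ReduceSum<seq_reduce, double> total(0.0);
  const double *data = v.data();
  for_all(static_cast<int>(v.size()), [=](int i) { total += data[i]; });
  return total.get();
}

inline int count_even(const std::vector<int> &v)
{
  int count = 0;
  int *count_ptr = &count;
  const int *data = v.data();
  for_all(static_cast<int>(v.size()), [=](int i) {
    if(data[i] % 2 == 0) atomicAdd<seq_atomic>(count_ptr, 1);
  });
  return count;
}
}  // namespace sketch
```

With RAJA enabled, the same calling code would presumably resolve to the RAJA-backed aliases instead, so the `#if` guard would live only where the wrappers are defined.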