jedbrown opened 1 month ago
Is this something Enzyme core should understand, or what would be the fix here?
Yeah, I think we should be able to just mark it internally in Enzyme as equivalent to MPI_DOUBLE everywhere.
Is there a spec sheet somewhere for the ABI Rust uses for MPI (probably a question for @jedbrown)?
Ah, reading Jed's original comment more closely, I suppose the definitions are all in https://github.com/rsmpi/rsmpi/blob/1cee1f9d8ad06b8085b98463b82a25494d98a513/mpi-sys/src/rsmpi.c#L3.
Yeah, I think the resolution would be to find all uses of, say, "ompi_mpi_double" in the Enzyme code base and add RSMPI_DOUBLE and the like alongside them.
This particular issue will be resolved by adding it to https://github.com/EnzymeAD/Enzyme/blob/2f34f0b0d740980a0997a4871d8a8495cfed37c3/enzyme/Enzyme/ActivityAnalysis.cpp#L112 (alternatively, we could give Rust an attribute for inactive globals similar to the one in C++, but if this is the standard way of using MPI in Rust, we might as well handle it internally in Enzyme proper).
Also helpful to add to:
https://github.com/EnzymeAD/Enzyme/blob/2f34f0b0d740980a0997a4871d8a8495cfed37c3/enzyme/Enzyme/TypeAnalysis/TypeAnalysis.cpp#L4783
https://github.com/EnzymeAD/Enzyme/blob/2f34f0b0d740980a0997a4871d8a8495cfed37c3/enzyme/Enzyme/AdjointGenerator.h#L148
https://github.com/EnzymeAD/Enzyme/blob/2f34f0b0d740980a0997a4871d8a8495cfed37c3/enzyme/Enzyme/Utils.cpp#L2346
https://github.com/EnzymeAD/Enzyme/blob/2f34f0b0d740980a0997a4871d8a8495cfed37c3/enzyme/Enzyme/MLIR/Analysis/ActivityAnalysis.cpp#L36
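As a rough illustration of the name-based approach, the sketch below keeps a set of global names that should be treated as inactive MPI handles and checks membership by name. The set `KnownInactiveMPIGlobals` and the helper `isKnownInactiveGlobal` are hypothetical, not Enzyme's actual code; the real change would extend whatever checks already exist at the linked locations.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical list of global names to treat as inactive MPI handles.
// Not Enzyme's actual data structure; it only illustrates adding the
// rsmpi names next to the Open MPI one.
static const std::unordered_set<std::string> KnownInactiveMPIGlobals = {
    "ompi_mpi_double",  // Open MPI's symbol behind the MPI_DOUBLE macro
    "RSMPI_DOUBLE",     // rsmpi's exported copy of MPI_DOUBLE
    "RSMPI_SUM",        // rsmpi's exported copy of MPI_SUM
};

// Returns true if a global with this name should never be considered active.
bool isKnownInactiveGlobal(const std::string &name) {
  return KnownInactiveMPIGlobals.count(name) != 0;
}
```

Matching on exact names keeps the check cheap, but every MPI binding that re-exports handles under new names (as rsmpi does) needs its own entries.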
The symbols like RSMPI_DOUBLE exist because MPI_DOUBLE is a macro, so we can't otherwise use it from Rust. With a user-defined operation, it would not map to a static value. I suppose Enzyme may not support user-defined MPI_Ops, but if that's desired on the roadmap, it would be necessary to intercept/augment MPI_Op_create and keep a run-time table mapping each user operation to its derivative. When I pass the op in as Const to the differentiated function, I'm still getting the error pointing back to RSMPI_DOUBLE, which I don't really understand except via inlining.
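To make the run-time table idea concrete, here is a minimal sketch, assuming a wrapper around op creation registers each user reduction together with its derivative. `OpHandle`, `ReduceFn`, `OpEntry`, and `OpDerivativeTable` are hypothetical stand-ins, not MPI or Enzyme types, and the reduction signature only loosely follows MPI_User_function.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical opaque handle standing in for MPI_Op.
using OpHandle = int;
// Reduction kernel loosely modeled on MPI_User_function: folds `in` into `inout`.
using ReduceFn = std::function<void(const double *in, double *inout, int len)>;

struct OpEntry {
  ReduceFn primal;      // the user's reduction
  ReduceFn derivative;  // derivative of the reduction (user-supplied here)
};

// Run-time table that a hypothetical MPI_Op_create wrapper would fill in.
class OpDerivativeTable {
 public:
  // Registers a user op and its derivative; returns a fresh handle.
  OpHandle create(ReduceFn primal, ReduceFn derivative) {
    OpHandle h = next_++;
    table_[h] = OpEntry{std::move(primal), std::move(derivative)};
    return h;
  }

  // Looks up the derivative for a handle; nullptr means "never registered",
  // so the caller can report an error instead of guessing.
  const ReduceFn *derivativeOf(OpHandle h) const {
    auto it = table_.find(h);
    return it == table_.end() ? nullptr : &it->second.derivative;
  }

 private:
  std::map<OpHandle, OpEntry> table_;
  OpHandle next_ = 0;
};
```

For an elementwise sum, the forward-mode derivative is again a sum of the tangents, so the same kernel could be registered for both roles; nonlinear user ops would need a genuinely different derivative kernel.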
The RSMPI_DOUBLE issue is that Enzyme is being conservative: it thinks the global variable itself is active somewhere, and rather than risk wrong answers, it throws an error.
You can tell Enzyme to assume all globals are inactive: https://enzyme.mit.edu/getting_started/UsingEnzyme/#assume-inactivity-of-unmarked-globals but we should just mark this in the places listed above.
bug2.ll.txt
Unfortunately, the reduction operator is passed in as a pointer. Outside of the function being differentiated, this pointer is created by:
%210 = load ptr, ptr @RSMPI_SUM, align 8, !noundef !801
%236 = invoke noundef double @_ZN10dot_enzyme12dot_parallel17h7dfcd86d9e8c176bE(ptr noalias noundef nonnull readonly align 8 dereferenceable(16) %25, ptr noalias noundef nonnull readonly align 8 %218, i64 noundef %219, ptr noalias noundef nonnull readonly align 8 %227, i64 noundef %228, ptr noundef %210)
The issue here is that Enzyme can't always see this when differentiating. Proof:
declare double @__enzyme_autodiff(...)

define double @enzyme_opt_helper_0(ptr %0, ptr %1, i64 %2, ptr %3, i64 %4, ptr %5) {
  %7 = call double (...) @__enzyme_autodiff(ptr @_ZN10dot_enzyme12dot_parallel17h7dfcd86d9e8c176bE, metadata !"enzyme_const", ptr %0, metadata !"enzyme_dup", ptr %1, ptr %1, metadata !"enzyme_const", i64 %2, metadata !"enzyme_dup", ptr %3, ptr %3, metadata !"enzyme_const", i64 %4, metadata !"enzyme_const", ptr %5)
  ret double %7
}
; Function Attrs: noinline nonlazybind sanitize_hwaddress uwtable
define internal noundef "enzyme_type"="{[-1]:Float@double}" double @_ZN10dot_enzyme12dot_parallel17h7dfcd86d9e8c176bE(ptr noalias nocapture noundef readonly align 8 dereferenceable(16) "enzyme_type"="{[-1]:Pointer}" %0, ptr noalias nocapture noundef nonnull readonly align 8 "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %1, i64 noundef "enzyme_type"="{[-1]:Integer}" %2, ptr noalias nocapture noundef nonnull readonly align 8 "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %3, i64 noundef "enzyme_type"="{[-1]:Integer}" %4, ptr noundef "enzyme_type"="{[0]:Pointer}" %5) unnamed_addr #0 personality ptr @rust_eh_personality {
;; some lines
%23 = call noundef i32 @MPI_Allreduce(ptr noundef nonnull %8, ptr noundef nonnull %7, i32 noundef 1, ptr noundef %10, ptr noundef %5, ptr noundef %22), !noalias !990
;;;
which gives the following error:
error: <unknown>:0:0: in function preprocess__ZN10dot_enzyme12dot_parallel17h7dfcd86d9e8c176bE double (ptr, ptr, i64, ptr, i64, ptr): Enzyme: call: %23 = call noundef i32 @MPI_Allreduce(ptr noundef nonnull %8, ptr noundef nonnull %7, i32 noundef 1, ptr noundef %10, ptr noundef %5, ptr noundef %22) #38, !noalias !65
unhandled mpi_allreduce op: ptr %5
I'm not sure how we can fix that in general. However, if we ignore my artificial autodiff test case and focus on the real-world one, where we read from RSMPI_SUM, that reminds me of reading from an enzyme_const global, which Enzyme already recognizes. So it should be doable to copy that handling for the case where a load from a recognized pointer name dominates the call sites?
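A minimal sketch of that check, assuming a toy value model rather than LLVM's real API: given the values passed for the op parameter at every known call site, resolve the parameter to a recognized global only when all of them are loads from the same known name. `Value`, `resolveOpArgument`, and the name set are hypothetical, and the sketch omits the dominance check itself.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model of an incoming value (hypothetical; not LLVM's Value class).
struct Value {
  enum Kind { LoadOfGlobal, Argument, Other } kind;
  std::string global;  // name of the loaded global when kind == LoadOfGlobal
};

// Resolves the op parameter to a known global name if every call site
// passes a load of the same recognized global; otherwise returns nullopt.
// A real implementation would also need a dominance check.
std::optional<std::string> resolveOpArgument(
    const std::vector<Value> &incoming,
    const std::unordered_set<std::string> &known) {
  std::optional<std::string> name;
  for (const Value &v : incoming) {
    if (v.kind != Value::LoadOfGlobal || known.count(v.global) == 0)
      return std::nullopt;  // unknown source: stay conservative
    if (name && *name != v.global)
      return std::nullopt;  // call sites disagree on the op
    name = v.global;
  }
  return name;  // nullopt if there were no call sites at all
}
```

In the artificial test case, the incoming value is itself a function argument (%5 of enzyme_opt_helper_0), so this kind of resolution fails, which is consistent with the error above; in the real-world case, the caller's load from @RSMPI_SUM would resolve.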
You may be able to do something for MPI reductions similar to how we handle generic indirect function calls. Specifically, we define the shadow of a function to be the corresponding derivative data (e.g. a pointer to the augmented forward and reverse functions), and then, if a call is indirect, use the shadow for the relevant derivative code.
This would also enable generic user-defined ops, which would be nice.
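A minimal sketch of the shadow idea, assuming reverse mode on a scalar function: the "shadow" of a function is a record of its augmented forward and reverse functions, and an indirect call site dispatches through that record. `FnShadow`, `square_fwd`, `square_rev`, and `callWithShadow` are hypothetical illustrations, not Enzyme's actual representation; for MPI, an op handle's shadow would analogously carry the reduction's derivative.

```cpp
#include <cassert>

// Hypothetical "shadow" of a scalar function f: its augmented forward
// pass and its reverse pass.
struct FnShadow {
  double (*forward)(double x);             // computes f(x)
  double (*reverse)(double x, double dy);  // returns f'(x) * dy
};

// Example function f(x) = x * x and its derivative pieces.
double square_fwd(double x) { return x * x; }
double square_rev(double x, double dy) { return 2.0 * x * dy; }
const FnShadow square_shadow{square_fwd, square_rev};

struct Result {
  double y;   // primal value f(x)
  double dx;  // adjoint of x, given the adjoint dy of the result
};

// Differentiates an indirect call: the callee is only known through its
// shadow at run time, so both passes go through the shadow's pointers.
Result callWithShadow(const FnShadow &shadow, double x, double dy) {
  double y = shadow.forward(x);
  double dx = shadow.reverse(x, dy);
  return {y, dx};
}
```

Because the derivative code only ever touches the shadow, any callee (including a user-defined reduction) works as long as someone has populated its shadow record.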
Alternatively, you could look at ways to inline/pass the info from caller to callee, like we already do for aliasing, etc.
The first one is too far into the Enzyme core weeds; I have no clue how to write that code, so I'll leave it to someone else.
If you have a link showing where we already do the last option, I might be able to copy that over for this case:
Alternatively you could look at ways to inline/pass the info from caller to callee like we do already for aliasing/etc
That should be good enough for this specific case.
This is based on this rsmpi example. After checking out that branch, please run
Note that RSMPI_DOUBLE is defined in mpi-sys/src/rsmpi.c so that the value can be used by Rust FFI bindings.
Debug profile
I have a different error with the debug profile.
Meta
rustc --version --verbose: