EnzymeAD / Enzyme

High-performance automatic differentiation of LLVM and MLIR.
https://enzyme.mit.edu

incorrect derivative of function that returns struct #1894

Open samuelpmish opened 6 months ago

samuelpmish commented 6 months ago

https://fwd.gymni.ch/vKIfPx

It seems that some versions of clang produce the right answers, but others don't.

samuelpmish commented 6 months ago

Update: I was asked to try refactoring the wrapper to return by-reference instead of by-value, but that doesn't seem to have changed the output:

https://fwd.gymni.ch/qIWjto

It's also worth noting that I may just be using __enzyme_fwddiff incorrectly!
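Forward mode should return the Jacobian-vector product of the function at the seeded tangent. A hand-written dual-number version of the same function therefore gives an independent reference to compare `__enzyme_fwddiff` output against. The sketch below uses a hypothetical struct `Result` and function `f`, not the reproducer's actual code (which is only behind the Compiler Explorer link), and it does not call Enzyme itself.

```cpp
#include <cassert>
#include <cmath>

// Dual number: a primal value paired with its tangent (directional derivative).
struct Dual {
  double val;
  double tan;
};

// Sum rule: (a + b)' = a' + b'
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.tan + b.tan}; }

// Product rule: (a * b)' = a' * b + a * b'
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};
}

// Chain rule for sin: (sin a)' = cos(a) * a'
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.tan}; }

// Hypothetical struct returned by value (a stand-in for the reproducer's type).
struct Result {
  Dual a;
  Dual b;
};

// Hypothetical function under test: f(x, y) = { x * y, sin(x) + y * y }.
Result f(Dual x, Dual y) { return {x * y, sin(x) + y * y}; }

// Reference forward-mode derivative: seed the input tangents (dx, dy) and read
// the tangent of each struct member, i.e. the Jacobian-vector product J * [dx, dy].
Result jvp_reference(double x, double y, double dx, double dy) {
  return f(Dual{x, dx}, Dual{y, dy});
}

// Tolerance-based comparison for checking an AD result against the reference.
bool approx_equal(double p, double q, double tol = 1e-12) {
  assert(tol >= 0.0);
  return std::fabs(p - q) <= tol;
}
```

For example, `jvp_reference(1.0, 2.0, 1.0, 0.0)` seeds the x direction, so `.a.tan` is y = 2 and `.b.tan` is cos(1). If an Enzyme forward-mode call with the same seeds disagrees at `-O0` but agrees at `-O2`, that suggests the problem lies in the differentiation rather than in the test function.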

samuelpmish commented 6 months ago

I showed this to @jandrej and he noticed an important detail: the results are correct when compiling with -O2 but incorrect without it, so maybe it's another inlining issue (and if so, which functions should be forced to inline?).

samuelpmish commented 1 week ago

Update: here's a Compiler Explorer link with a smaller reproducer: https://fwd.gymni.ch/AJEV6C

wsmoses commented 1 week ago

https://fwd.gymni.ch/ysVgdC

wsmoses commented 1 week ago

x/ref https://enzymead.github.io/Enzyme.jl/dev/faq/#Runtime-Activity

wsmoses commented 1 week ago

relevant fix: https://fwd.gymni.ch/r4VfFq

samuelpmish commented 1 week ago

Thanks for the help @wsmoses!

Is it technically feasible to emit a warning or error message when this happens? In this case, compiling with optimization seems to produce the right results. Does that imply that the activity instability arises from the function-pointer indirection and wrappers, which require inlining/optimization to "see through"?

In the meantime, we'll just append that flag and see how it goes!


I'll also ping @SiebenCorgie, as I believe they observed a similar-looking issue in Rust a few weeks ago (POD data types, a simple calculation, different derivatives in debug and release builds). I'll duplicate their report here for completeness:

Hi, I'm currently playing around with the rust-enzyme integration and trying to compute the derivative of a simple signed distance function. I get different derivative values for certain coordinates depending on the compilation profile.

I came up with this minimal example. For x=[2,2,2]:

dev profile [unoptimized + debuginfo]: x=[2, 2, 2], dy=[0, 0.33333334, 0.33333334]

release profile [optimized]: x=[2, 2, 2], dy=[0.33333334, 0.6666667, 0.6666667]

I verified that the release build outputs the correct value, so the debug build seems to be broken. The problem also seems to be related to sign: every value I tested in the positive region is wrong in the debug build, whereas x=[-2, -2, -2], for instance, gives the correct value in both builds.

To set up the Rust-Enzyme toolchain I followed the installation guide here: https://enzyme.mit.edu/index.fcgi/rust/installation.html

Am I doing something wrong, or is my project setup missing anything? Are you able to reproduce the bug? I'd appreciate any pointers as to how to fix the problem.

https://github.com/SiebenCorgie/enzyme-rust-sdf-minimal/blob/main/src/main.rs
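The report doesn't say how the release value was verified; a central finite difference is one generic way to make that check. The C++ sketch below uses a hypothetical unit-sphere SDF, `sphere_sdf`, as a stand-in, since the actual function is only in the linked repository. `fd_gradient` and `gradients_agree` are illustrative helpers, not part of Enzyme or rust-enzyme.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>

using Vec3 = std::array<double, 3>;

// Central-difference gradient: df/dx_i ~= (f(x + h*e_i) - f(x - h*e_i)) / (2h).
// Truncation error is O(h^2); h trades that against floating-point cancellation.
Vec3 fd_gradient(const std::function<double(const Vec3&)>& f, const Vec3& x,
                 double h = 1e-6) {
  assert(h > 0.0);
  Vec3 g{};
  for (std::size_t i = 0; i < 3; ++i) {
    Vec3 xp = x;
    Vec3 xm = x;
    xp[i] += h;
    xm[i] -= h;
    g[i] = (f(xp) - f(xm)) / (2.0 * h);
  }
  return g;
}

// Hypothetical SDF: unit sphere at the origin (NOT the SDF from the linked repo).
double sphere_sdf(const Vec3& p) {
  return std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]) - 1.0;
}

// Component-wise comparison of an AD gradient against the finite-difference one.
bool gradients_agree(const Vec3& ad, const Vec3& fd, double tol = 1e-6) {
  for (std::size_t i = 0; i < 3; ++i) {
    if (std::fabs(ad[i] - fd[i]) > tol) return false;
  }
  return true;
}
```

Applied to each build's AD output at the same points (e.g. x=[2, 2, 2] and x=[-2, -2, -2]), a check like this flags whichever profile disagrees with the finite-difference gradient, without relying on either build being correct.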

Is there a similar way to enable runtime activity in Rust, to see whether that resolves the issue above? (Or perhaps Sieben has already found a resolution.)