Closed bytesnake closed 2 years ago
Playground for internal use / design tests, probably broken most of the time:
import simple_dep;
fn f(x: f32, y: f32) -> f32 {
x * 2.0 + y
}
#[enzymefn(f)]
#[enzymeIn(Active, Active)]
#[enzymeout(Active)]
fn df1(x: f32, y: f32) -> (f32, f32) {
unreachable!()
}
#[enzyme(simple_dep::f, Active, Const)]
fn df2(x: f32, y: f32) -> f32 {
unreachable!()
}
Look at how #[test] passes information. We also have to verify that our df functions won't be inlined (we have to use LTO anyway, so that's not an issue).
Looking at how track_callee does things, perhaps we can do it the following way:
the rosenbrock function
#[differentiate]
fn rosenbrock(a: f32, b: f32, x: f32, y: f32) -> f32 {
(a - x).powf(2.0) + b*(y-x*x).powf(2.0)
}
becomes
mod rosenbrock {
use super::DiffeType;
pub trait TypeInfo {
const RET: DiffeType = DiffeType::Const;
const A: DiffeType = DiffeType::Const;
const B: DiffeType = DiffeType::Const;
const X: DiffeType = DiffeType::Const;
const Y: DiffeType = DiffeType::Const;
}
pub struct Forward;
impl TypeInfo for Forward {}
pub fn generic_body<T: TypeInfo>(a: f32, b: f32, x: f32, y: f32, _ty: T) -> f32 {
(a - x).powf(2.0) + b*(y-x*x).powf(2.0)
}
}
fn rosenbrock(a: f32, b: f32, x: f32, y: f32) -> f32 {
rosenbrock::generic_body(a, b, x, y, rosenbrock::Forward)
}
derive_diff!(rosenbrock, rosenbrock_grad, a: const, b: const, x: dup, y: dup);
would become
pub struct RosenbrockGrad {}
impl rosenbrock::TypeInfo for RosenbrockGrad {
const RET: DiffeType = DiffeType::Const;
const A: DiffeType = DiffeType::Const;
const B: DiffeType = DiffeType::Const;
const X: DiffeType = DiffeType::Dup;
const Y: DiffeType = DiffeType::Dup;
}
fn rosenbrock_grad(a: f32, b: f32, x: f32, _x_dup: &mut f32, y: f32, _y_dup: &mut f32) -> f32 {
rosenbrock::generic_body(a, b, x, y, RosenbrockGrad)
}
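The expansions above assume a DiffeType enum in scope (use super::DiffeType;) that is never shown. A minimal sketch of what it could look like; only the variant names Const and Dup appear in the sketches above, everything else here is an assumption:

```rust
// Hypothetical minimal definition of the DiffeType enum the expansions
// above refer to. Only Const and Dup are used in the sketches.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum DiffeType {
    Const, // plain value, no derivative is tracked
    Dup,   // duplicated: a shadow slot carries the derivative
}

fn main() {
    // The TypeInfo traits above expose these as associated consts, so they
    // are readable at codegen time; here they are ordinary constants.
    const X: DiffeType = DiffeType::Dup;
    assert_eq!(X, DiffeType::Dup);
}
```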
With TypeInfo, several copies of the function in question are generated with different diffe-type information. Those can then be caught after MIR, in the LLVM codegen, and replaced with their differentiated counterparts. We are basically re-using the type system for our cause. It may also make sense to automatically generate structures containing all output parameters.
We could also define a simpler call macro, allowing only constant and output parameters and directly inferring which is which:
let (x,y) = (1.0, 1.0);
let (ret, xd, yd) = call_diff!(rosenbrock, 0.0, 1.0, x, y);
It is similar to derive_diff, but inlines the additional struct definition. It also never assumes that we need the derivative w.r.t. the returned value.
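To make the intended calling convention concrete, here is a runnable stand-in: call_diff_fd is a hypothetical helper that returns the same (ret, xd, yd) triple the macro sketch implies, but approximates the two derivatives with central finite differences instead of an Enzyme-generated gradient.

```rust
fn rosenbrock(a: f32, b: f32, x: f32, y: f32) -> f32 {
    (a - x).powf(2.0) + b * (y - x * x).powf(2.0)
}

// Finite-difference stand-in for what call_diff! would wire up:
// a and b stay constant, x and y are dup'd, the primary is returned.
fn call_diff_fd(a: f32, b: f32, x: f32, y: f32) -> (f32, f32, f32) {
    let h = 1e-3;
    let ret = rosenbrock(a, b, x, y);
    let xd = (rosenbrock(a, b, x + h, y) - rosenbrock(a, b, x - h, y)) / (2.0 * h);
    let yd = (rosenbrock(a, b, x, y + h) - rosenbrock(a, b, x, y - h)) / (2.0 * h);
    (ret, xd, yd)
}

fn main() {
    let (x, y) = (1.0, 1.0);
    let (ret, xd, yd) = call_diff_fd(0.0, 1.0, x, y);
    // Analytic values at (a=0, b=1, x=1, y=1): ret = 1, d/dx = 2, d/dy = 0.
    assert!((ret - 1.0).abs() < 1e-3);
    assert!((xd - 2.0).abs() < 1e-2);
    assert!(yd.abs() < 1e-2);
}
```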
The names differentiate, derive_diff and call_diff are highly subjective and should be made more expressive.
One thing I'll add to your plate to think about for UX/syntactic sugar.
We're shortly finishing up "forward mode" AD in addition to the existing "reverse mode AD". Reverse mode computes the derivatives of all inputs with respect to a single output (or more specifically can do with respect to a linear combination of outputs). Forward mode does all outputs with respect to a single input.
Forward mode is called in much the same way, except all inputs are duplicated (e.g. an f32 would be duplicated not active).
We're still thinking through what good syntax should exist for it in high-level languages (we have a tentative __enzyme_fwddiff), but wanted to make sure you were aware of it for future naming conventions/collisions.
I've been reconsidering what we need for an interface. For simplicity, I assumed that we are allowed to implement a macro with extra rights (similar to concat!). We require that the macro can look up a function header. Type handling is usually done after macro expansion. However, we could even accept the header as a string, since that is enough to calculate how the header of our generated function will look.
This also aims to be exhaustive. It's hard to claim that an interface is future-proof. At least we have an advantage here since the underlying theory based on the chain-rule is unlikely to change. I'm also making use of #[non_exhaustive] enums.
That is hopefully sufficient to be prepared for other AD tools handling other codegen backends.
We still need to specify the internal representation and the interface on THIR / MIR level w.r.t. the different cg_backend.
But that's probably less critical, as it is internal (although we obviously still have to take it seriously).
differentiate!(primary_fnc : FncPointer, gradientName : str,
inputActivity : Activities, outputActivity : Activities);
// Will expand to
// fn gradientName (...) { unreachable!() }
// and parse its input into some rustc metadata.
#[non_exhaustive]
enum Activities {
AllFloats,
PerEntry(Vec<Activity>) // one per input / output parameter
}
enum Activity {
Active, // calculate the primary and the gradient
Gradient, // calculate the gradient but not the primary
Constant, // calculate the primary but not the gradient
}
#[non_exhaustive]
enum Mode {
Forward,
Reverse,
// ReverseSplit
// Mixed(more-details)
}
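A hedged sketch of how a codegen backend might consume these enums; needs_shadow_args is a hypothetical helper for illustration, not part of the proposal:

```rust
// The activity enums from the sketch above, plus one illustrative consumer.
#[non_exhaustive]
#[derive(Debug)]
enum Activities {
    AllFloats,
    PerEntry(Vec<Activity>), // one per input / output parameter
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Activity {
    Active,   // calculate the primary and the gradient
    Gradient, // calculate the gradient but not the primary
    Constant, // calculate the primary but not the gradient
}

// Hypothetical backend query: does any parameter need a shadow argument?
fn needs_shadow_args(acts: &Activities) -> bool {
    match acts {
        Activities::AllFloats => true,
        Activities::PerEntry(v) => v.iter().any(|a| *a != Activity::Constant),
        // Downstream crates must keep a wildcard arm because of #[non_exhaustive].
        #[allow(unreachable_patterns)]
        _ => false,
    }
}

fn main() {
    let acts = Activities::PerEntry(vec![Activity::Constant, Activity::Active]);
    assert!(needs_shadow_args(&acts));
    assert!(!needs_shadow_args(&Activities::PerEntry(vec![Activity::Constant])));
}
```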
A minimal codegen implementation should then support forward mode and expect all (non-int) inputs to be active. Unsupported types (currently e.g. globals or dyn Trait for Enzyme) should then lead to a panic if there is no good fall-back.
There are a few open questions.
1) Can we remove the gradientName and default to a d_ prefix? Libraries might then provide more convenient wrappers like forward!(fnc) or reverse!(fnc), which might generate a d_-prefixed derivative.
Looking at our earlier comments:
"precise control of what are constant, primary and adjoint variables": not only activity, but also the modes are covered.
"make it easy to mark a function as a candidate for differentiation": we skipped this requirement by requesting a macro with extra capabilities. This simplifies user code. Also, we can't mark primary functions if they are defined in dependencies, so that's a plus.
"inline derivatives into existing structs or calls directly in code": you might need to help me with the first part, but I feel like the parameter handling of this interface is quite consistent. The second part can most likely be handled by a user-level macro which wraps the reverse!(foo) call in some brackets and directly calls / returns the newly declared function: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=803af8e8a3fdcde2eda904ad67b5d4d2 That might even be doable with a decl macro / macros 2.0 instead of a proc-macro, and probably in a hygienic way. But I don't know that much about macro hygiene, so it's probably better if I leave that to others. Also, unimplemented!() won't lead to a build issue and won't create runtime panics, since the implementation will be replaced.
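The declare-and-call idea can be sketched as a declarative macro. call_inline! and d_square are hypothetical names, and the stand-in body 2.0 * x replaces the unimplemented!() stub that real codegen would swap out:

```rust
// Hypothetical decl-macro: declare a derivative function inside a block
// and call it in the same expression. In the real design the body would be
// `unimplemented!()` and get replaced during codegen; here a hand-written
// derivative of x^2 stands in so the sketch runs.
macro_rules! call_inline {
    ($($arg:expr),* $(,)?) => {{
        fn d_square(x: f32) -> f32 { 2.0 * x }
        d_square($($arg),*)
    }};
}

fn main() {
    assert_eq!(call_inline!(3.0), 6.0);
}
```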
@wsmoses I wrote this reply two weeks ago via Email but apparently it never came through :sweat_smile:
Reverse mode computes the derivatives of all inputs with respect to a single output [..]
Are those mutually exclusive, running in forward or backward accumulation? I'm not sure about applications of "forward mode" AD in situ, or even for Jacobian matrices.
Forward mode is called in much the same way, except all inputs are duplicated [..]
Can you explain this a bit more? I thought that you only have a single input in forward mode.
We're still thinking through what good syntax should exist for it for high level languages
From a UX standpoint they can be put at the end of a parameter list, though this magic will probably confuse users a bit too much, because there is no clear association anymore. We also have to bail out for MIMO functions. I don't have much free time at the moment but will take this into consideration, thanks!
For simplicity, I assumed that we are allowed to implement a macro with extra rights (similar to concat!). We require that the macro can look up a function header. Type handling is usually done after macro expansion. However, we could even accept the header as a string, since that is enough to calculate how the header of our generated function will look.
At best we catch unaligned arguments as early as possible, and that should happen during macro expansion, so there is a bit of magic involved.
This also aims to be exhaustive. It's hard to claim that an interface is future-proof. At least we have an advantage here since the underlying theory based on the chain-rule is unlikely to change. I'm also making use of #[non_exhaustive] enums. That is hopefully sufficient to be prepared for other AD tools handling other codegen backends.
But matching against enums will only happen in the codegens, and for the user-facing part it's a breaking change anyway.
- That will collide if we differentiate the same function in the same module multiple times (with different arguments). That's probably quite rare, so we can ask users to work around it using wrappers or modules, that's easy. Libraries might even be able to do that.
But this is the kind of magic which should be avoided, as it would decrease the chance of getting accepted into rustc.
.. will revise that tomorrow ..
Are those mutually exclusive, running in forward or backward accumulation? I'm not sure about applications of "forward mode" AD in situ, or even for Jacobian matrices.
At minimum this can be useful for controlling/reducing allocation. E.g. you can store the derivative result in an existing location.
Forward mode is called in much the same way, except all inputs are duplicated [..]
Can you explain this a bit more? I thought that you only have a single input in forward mode.
Yeah: Suppose you have a multi-input, multi-output function out[:] = f(in[:]), where the dimension of in is I and the dimension of out is O.
There are I * O potential derivatives one might want to individually compute (e.g. the derivative of every output with respect to every input): J[i, j] = dout_j / din_i.
Reverse mode can get you J[:, j] for any individual j (i.e. the derivatives of a given output with respect to every input) in a single call, and forward mode can get you J[i, :] for any individual i (i.e. the derivatives of every output with respect to a given input).
Enzyme actually implements, for both modes, a more general (and thus more useful) version of these. Specifically, reverse mode computes adjoints: given any vector v, it computes \sum_j v[j] J[i, j] = \sum_j v[j] dout_j/din_i. In other words, it can compute the sum of gradients weighted over a vector of outputs. If v is set to 1 at one index and 0 elsewhere, this gets you the "traditional" gradient w.r.t. one output.
We have a similar implementation for forward mode in which, given a vector u, it computes \sum_i u[i] J[i, j] = \sum_i u[i] dout_j/din_i. In other words, it can compute the sum of derivatives weighted over a vector of inputs. Again, if u is set to 1 at one index and 0 elsewhere, this gets you the "traditional" derivative. This can also be thought of as the directional derivative.
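The two identities can be checked numerically on a small 2-input, 2-output function. The function f, its hand-written Jacobian jac, and the vectors below are illustrative choices, not Enzyme API:

```rust
// Illustrative function f(in) = (in0 * in1, in0 + in1) with Jacobian
// J[i][j] = dout_j / din_i = [[in1, 1.0], [in0, 1.0]], written by hand.
fn jac(inp: [f64; 2]) -> [[f64; 2]; 2] {
    [[inp[1], 1.0], [inp[0], 1.0]]
}

fn main() {
    let j = jac([2.0, 3.0]);

    // Reverse mode with v = (0, 1): sum_j v[j] * J[i][j] picks column j = 1,
    // i.e. the gradient of out_1 = in0 + in1, which is (1, 1).
    let v = [0.0, 1.0];
    let grad: Vec<f64> = (0..2)
        .map(|i| (0..2).map(|jj| v[jj] * j[i][jj]).sum())
        .collect();
    assert_eq!(grad, vec![1.0, 1.0]);

    // Forward mode with u = (1, 0): sum_i u[i] * J[i][j] picks row i = 0,
    // i.e. the derivatives of both outputs w.r.t. in0, here (3, 1).
    let u = [1.0, 0.0];
    let dir: Vec<f64> = (0..2)
        .map(|jj| (0..2).map(|i| u[i] * j[i][jj]).sum())
        .collect();
    assert_eq!(dir, vec![3.0, 1.0]);
}
```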
I've implemented a differentiate attribute proc-macro here: https://github.com/ZuseZ4/autodiff At some point we should probably discuss the naming of some parameters, but that's easy to update. Using it revealed that one of my llvm wrappers here doesn't cover all cases; once I've fixed that I'll use the macro for this repo, as it's much more convenient.
I just merged the new macro; I guess now it's time for more testing / documenting, to see if we want another iteration on the user frontend.
The frontend seems to work fine, despite needing some smaller updates for fwd-mode (-vector). I guess at the moment there is no reason for larger discussions on that, so closing here.
Enzyme's integration into Rust should give the user
There are several ways to structure the macros.