JuliaDiff / ForwardDiff.jl

Forward Mode Automatic Differentiation for Julia

Support for complex numbers #157

Closed AlexanderKoshkarov closed 4 years ago

AlexanderKoshkarov commented 7 years ago

It would be nice if the package supported complex numbers. It is probably difficult to check when the derivative exists, and which path to take in the complex plane when taking the derivative. However, if you pick a default derivative path (for example, along the real axis), it should be good enough (I think) for DifferentialEquations.jl, where derivatives are used to calculate the Jacobian in implicit ODE integrators. (@ChrisRackauckas is using derivative! and jacobian! in DifferentialEquations.jl.) If it were also possible to indicate the curve in the complex plane along which to take the derivative, that would be awesome. Thanks.

julia> using ForwardDiff; import ForwardDiff

julia> f(x::Vector) = sum(sin, x) + prod(tan, x) * sum(sqrt, x);

julia> x = rand(5); xi = rand(Complex128,5);

julia> g = x -> ForwardDiff.gradient(f, x);

julia> g(x)
5-element Array{Float64,1}:
 1.48017
 3.33484
 1.36112
 1.55229
 1.376  

julia> g = x -> ForwardDiff.gradient(f, x);

julia> g(xi)
ERROR: TypeError: Dual: in T, expected T<:Real, got Type{Complex{Float64}}
 in Type at /home/kosh/.julia/v0.5/ForwardDiff/src/cache.jl:52 [inlined]
 in construct_jacobian_caches(::Array{Complex{Float64},1}, ::ForwardDiff.Chunk{5}) at /home/kosh/.julia/v0.5/ForwardDiff/src/cache.jl:157
 in get!(::ForwardDiff.##23#24{Array{Complex{Float64},1},ForwardDiff.Chunk{5}}, ::Dict{Tuple{Int64,Int64,DataType,Bool},Any}, ::Tuple{Int64,Int64,DataType,Bool}) at ./dict.jl:663
 in multithread_jacobian_cachefetch!(::Array{Complex{Float64},1}, ::ForwardDiff.Chunk{5}, ::Bool, ::Bool) at /home/kosh/.julia/v0.5/ForwardDiff/src/cache.jl:65
 in jacobian_cachefetch! at /home/kosh/.julia/v0.5/ForwardDiff/src/cache.jl:74 [inlined]
 in compute_vector_mode_gradient(::#f, ::Array{Complex{Float64},1}, ::ForwardDiff.Chunk{5}, ::Bool) at /home/kosh/.julia/v0.5/ForwardDiff/src/gradient.jl:87
 in vector_mode_gradient at /home/kosh/.julia/v0.5/ForwardDiff/src/gradient.jl:94 [inlined]
 in #gradient#33(::Bool, ::Bool, ::Function, ::Function, ::Array{Complex{Float64},1}, ::ForwardDiff.Chunk{5}) at /home/kosh/.julia/v0.5/ForwardDiff/src/gradient.jl:23
 in gradient(::Function, ::Array{Complex{Float64},1}, ::ForwardDiff.Chunk{5}) at /home/kosh/.julia/v0.5/ForwardDiff/src/gradient.jl:22
 in (::##3#4)(::Array{Complex{Float64},1}) at ./REPL[7]:1
 in eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:64
 in macro expansion at ./REPL.jl:95 [inlined]
 in (::Base.REPL.##3#4{Base.REPL.REPLBackend})() at ./event.jl:68
jrevels commented 7 years ago

This is definitely possible using ForwardDiff's Dual, though I wonder what the API should look like.

ChrisRackauckas commented 7 years ago

What happens if you just have Dual subtype AbstractFloat? Is there a mathematical issue with that, do you need to expand Dual?

jrevels commented 7 years ago

What happens if you just have Dual subtype AbstractFloat? Is there a mathematical issue with that, do you need to expand Dual?

What's the point of doing that? There are many cases where subtyping to AbstractFloat would be wrong (e.g. Dual{N,Int} should be supported but shouldn't be <: AbstractFloat). ForwardDiff.Dual is subtyped to Real which should be enough for Complex support. In fact, ForwardDiff.Dual theoretically supports complex differentiation as it is, we just don't have API methods/tests for it.
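To make the "Dual theoretically supports this already" remark concrete, here is a rough sketch of what complex differentiation on top of `ForwardDiff.Dual` could look like. `complex_derivative` is an illustrative name, not a ForwardDiff API: it seeds a perturbation along the real axis and reads the derivative off the result, which equals the complex derivative whenever f is analytic.

```julia
using ForwardDiff: Dual, partials

# Illustrative sketch only, not package API: differentiate an analytic
# f: C -> C by perturbing along the real axis with a Dual in each component.
function complex_derivative(f, z::Complex)
    x = Dual(real(z), one(real(z)))   # perturbed real part
    y = Dual(imag(z), zero(real(z)))  # unperturbed imaginary part
    fz = f(Complex(x, y))             # Complex{Dual} flows through f
    Complex(partials(real(fz))[1], partials(imag(fz))[1])
end

complex_derivative(sin, 1.2 + 0.7im)   # ≈ cos(1.2 + 0.7im)
```

Since `Dual <: Real`, `Complex{Dual}` is a valid complex number type and ordinary complex arithmetic propagates the partials.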

gasagna commented 7 years ago

Are there any updates on plans to extend functionality to complex numbers?

AshtonSBradley commented 7 years ago

There seems to be a simple way forward here: complex differentiation in the sense that matters here is just real differentiation up to a unitary transformation.

I am primarily interested in applications like computing a Jacobian for a system of ODEs.

Typically, when setting up differential equations, we aren't so much concerned with complex analysis and analytic functions as with a convenient complex-variable shorthand for pairs of independent real variables.

So (for this case at least) this is done via: (1) do the unitary operation to map to real variables. (2) differentiate using real routines.

Or am I missing the point entirely?

ChrisRackauckas commented 7 years ago

So surely this is just (1) do the unitary operation to map to real variables. (2) differentiate using real routines.

Yeah, that's equivalent to differentiating along the (1+im) line direction. That would work fine. There really isn't much to this, other than someone needs to implement it.
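The two-step recipe above ((1) map to real variables, (2) differentiate with the real routines) can be sketched on top of ForwardDiff's existing real Jacobian. The helper names `to_reals`, `to_complex`, and `real_jacobian` are illustrative, not package API:

```julia
using ForwardDiff

# Sketch: differentiate f: C^n -> C^n by working on the equivalent
# map R^2n -> R^2n. Helper names are illustrative only.
to_reals(z) = vcat(real.(z), imag.(z))
to_complex(v) = complex.(v[1:end ÷ 2], v[end ÷ 2 + 1:end])

function real_jacobian(f, z::Vector{<:Complex})
    g(v) = to_reals(f(to_complex(v)))
    ForwardDiff.jacobian(g, to_reals(z))   # a 2n-by-2n real matrix
end

# For the analytic map z -> z.^2 at z = 1 + 2im, the 2x2 result is
# [2 -4; 4 2], i.e. the complex derivative 2z = 2 + 4im in real form.
real_jacobian(z -> z .^ 2, [1.0 + 2.0im])
```

For an analytic function the 2x2 blocks of this real Jacobian encode the complex derivative, so nothing is lost by going through R^2n.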

MikaelSlevinsky commented 7 years ago

I think DualNumbers.jl has supported complex (first order) differentiation ever since https://github.com/JuliaDiff/DualNumbers.jl/pull/29

MikaelSlevinsky commented 7 years ago

There are probably edge cases like branch cuts that might not be precisely implemented, but time stepping near a branch cut seems risky for a system of ODEs anyway.

ChrisRackauckas commented 7 years ago

I think DualNumbers.jl has supported complex (first order) differentiation ever since JuliaDiff/DualNumbers.jl#29

But ForwardDiff doesn't use DualNumbers.jl, and instead uses its own internal Dual type, right? I never understood why, but it sounds like this problem would be solved then if ForwardDiff used DualNumbers.jl internally? What would that entail?

There are probably edge cases like branch cuts that might not be precisely implemented, but time stepping near a branch cut seems risky for a system of ODEs anyway.

Yes, I am pretty sure you'll have all sorts of other troubles in this case. The differentiation is the least of the worries here.

jrevels commented 7 years ago

But ForwardDiff doesn't use DualNumbers.jl, and instead uses its own internal Dual type, right? I never understood why, but it sounds like this problem would be solved then if ForwardDiff used DualNumbers.jl internally? What would that entail?

IMO, ForwardDiff.Dual is strictly superior for most practical purposes, and the plan is to replace DualNumbers.Dual with ForwardDiff.Dual once I have the time to do so; see https://github.com/JuliaDiff/DualNumbers.jl/issues/45. That issue also contains my argument that ForwardDiff.Dual <: Real is already better suited for complex differentiation than DualNumbers.Dual <: Number because it forces the tighter of the two orientations (i.e. forcing Complex{Dual} is better than forcing Dual{Complex}).

Note that the bulk of the work for this specific issue does not involve changing ForwardDiff.Dual, but rather working on the API methods.

antoine-levitt commented 7 years ago

I'm confused by the discussion here. For a map from complex to complex, differentiation does not map properly to nice structures, i.e. there is no "Jacobian" of a map from C^n to C^n unless the map is holomorphic, which would probably limit the range of applicability. E.g. to solve an equation f(x) = 0 from C^n to C^n by Newton's method (which is presumably what ODE solvers need?), the only way is to set up an R^2n-by-R^2n system (unless I'm mistaken), which would presumably need a different API.

What would make much more sense is automatic differentiation for functions from C^n to R, where a gradient in C^n is well-defined (although one would probably want to use ReverseDiff for that). This is what's needed for optimization, and it is conceptually easy (treat C as R^2). I currently need this very badly for one project, so if somebody could point me in the right direction for the implementation, and if it's feasible for a beginner in Julia, I'd be glad to help.
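The C^n -> R gradient described here can be sketched by treating C as R^2 with ForwardDiff's real gradient. `complex_gradient` is an illustrative name, not an existing API, and the convention of packing ∂f/∂x + im*∂f/∂y into each complex entry is one choice among several:

```julia
using ForwardDiff

# Sketch: gradient of f: C^n -> R via real/imaginary splitting.
# `complex_gradient` is illustrative, not package API.
function complex_gradient(f, z::Vector{<:Complex})
    n = length(z)
    g(v) = f(complex.(v[1:n], v[n+1:2n]))
    gr = ForwardDiff.gradient(g, vcat(real.(z), imag.(z)))
    complex.(gr[1:n], gr[n+1:2n])   # pack (df/dx, df/dy) as df/dx + im*df/dy
end

# f(z) = |z|^2 + Re(z) is C -> R and not analytic, but its R^2 gradient
# is perfectly well-defined: at z = 1 + 2im it is 3 + 4im.
complex_gradient(z -> sum(abs2, z) + real(z[1]), [1.0 + 2.0im])
```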

ChrisRackauckas commented 7 years ago

@antoine-levitt that's pretty much what @AshtonSBradley was saying. Sometimes complex numbers are used as a nice way to treat equations on R^2 and don't end up with analytic functions. In that case though, the derivative along the line through 0 in the (1+im) direction is the same as the derivative of the real + imaginary parts when done using dual numbers. Since the path doesn't matter when the function is analytic, if we just make sure we use dual numbers along that path then it will work for both the R^2 case and the true complex analytic case. That's precisely what we need for differential equations, so since it sounds like that works for optimization as well, it's likely the right way to go.

antoine-levitt commented 7 years ago

That seems tricky to me. There are two use cases here: functions from C to C (solving z^2 = 1, say), and from C to R (minimizing |z|^2 + 1). The API and implementation for the two have to be different, because the first has to forbid non-analytic functions (which doesn't seem to be the case now in DualNumbers, e.g. https://github.com/JuliaDiff/DualNumbers.jl/pull/38), while the second requires them. My point is that the first use case is a bit artificial and limited in applicability (why allow solving z^2 = 1 and not z^2 = |z|(1+i)?), but the second is well-defined, and should probably be what's meant when talking about a "complex derivative" in autodiff. The implementation of the second cannot use "complex dual numbers", which would be unable to differentiate things like real(conj(z)).

MikaelSlevinsky commented 7 years ago

I think the idea behind DualNumbers is that functions of dual numbers are defined by a limit (c.f. https://github.com/JuliaDiff/DualNumbers.jl/pull/29). Only when the function is (complex) differentiable does it return derivative information in the dual component. Note that DualNumbers might be phased out in favour of ForwardDiff's dual implementation.

ChrisRackauckas commented 7 years ago

My point is that the first use case is a bit artificial and limited in applicability

Except that it's not artificial: it's used extensively in scientific models like QM and circuits, and essentially most C -> C functions that people write down are complex analytic (otherwise the derivative isn't well defined anyway). There are entire disciplines devoted to this, to the point that everyone is required to learn about C -> C functions. I think it's insane to write that off as artificial when it's easily the most common case for using a complex number.

but the second is well-defined, and should probably be what's meant when talking about a "complex derivative" in autodiff. The implementation of the second cannot use "complex dual numbers", which will be unable to differentiate things like real(conj(z)).

I think, as with https://github.com/johnmyleswhite/Calculus.jl/issues/115, the second one can split C^n into R^2n and differentiate that, with a keyword argument used to enable this mode. It would require double the function calls, though, and most cases wouldn't need the slow path, although it would be necessary for some of the cases @AshtonSBradley has mentioned.

ChrisRackauckas commented 7 years ago

Thinking about this more: probably default to safe, but use a keyword arg to opt into performance (holomorphic defaults to false; set it to true). I think that makes every application (including @antoine-levitt's and @AshtonSBradley's) safe when the derivative is well-defined in R^2n, but lets users take the fast route when the function is complex-differentiable.

antoine-levitt commented 7 years ago

This discussion spans several threads so let's focus on it here. I'll try to gather my thoughts clearly in this post, sorry for the length!

For the sake of simplicity, let's discuss two prototypical applications: one, finding zeros of exp(cos(z)) = 1.2 with Newton, second, minimizing |z|^2 + Re(z) using a gradient method. If I understand correctly, the first is what you're interested in, the second is what I'm interested in. (let me just point out that this is far from being an academic example: a lot of problems in quantum mechanics are about minimizing C^n -> R functions). I don't know who would win a popularity contest between solving nonlinear systems and minimizing functions, but I'm guessing it would be close.

What we are interested in is finding out what sense, if any, there is in defining derivatives of a function f with complex inputs. There are two sensible definitions here:

1. The complex (holomorphic) derivative df/dz, which only exists when f is complex-differentiable.
2. The derivative of f seen as a function of real variables, i.e. identify C^n with R^2n and take the ordinary real Jacobian or gradient there.

These two definitions do not match for a C^n to R function.

I'm not sure what other languages do, but e.g. Theano implements the second: http://deeplearning.net/software/theano/proposals/complex_gradient.html

The main issue I have with the first option is that not all functions are complex-differentiable. It's true that many functions are, but then again many are not. Moreover, even if the total function is complex-differentiable but has intermediate computations that are not, the final result will simply be wrong. For instance, defining derivatives this way at each stage of an autodiff algorithm, I'm pretty sure that differentiating f(z) = real(z) + im*imag(z) will not complain but will give wrong results. This is the worst kind of bug, and I think it's a really, really bad idea.

So, if we want the first definition, that leaves us with only two consistent choices:

1. Undefine all non-analytic functions (real, imag, conj, abs, ...) on complex dual numbers, so that non-analytic code errors out instead of silently returning wrong results.
2. Propagate the full real derivative (both df/dz and df/dz*), i.e. compute the double-size real Jacobian.

The implementation of the first choice is the simplest, but it's very restrictive, and in particular it is useless for the second definition (because of Cauchy-Riemann, any complex-analytic C->R function is constant). The second seems trickier (I'm not very familiar with autodiff), but it is more general. If that's done, then exposing the first and second definitions is just a matter of API: the code would return a double-size Jacobian, from which one can select the relevant entries. It is also the correct way to implement real-to-real functions that happen to have intermediate complex quantities (e.g. t -> abs((1+im)+t*(3-2im))). I think this is the way to proceed.
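The double-size Jacobian, plus the Cauchy-Riemann check mentioned as a bonus, can be sketched as follows (the names `full_jacobian` and `is_analytic_at` are illustrative, not part of any package):

```julia
using ForwardDiff

# Sketch: full real Jacobian of f: C -> C, plus a numerical
# Cauchy-Riemann check to detect non-analytic functions.
function full_jacobian(f, z::Complex)
    g(v) = (w = f(complex(v[1], v[2])); [real(w), imag(w)])
    ForwardDiff.jacobian(g, [real(z), imag(z)])   # 2x2: [ux uy; vx vy]
end

# Cauchy-Riemann equations: ux == vy and uy == -vx.
is_analytic_at(J; tol = 1e-10) =
    abs(J[1, 1] - J[2, 2]) < tol && abs(J[1, 2] + J[2, 1]) < tol

is_analytic_at(full_jacobian(z -> z^2, 1.0 + 2.0im))   # true
is_analytic_at(full_jacobian(conj, 1.0 + 2.0im))       # false: conj is not analytic
```

When the check passes, the complex derivative can be read off as J[1,1] + im*J[2,1]; when it fails, only the full real Jacobian is meaningful.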

ChrisRackauckas commented 7 years ago

I'm not sure what other languages do, but e.g. Theano implements the second: http://deeplearning.net/software/theano/proposals/complex_gradient.html

That's not a good example to have as the only example. Of course the example which focuses on minimizing functions uses the choice which is good for your application (minimizing functions). But then if you point to something which is focused on mathematical modeling like Mathematica, you'll see it uses the first definition.

The implementation of the first choice is the simplest, but it's very restrictive, and in particular is useless for the second definition (because of Cauchy-Riemann, any complex-analytic C->R function is constant). The second seems more tricky (I'm not very familiar with autodiff), but is the more general. If that's done, then exposing the first and second definitions is just a matter of API: the code would return a double-size jacobian, from which one can select the relevant entries. It is also the correct way to implement real-to-real functions that happen to have intermediate complex quantities (e.g. t -> abs((1+im)+t*(3-2im))). I think this is the way to proceed.

You pretty much just showed why that's not "the way to proceed": it's much more computationally expensive yet you only need it in these C->R cases, or with intermediate non-analytic functions. That makes it pretty clear that neither way is "correct", and both are necessary if we want to do things in a manner which is sane and efficient.

I think we need to focus on how to expose both definitions in a clear manner, and find out how to implement the two of them in a way that doesn't clash.

antoine-levitt commented 7 years ago

As I said, I only see two choices that make sense: undefine all non-analytic functions on complex dual numbers, and the "expensive" option (it should be at worst a factor of two slower, in the bandwidth-bound case). I believe the "expensive" option is the right one (it is the most general one), but since I'm not willing to implement it myself I will not fight against anyone who does the first one, as long as it does not wrongly differentiate non-analytic functions.

ChrisRackauckas commented 7 years ago

Then I think the first short-term goal would be to use the DualNumbers.jl implementation, but undefine all non-analytic functions on complex dual numbers so that way it's safe. In the meantime, defining both versions using finite differencing in Calculus.jl is not difficult and can be switched using a keyword argument.
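The finite-differencing version with a keyword switch could look something like this. `fd_derivative` and the `analytic` keyword are illustrative names, not the Calculus.jl API:

```julia
# Sketch only: forward finite differences with a keyword to switch
# between the analytic (fast) and general (safe) modes.
function fd_derivative(f, z::Complex; analytic::Bool = false, h = 1e-8)
    if analytic
        # one difference along the real axis suffices for analytic f
        (f(z + h) - f(z)) / h
    else
        # general case: difference along both the real and imaginary axes
        dx = (f(z + h) - f(z)) / h        # df/dx
        dy = (f(z + im * h) - f(z)) / h   # df/dy
        (dx, dy)                          # caller assembles what it needs
    end
end

fd_derivative(sin, 0.5 + 0.5im; analytic = true)   # ≈ cos(0.5 + 0.5im)
```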

For autodifferentiation though, we have to start thinking about how Cassette.jl will deal with this. My guess is that things can be tagged somehow by the user passing in an analytic=Val{true}, which can cause different dispatches for the derivative depending on whether it's the C derivative or the R^2 derivative. This could work in an extension of dual numbers, but I'm not entirely sure how Cassette.jl works, so I don't know if this idea could be used there.

antoine-levitt commented 7 years ago

There might be a way to have the best of both worlds (a general approach that is as fast as the simple one for analytic functions), by having two types of dual numbers: AnalyticDual that has only df/dz, and GeneralDual that has df/dz and df/dz*. The second type would "propagate", in the sense that f(z) is an AnalyticDual if f is analytic and z an analytic dual, but every other case would give a general dual. All this is known at compile-time, so this could in theory work. No idea how hard it is to implement.
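A toy version of this propagation idea can be written with Wirtinger calculus: carry both ∂f/∂z and ∂f/∂z̄, where the latter is zero exactly when f is analytic. The `GeneralDual` type and `seed` helper below are hypothetical, not an existing package:

```julia
# Toy sketch of the "general dual" idea via Wirtinger derivatives.
# `GeneralDual` and `seed` are hypothetical names, not an existing package.
struct GeneralDual
    val::ComplexF64    # f(z)
    dz::ComplexF64     # ∂f/∂z
    dzbar::ComplexF64  # ∂f/∂z̄ (zero iff f is analytic at z)
end

seed(z) = GeneralDual(z, 1, 0)  # the identity map: ∂z/∂z = 1, ∂z/∂z̄ = 0

# Wirtinger rules for a few primitives:
Base.:+(a::GeneralDual, b::GeneralDual) =
    GeneralDual(a.val + b.val, a.dz + b.dz, a.dzbar + b.dzbar)
Base.:*(a::GeneralDual, b::GeneralDual) =
    GeneralDual(a.val * b.val, a.dz * b.val + a.val * b.dz,
                a.dzbar * b.val + a.val * b.dzbar)
Base.conj(a::GeneralDual) =
    GeneralDual(conj(a.val), conj(a.dzbar), conj(a.dz))

# |z|^2 = z * conj(z): the nonzero ∂/∂z̄ component flags non-analyticity.
z = 1.0 + 2.0im
d = seed(z) * conj(seed(z))   # d.dz == conj(z), d.dzbar == z
```

A real implementation would additionally narrow to the "analytic only" fast path when `dzbar` is statically known to be zero, which is the compile-time propagation suggested above.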

ChrisRackauckas commented 7 years ago

Cassette.jl could probably do that because it would build the whole computation graph, so it would be easy to just check if there's a function in the "not allowed" list.

jrevels commented 7 years ago

Note that Cassette won't provide an AD implementation, but the plan is to build new AD implementations on top of it.

Cassette.jl could probably do that because it would build the whole computation graph, so it would be easy to just check if there's a function in the "not allowed" list.

For a forward-mode AD implementation, you don't want to build/store the graph. You could still do the check as part of normal call interception, however. Should be pretty simple to leverage dispatch to accomplish this in a way that the compiler can elide for non-Complex arguments.

Moreover, even if the total function is complex-differentiable but has intermediary computations that are not, then the final result will simply be wrong.

This is why, IMO, if any degree of function-checking is required for complex differentiation correctness, it shouldn't be exposed as a togglable option - it should be automatically enforced whenever ForwardDiff sees that it's working with Complex arguments. If working with native-language AD has taught me anything, it's that users mostly don't have the knowledge to make these kinds of decisions for themselves.

ChrisRackauckas commented 7 years ago

So instead of being toggle-able, have a set of allowed analytic functions and during call interception decide if 2n function calls are needed instead of n by checking some is_analytic trait on the functions or something like that? I'm not entirely sure how you plan on using Cassette for forward-mode autodiff so I am fuzzy on the details as to whether that can be done efficiently.

jrevels commented 7 years ago

So instead of being toggle-able, have a set of allowed analytic functions and during call interception decide if 2n function calls are needed instead of n by checking some is_analytic trait on the functions or something like that

Well, AFAICT, doing that dynamically would require some tricky runtime re-indexing that would probably be best to avoid.

I was thinking the AD tool would do the safe slow thing by default, and users can toggle a flag which gives the AD tool license to optimize if it can prove that all intermediate functions are analytic (by checking a trait like you mentioned). So I guess I am fine with a toggle, as long as the tool is able to throw loud, descriptive errors when the toggle is misused 😛

antoine-levitt commented 7 years ago

The more I think about it, the more it seems like the right way to proceed is to first get real-to-real with complex intermediate computations working. This means treating complex numbers as simply structs of two reals. Probably that is not that hard to do in the current setup? Then AD would compute the full derivative (2n x 2n if from C^n to C^n), and it's just a matter of API to select the right subarray to expose for the special cases of analytic C^n to C^n (as a bonus, you can check explicitly whether the Cauchy-Riemann equations hold, i.e. whether your function is analytic) and complex-to-real functions. That gets us reasonable code that people can use (within a factor of 2 of optimal), and then we can refine the algorithm for special cases. Anything else (specialized either to analytic C^n to C^n or to C^n to R functions) feels like premature optimization.
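For what it's worth, the real-to-real case with complex intermediates appears to go through ForwardDiff as-is on recent Julia/ForwardDiff versions, because `Complex{Dual}` propagates through ordinary complex arithmetic:

```julia
using ForwardDiff

# Real-to-real with complex intermediates: the example from this thread.
g(t) = abs((1 + im) + t * (3 - 2im))

# Analytically, g(t) = sqrt((1 + 3t)^2 + (1 - 2t)^2), so
# g'(t) = (3(1 + 3t) - 2(1 - 2t)) / g(t).
ForwardDiff.derivative(g, 0.7)   # ≈ 10.1 / sqrt(9.77)
```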

AshtonSBradley commented 6 years ago

@mcawte

ChrisRackauckas commented 4 years ago

Handled by https://github.com/JuliaLang/julia/pull/36030