JuliaDiff / ChainRulesCore.jl

AD-backend agnostic system defining custom forward and reverse mode rules. This is the light weight core to allow you to define rules for your functions in your packages, without depending on any particular AD system.

Special case derivative of non-holomorphic functions of type ℂ(^n)→ℝ #23

Open simeonschaub opened 5 years ago

simeonschaub commented 5 years ago

A common use case for non-holomorphic functions is norms and similar functions that map a complex vector space onto the reals. These are probably also the most interesting ones for optimization problems. By the Cauchy-Riemann equations, any non-constant function of this kind has to be non-holomorphic, so Wirtinger derivatives are currently the only way we have to describe them correctly. In these cases, storing both the Wirtinger primal and the Wirtinger conjugate is unnecessary: it follows from the definitions that each is the complex conjugate of the other, so only one of them needs to be stored. In arrays, for example, this would halve the memory otherwise required.

My proposal is to introduce a singleton type; let's call it ConjugateOfWirtingerConjugate for lack of a better name. It would be passed as the primal to Wirtinger or WirtingerRule and signify that the Wirtinger primal is just the conjugate of the Wirtinger conjugate. It would need special-casing in a couple of places: for example, when chained with a real derivative, ConjugateOfWirtingerConjugate can be preserved, whereas otherwise it has to fall back to a full Wirtinger derivative. If there's consensus that something like this would be useful, I could prepare a PR.

An alternative would be a special AbstractRule for these cases, which might make some things a bit cleaner, but it might be confusing to have two different types for derivatives of non-holomorphic functions.

@jrevels seems to be the main authority on everything Wirtinger; what are your thoughts on this?
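For reference, the conjugate relation between the two Wirtinger derivatives of a real-valued function can be spelled out as follows. This is a short derivation under the usual convention z = x + iy, with the Wirtinger primal taken to be ∂f/∂z and the Wirtinger conjugate taken to be ∂f/∂z̄; the example f(z) = |z|² is chosen here purely for illustration.

```latex
\frac{\partial f}{\partial z}
  = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right),
\qquad
\frac{\partial f}{\partial \bar z}
  = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right).

% If f is real-valued, both partial derivatives on the right are real, hence
\overline{\frac{\partial f}{\partial z}}
  = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right)
  = \frac{\partial f}{\partial \bar z}.

% Example: f(z) = |z|^2 = z \bar z
\frac{\partial f}{\partial z} = \bar z,
\qquad
\frac{\partial f}{\partial \bar z} = z = \overline{\frac{\partial f}{\partial z}}.
```

So for this class of functions one of the two values determines the other, which is what makes storing only one of them possible.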

ssfrr commented 5 years ago

At some point I did some whiteboarding about this and talked to @jrevels about it - there are 4 categories of functions that seem useful to consider:

  1. Holomorphic: df/dz* == 0
  2. Antiholomorphic: df/dz == 0
  3. Complex -> Real: df/dz* == (df/dz)*
  4. General nonholomorphic (df/dz and df/dz* independent)
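As a rough illustration of the four categories, here is a small Python sketch (not Julia, and not part of ChainRulesCore) that estimates both Wirtinger derivatives by central finite differences and sorts a function into one of the categories. The names `wirtinger` and `classify`, the step size, the tolerance, and the sample functions are all assumptions made for this example. The check only looks at derivatives at a few sample points, so it is a heuristic, not a proof.

```python
def wirtinger(f, z, h=1e-6):
    """Estimate (df/dz, df/dz*) at z with central differences in x and y."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dzbar = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dzbar


def classify(f, points, tol=1e-6):
    """Heuristically assign f to one of the four categories at the sample points."""
    pairs = [wirtinger(f, z) for z in points]
    if all(abs(dzbar) < tol for _, dzbar in pairs):
        return "holomorphic"
    if all(abs(dz) < tol for dz, _ in pairs):
        return "antiholomorphic"
    if all(abs(dzbar - dz.conjugate()) < tol for dz, dzbar in pairs):
        return "complex -> real"
    return "general nonholomorphic"


# Sample points and sample functions, chosen arbitrarily for the illustration.
points = [1 + 2j, -0.5 + 0.3j, 2 - 1j]
examples = {
    "z^2": lambda z: z * z,
    "conj(z)^2": lambda z: z.conjugate() ** 2,
    "|z|^2": lambda z: abs(z) ** 2,
    "z + 2 conj(z)": lambda z: z + 2 * z.conjugate(),
}

for name, func in examples.items():
    print(name, "->", classify(func, points))
```

Note that a constant function passes the first test and is reported as holomorphic, which is consistent with treating cases 1 and 2 as special cases of the general framework with a zero component.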

You can also look at compositions of these 4 cases. In the grid here, for a composition of functions h(z) = g(f(z)), the row is the category of g, the column is the category of f, and the letter gives you the category of h.

[Image: wirtinger compositions]
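The composition behaviour can also be read off the Wirtinger chain rule. The block below states it for h(z) = g(f(z)) with w = f(z); the notation is an assumption of this note and is not taken from the image.

```latex
\frac{\partial h}{\partial z}
  = \frac{\partial g}{\partial w}\,\frac{\partial f}{\partial z}
  + \frac{\partial g}{\partial \bar w}\,\overline{\frac{\partial f}{\partial \bar z}},
\qquad
\frac{\partial h}{\partial \bar z}
  = \frac{\partial g}{\partial w}\,\frac{\partial f}{\partial \bar z}
  + \frac{\partial g}{\partial \bar w}\,\overline{\frac{\partial f}{\partial z}}.
```

A few entries follow directly. If f and g are both holomorphic, every term containing ∂g/∂w̄ or ∂f/∂z̄ vanishes, so ∂h/∂z̄ = 0 and h is holomorphic. If f and g are both antiholomorphic, ∂h/∂z̄ also vanishes, so h is again holomorphic. If exactly one of them is holomorphic and the other antiholomorphic, ∂h/∂z = 0 and h is antiholomorphic. If g is real-valued, h is real-valued whatever f is, so it falls into case 3.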

I used to have a partial implementation that handled all 4 cases as separate types, but then @jrevels helped me realize that cases 1 and 2 can be handled within the case-4 framework with a Zero type.

I'm not actually sure if this is useful for answering the question and I've unfortunately sort of lost track of where ChainRules is at with implementing these things, but maybe it's useful in thinking about how they compose.

simeonschaub commented 5 years ago

Thanks for this, @ssfrr! For case 1, we actually don't use Wirtinger at all and just return a number, which fits the mathematical notion that these functions satisfy the Cauchy-Riemann equations and are therefore complex differentiable. Case 2 just corresponds to Wirtinger(Zero(), ∂f_∂z̅), like you described, but this case isn't as important as the others anyway. Right now, we use plain Wirtinger for both cases 3 and 4. What I'm proposing is to introduce another singleton type for case 3, so we can take advantage of the nice property that ∂f_∂z̅ == conj(∂f_∂z) for all of these. The composition of all of them already happens very naturally, except that we currently handle case 3 the same way as case 4; with my proposal, this would also be taken care of. If you have some specific use cases in mind for all of this, I think we'd all be very interested to hear about them.