JuliaDiff / ChainRulesCore.jl

AD-backend agnostic system defining custom forward and reverse mode rules. This is the light weight core to allow you to define rules for your functions in your packages, without depending on any particular AD system.

Tangents vs Cotangents #160

Open willtebbutt opened 4 years ago

willtebbutt commented 4 years ago

We don't currently maintain a distinction between tangents and cotangents in ChainRules. I've yet to come up with an example where it matters whether or not you do, but it would be good to understand this sooner rather than later.

oxinabox commented 4 years ago

The Swift folk used to make a distinction, and then switched to treating them the same.

They are pretty smart about this, so I think we can follow their lead. It would be good if someone could find their notes on that.

nickrobinson251 commented 4 years ago

https://github.com/apple/swift/pull/24825 (and docs on their types https://github.com/tensorflow/swift/blob/master/docs/DifferentiableTypes.md)

willtebbutt commented 4 years ago

Good enough for me. I'm going to close this and move on with life.

oxinabox commented 4 years ago

Actually, maybe not. Looking at apple/swift#24825, many of their reasons don't apply to us, because Julia doesn't have a static type system. So perhaps this bears some more thought.

sethaxen commented 4 years ago

Looking at the rules in ChainRules, we do use the tangents and cotangents differently. For the pushforwards, we always (I think) maintain the right type constraints and such so that the output of the pushforward is tangent to the output of the primal. However, for the pullbacks, specifically for functions of AbstractArrays, we don't. In fact, the pullbacks are technically pulling back over a different primal function than the one we actually evaluated.

Many functions of abstract arrays in Base can generally be described as an embedding into the Arrays, followed by an operation, followed by another embedding into some other space. In the forward pass, those embeddings are implicitly used to project to the tangent at the primal values. The same should happen in reverse to project to the cotangent spaces. However, the general rrules implemented for AbstractArrays leave out the embeddings, instead behaving as though they were already passed something embedded into the Arrays. Consequently, the pullback does not project its output to the cotangent space of the input to the primal function.

Since operations on differentials are linear, we can usually get away with this (that is, some pullbacks will project, others won't), so long as the constructor of the AbstractArray subtype eventually projects, i.e., we accumulate-then-project instead of project-then-accumulate. That inversion of the operation order stops working the instant a function of the abstract array effectively ignores its embedding into the arrays; then the change in order matters (e.g. calling .data on a Symmetric, as discussed in https://github.com/JuliaDiff/ChainRules.jl/issues/191).
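To make the ordering argument concrete, here is a minimal toy sketch in plain Python (not ChainRules code). It assumes the projection onto the cotangent space of a symmetric matrix is the symmetrization (G + Gᵀ)/2; the names `symmetrize`, `add`, `g1`, and `g2` are made up for illustration. It shows that the two orders agree because the projection is linear, and that anything reading a single raw entry before the projection sees a different number.

```python
def symmetrize(g):
    """Assumed projection: map a square matrix G to (G + G^T) / 2."""
    n = len(g)
    return [[(g[i][j] + g[j][i]) / 2 for j in range(n)] for i in range(n)]


def add(a, b):
    """Elementwise sum of two equally sized matrices (accumulation)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Two hypothetical cotangent contributions for the same symmetric input.
# g1 comes from a pullback that did not project; g2 is already symmetric.
g1 = [[1.0, 4.0], [0.0, 2.0]]
g2 = [[0.0, 2.0], [2.0, 1.0]]

# Linearity of the projection makes the two orders agree.
project_then_accumulate = add(symmetrize(g1), symmetrize(g2))
accumulate_then_project = symmetrize(add(g1, g2))
orders_agree = project_then_accumulate == accumulate_then_project

# Reading one raw entry before projecting (loosely analogous to reaching
# into the underlying storage) gives a different value than after projecting.
raw_entry = add(g1, g2)[0][1]
projected_entry = accumulate_then_project[0][1]
```

The point of the sketch is only that deferring the projection is harmless as long as every consumer waits for it; a consumer that looks at the unprojected storage does not get the projected value.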

Sorry for the slight (co)tangent.