willtebbutt opened this issue 4 years ago.
The Swift folk used to make a distinction, and then moved away from it to treating them the same.
They are pretty smart about this, so I think we can follow their lead. It would be good if someone could find their notes on that.
Good enough for me. I'm going to close this and move on with life.
Actually, maybe not. Looking at apple/swift#24825, many of their reasons don't apply to us, because Julia doesn't have a static type system. So perhaps this bears some more thought.
Looking at the rules in ChainRules, we do use tangents and cotangents differently. For the pushforwards, we always (I think) maintain the right type constraints so that the output of a function's pushforward is tangent to the output of the primal. However, specifically for pullbacks of functions of AbstractArrays, we don't. In fact, the pullbacks are technically pulling back over a different primal function than the one we actually evaluated.
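To make the pullback point concrete, here is a toy sketch in plain Python. It is not ChainRules code: 2×2 matrices are nested lists, and every function name is hypothetical. It assumes X is a symmetric input to Y = X·W and applies a generic dense pullback, dX = dY·Wᵀ, which treats X as an arbitrary matrix. The result is not symmetric, so it does not lie in the cotangent space of symmetric matrices; projecting it with (G + Gᵀ)/2 fixes that, and both versions pair identically with any symmetric tangent direction.

```python
# Toy sketch (not ChainRules code): 2x2 matrices as nested lists of floats.
# All function names here are hypothetical, chosen for illustration.

def matmul(A, B):
    """Dense matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob(A, B):
    """Frobenius inner product: sum of A[i][j] * B[i][j]."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def dense_matmul_pullback(W, dY):
    """Generic dense rule for Y = X @ W w.r.t. X: dX = dY @ W^T.
    It treats X as an arbitrary matrix and ignores that X is symmetric."""
    return matmul(dY, transpose(W))

def project_symmetric(G):
    """Frobenius-orthogonal projection onto symmetric matrices: (G + G^T) / 2."""
    n = len(G)
    return [[(G[i][j] + G[j][i]) / 2 for j in range(n)] for i in range(n)]

X = [[2.0, 1.0], [1.0, 3.0]]    # symmetric primal input
W = [[1.0, 2.0], [0.0, 1.0]]
Y = matmul(X, W)                 # primal output (a general matrix)
dY = [[1.0, 0.0], [0.0, 1.0]]   # cotangent for Y

dX = dense_matmul_pullback(W, dY)   # [[1.0, 0.0], [2.0, 1.0]]: not symmetric
dX_proj = project_symmetric(dX)     # [[1.0, 1.0], [1.0, 1.0]]

# Any tangent at a symmetric X is itself symmetric; both cotangents agree on it.
V = [[0.5, -1.0], [-1.0, 2.0]]
print(frob(dX, V), frob(dX_proj, V))  # both 0.5
```

Because the unprojected and projected cotangents agree on every symmetric direction, the missing projection is invisible until something pairs the cotangent with a non-symmetric direction.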
Many functions of abstract arrays in Base can be described as an embedding into Arrays, followed by an operation, followed by another embedding into some other space. In the forward pass, those embeddings are implicitly used to project to the tangent space at the primal values. The same should happen in reverse to project to the cotangent spaces. However, the general rrules implemented for AbstractArrays leave out the embeddings, instead behaving as though they were already passed something embedded into Arrays. Consequently, the pullback does not project its output to the cotangent space of the input to the primal function.
Since operations on differentials are linear, we can usually get away with this (that is, some pullbacks will project and others won't), so long as the constructor of the AbstractArray subtype eventually projects (i.e., accumulate-then-project instead of project-then-accumulate). That inversion of the order of operations stops working the instant a function of the abstract array effectively ignores its embedding into Arrays, because then the order matters (e.g. calling .data on a Symmetric, as discussed in https://github.com/JuliaDiff/ChainRules.jl/issues/191).
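The ordering problem can be reproduced in a toy model. The sketch below is plain Python with hypothetical names (embed, project_to_data, loss); it is not ChainRules code. It assumes a Symmetric-like wrapper that stores a 2×2 backing matrix and reads only its upper triangle, mirroring the default uplo = :U of Julia's Symmetric. One term of the loss goes through the embedding and gets a dense cotangent from a generic sum rule. The other reads the backing data directly, like .data, so its cotangent is already in data coordinates. Projecting before accumulating matches a finite-difference gradient; accumulating first does not.

```python
# Toy model (not ChainRules code) of a Symmetric-like wrapper with
# upper-triangle storage: only D[0][0], D[0][1], D[1][1] are part of the
# mathematical value; D[1][0] is storage the wrapper never reads.

def embed(D):
    """Embedding: backing data -> the dense symmetric matrix it represents."""
    return [[D[min(i, j)][max(i, j)] for j in range(2)] for i in range(2)]

def project_to_data(G):
    """Pull a dense cotangent (w.r.t. embed(D)) back through `embed`:
    each stored upper entry collects every dense entry that reads it."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            P[min(i, j)][max(i, j)] += G[i][j]
    return P

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def loss(D):
    # First term goes through the embedding; second reads the raw data (.data).
    return sum(map(sum, embed(D))) + sum(map(sum, D))

ones = [[1.0, 1.0], [1.0, 1.0]]
G_dense = ones  # cotangent from a generic sum rule, in dense coordinates
G_data = ones   # cotangent for the raw-data term, already in data coordinates

project_then_accumulate = add(project_to_data(G_dense), G_data)
accumulate_then_project = project_to_data(add(G_dense, G_data))

# When both cotangents come through the embedding, order does not matter,
# because project_to_data is linear.
both_through_embedding_ok = (
    project_to_data(add(G_dense, G_dense))
    == add(project_to_data(G_dense), project_to_data(G_dense))
)

def fd_grad(f, D, h=1e-6):
    """Reference gradient by central finite differences on every entry."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Dp = [row[:] for row in D]
            Dm = [row[:] for row in D]
            Dp[i][j] += h
            Dm[i][j] -= h
            g[i][j] = (f(Dp) - f(Dm)) / (2 * h)
    return g

D0 = [[2.0, 1.0], [5.0, 3.0]]  # D0[1][0] = 5.0 lies outside the stored triangle
reference = fd_grad(loss, D0)   # approximately [[2, 3], [1, 2]]

print(project_then_accumulate)  # [[2.0, 3.0], [1.0, 2.0]]
print(accumulate_then_project)  # [[2.0, 4.0], [0.0, 2.0]]
```

The accumulate-then-project result goes wrong because it adds a data-coordinate cotangent to a dense-coordinate one as if they lived in the same space, then routes the lower-left entry through the embedding's pullback.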
Sorry for the slight (co)tangent.
We don't currently maintain a distinction between tangents and cotangents in ChainRules. I've yet to come up with an example where it matters whether or not you do, but it would be good to understand this sooner rather than later.