FluxML / Zygote.jl

21st century AD
https://fluxml.ai/Zygote.jl/
Other
1.48k stars 211 forks source link

getindex performance #365

Open willtebbutt opened 5 years ago

willtebbutt commented 5 years ago

Backproping through getindex is pretty slow at the minute, due to the need to make copies of the entire array on the reverse-pass, and to fill it with zeros. The current implementation basically has two steps

  1. Fill array with zeros and place non-zero entries where the original array was accessed.
  2. Create new array by calling accum on the thing constructed in 1 and whatever the reverse-mode sensitivity already is.

Point 1 can be dealt with through the careful use of structurally-sparse arrays (i.e. a new AbstractArray type that is zero everywhere except for certain blocks or whatever). Point 2 is hard to deal handle without in-place accumulation.

@MikeInnes have you thought about doing in-place accumulation at all? How do you feel about it? Is it a thing that you might be convinced to enable somehow?

edit: also, please feel free to correct the above if I've got my analysis of the bottleneck wrong

Drvi commented 5 years ago

Fwiw, I tried to fix this for Tracker: https://github.com/FluxML/Flux.jl/pull/589/files, never actually finished it...

MikeInnes commented 5 years ago

Yes, spot on with the analysis. Some kind of dynamic analysis of sparsity might be the way to do it, though that also makes things like future type inferability difficult even in principle.

In-place accumulation was enabled on #61, where it's a natural fit. It's possible that doing just accumulation without actually supporting mutation generally might be significantly simpler, though I suspect it may not be. Will have to consider it some more.

Doing things in-place is a bit problematic with Zygote's semantics, since the gradient of an Int array can be a Float etc. Automatically converting int->float is easy but then complex numbers are an issue, etc. We could do fancier tricks here to make that fast in the common case, but it's increasingly inelegant.

The other option is to just recommend Zygote.Buffer here, since that does already have its accumulation done in-place.

willtebbutt commented 5 years ago

Okay, thanks for the info. My particular problem can be solved with a custom adjoint, so I'll do that for now. Will leave this open in case anyone fancies tackling it in the general case.

sethaxen commented 4 years ago

getindex is also not fully compatible with complex sensitivities. See #376.