Closed cortner closed 2 years ago
I'm now guessing this is exactly a place where matrices would have come in handy. No need to worry about how Zygote handles a Vector{SVector}
For the val
operation I am thinking of two options:
typeof
function to enforce the type we return inside an rrule.adjoint()
and create multiple instances of it that dispatch according to the dp they recieve. For the vector{svector} issue is there a way I can reproduce the problem. Is it still persistent in the current commit, and if so is it a problem with adjoint_eval_D? or do you mean the second derivative according to parameters?
I'm now guessing this is exactly a place where matrices would have come in handy. No need to worry about how Zygote handles a
Vector{SVector}
Here you mean the Vector{SVector} issue mentioned above?
A little update - this turned out to be easier and clearer than I expected. All tests are again passing and here are the performance results: I would say that's within a factor 2 of what we can realistically achieve within our group.
gits/ACE.jl/profile ∙ j17 --project=.. profile_linearmodel.jl 1 ↵ ᚴ 𝘮 11 ⇄ 340 multisvec 1M
length(basis) = 2566
[ Info: Time evaluate incl allocation
58.000 μs (0 allocations: 0 bytes)
[ Info: Time grad_config with and without allocation
[ Info: This looks like a factor 4.5 of evaluate, so probably more we can do
257.334 μs (6 allocations: 266.61 KiB)
257.041 μs (5 allocations: 266.31 KiB)
[ Info: grad_config vs rrule_evaluate
g1 ≈ g2 = true
260.916 μs (6 allocations: 266.61 KiB)
257.458 μs (6 allocations: 266.61 KiB)
[ Info: Time grad_params with and without allocation
[ Info: a little surprising we dont get closer to factor 1?
140.291 μs (2 allocations: 20.11 KiB)
138.916 μs (0 allocations: 0 bytes)
[ Info: adjoint_EVAL_D - single property
[ Info: This may be the single-most important function, and it looks fantastic!
302.959 μs (8 allocations: 459.83 KiB)
[ Info: Multi-property
[ Info: - evaluate
59.792 μs (0 allocations: 0 bytes)
[ Info: - grad_params
193.667 μs (2568 allocations: 280.70 KiB)
[ Info: - grad_config
338.125 μs (6 allocations: 275.31 KiB)
[ Info: - _rrule_evaluate
275.708 μs (5 allocations: 266.59 KiB)
[ Info: - adjoint_EVAL_D
363.791 μs (9 allocations: 661.69 KiB)
I'm still not 100% committed to this path and would still consider going the matrix path. But for now my suggestion is to merge this, update ACEflux. and try your tests again to see whether you can now run the regression more efficiently.
@andresrossb If you are happy with this approach, then I'll merge the current branch into you PR. We then update the benchmark tests, confirm that they run and then can merge that too and tag a new version.
I'm ok with this path as well, it looks like it's performing way better than before. I don't expect to run into any problems updating ACEflux and trying out the models again.
Let's merge it!
thank you, thank I'll merge and we go from there.
Regarding your comments above:
I'm now guessing this is exactly a place where matrices would have come in handy. No need to worry about how Zygote handles a
Vector{SVector}
Here you mean the Vector{SVector} issue mentioned above?
yes. I was able to resolve it without too much difficulty though. It was a bug at my end, but highlights again how matrices is conceptually much simpler.
For the
val
operation I am thinking of two options:
- using something like a
typeof
function to enforce the type we return inside an rrule.
I agree - this is still missing. I had an idea about that, but it turned out too complex to carry out quickly. We can discuss.
- making a function inside an rrule like we did in ACEflux
adjoint()
and create multiple instances of it that dispatch according to the dp they recieve.
actually I think this is the idea I mention above. I think there is a lot of flexibility in this and we can explore it. We may have to as we move to other properties.
For the vector{svector} issue is there a way I can reproduce the problem. Is it still persistent in the current commit, and if so is it a problem with adjoint_eval_D? or do you mean the second derivative according to parameters?
no need anymore, bug is fixed.
This is the alternative approach to moving everything to matrices. So far I actually quite like it. It does make things less transparent, but not as badly as I had feared, and definitely gives more flexibility.
There are two problems left to solve:
val
operation doesn't have correct adjoints. Somehow we need to figure out how to produce adjoints that enforce that gradients have the right "units". E.g. a gradient of an invariants w.r.t. params should be invariant. a gradient of a scalar w.r.t. params should be scalar again.Vector{Vector}
instead ofVector{SVector}
. I vaguely remember there is a Flux issue about this. Can we solve it? Ignore it? I'm worried it could cause a severe performance degradation. To be discussed.CC @andresrossb