Closed cortner closed 2 years ago
.... related - I just got rid of adjoint_EVAL_D1
. My tests so far show that we can also get rid of grad_config
and just call _rrule_evaluate
with w = 1
. Now the only thing left is to remove the multi-property version of adjoint_EVAL_D
.
We went with matrices since it was compatible with Flux and at the time there wasn't a clear advantage on using svectors. I don't think we would have a problem switching, at least not in principle. We could always add another svector2matrix if needed.
I think I will at least try. I never liked the fact that we had 3 implementations of adjoint_EVAL_D
, 3 implementation of genmul
, and and 2 implementations of _rrule_evaluate
. I'll start it in a separate branch maybe and see how far I get.
E.g. I would now have to optimize a second code-path for multiple properties. (the one for a single property is now fairly fast...)
As I'm looking at the codes, I now wonder whether this is overcomplicating a simple problem. I'm really not sure yet ... I think our two options are
(1) As we do it now - n properties means parameters are Vector{SVector}
(2) There are always n properties, but n could be 1, and we use Matrix
for parameters as well as gradients.
Option (1) seems much more flexible, but makes like very complicated. Lots of special contract
functions must be implemented, and pre-allocation becomes hellish. At the moment it even looks as if we would beed to implement a new -
Vector-of-Propertiestype and then re-implement all functionality we need for it, which will be equally tedious as the
States` and properties interface.
Option (2) - I actually cannot see a reason why this would ever be a bad idea. If one has a single property one can still index into matrix as if it were a vector. And if we make it an nP x ?
matrix, then one can always convert to a vector of vectors. On the other Hand, if we make it a ? x nP
matrix then we can convert into several vectors. Maybe this is a separate discussion and easy to decide later. And in the end unlikely to be a bottleneck... Another advantage will be that it interfaces more nicely with Flux, AD in general.
@andresrossb Was there any reason we went with SVector{Properties}
other than is seemed to fit well into the existing implementation?
The initial reason was that it would "just work" by using SVectors instead. It proved a little more complicated, but I remember there being another reason since I initially wanted to use matrices. I don't remember the reason thought.
ACEflux uses matrices anyways. It changes from svectors to matrices and at the end goes back to svectors. When using optimizers we also are switching them to matrices or vectors before taking a step. For example for optim.jl we input a flattened vector that we convert into an SVector that is then fed into ACEflux. It is then converted into a matrix so that it cooperates with Flux and the derivative is then converted into an SVector which is then converted into a flattened vector for optim to update.
Thank you - so at least trying to rewrite everything for matrices would seem natural to you?
It wouldn't change how ACEflux works, other than a few conversions. I don't see any strong advantages or disadvantages from my side. I would say whatever seems more natural for other applications. We can always convert within the rrules for anything derivative related. I don't know how much of an overhead this would cause, since SVectors are not mutable. If it's not too much, then I don't think it matters.
I do think matrices are more general, so we wouldn't need to implement anything extra, so if there is not strong advantages of SVectors I think these will be easier to deal with in the future. Not only for us, but for new developers and maybe even users. I found SVectors less intuitive and cooperative. Other packages might cooperate better too.
On the other hand SVectors offer a speedup when doing linear algebra, so it could be helpful for our linear models to keep SVectors. We are currently not using this speedup when plugging into optim or flux, but theoretically we could try and leverage it by writing some of these codes our selves, for example the optimiers or dense layers. This could prove to be a lot of work that wouldn't be maintained (since optim and flux are), but maybe the speedup is worth it?
That’s an interesting point.
My guess is we it depends on the order of loops. If the nprop loop is inside this is where we might get that speed up. But we can just make nprop a type parameter and that should automatically unroll the loop.
I think for now I will opt for simplicity...
So the only question left now is whether to make the coefficients nprop x nbasis
or nbasis x nprop
.
This should be a relatively easy change to make later but still good if we can make a sensible choice now.
I now remember why I didn't want to use the Matrix construction. I always wanted the output of a LinearACEModel to be a property and not a vector containing a property. This still bugs me badly and I can only think of hacky ways to get around it with the Matrix implementation.
I'll need to mull this over and play with both options to understand the consequences of this choice.
given the recent decision, I'll close this, but we can re-open if we decide to come back to the alternative PR.
@andresrossb -- I would like us to revisit how we manage multiple properties. To write generic code without lots of duplication, I now believe that
SVector{N, Prop}
was the right choice, but we need to carry this through to gradients; in particular,grad_config
should return aVector{SVector{N, DState}}
. Access would simply change fromg[iX, iP]
tog[iX][iP]
but one can always choose toreinterpret
this to access it like a matrix but it would beg[iP, iX]
.Can you remind me why we decided to go with returning a matrix? If I change this, what is the knock-on effect for you?