Closed cortner closed 1 year ago
I will take a closer look at it tomorrow.
I did a quick @btime comparison and it seems that the new code is even faster than the previous one?? Or did I do something wrong?
ORD = 5
valNB = Val(ORD) # 100 ns
@btime P4ML._prod_ed($b.data, $valNB) # 120.00 ns
@btime P4ML._grad_static_prod($b.data) # 116.54 ns
Both of them eval and diff at the same time.
Interesting - maybe my previous comparison was against only the gradient, or I changed something else. Not sure...
If you like this, we can label the PR as WIP, bring it to feature parity and remove the old code. But as I said above I'm also happy to keep both for the time being and only experiment with the new static product in the model components that I'm currently extending or rewriting.
I like this more than the previous one and I guess this would actually be even faster than before.
There is also a prod_ed2 functionality that the new implementation still needs. After we work that out I can do a quick @btime run and replace all the old static products with the new one to see what we lose/gain.
I suggest working everything out in this PR, if you are not in an urgent need to merge it?
ok for me. Please feel free to push to it. I won't need anything else from it at the moment so won't edit it further.
Sure. Thank you.
@btime evaluate_ed!(bA, bdA, basis, bBB) # 1.151 μs
@btime evaluate_ed!(bA, bdA, basis2, bBB) # 1.156 μs
This is the sparseproduct @btime. Basically no difference, I guess, since most of the time is spent on memory access/overwrites. I suggest we simply replace the previous product with the new one and keep the old one as a backup in case we need it later. Sounds good? @cortner
I'm ok with that. Thanks for testing this.
I wonder whether _grad_static_prod should be renamed _static_prod_ed -- that would be more consistent with evaluate_ed??
Agree. I will do that now. Another problem is that there is a _code_prod_ed2 function whose recursion I haven't figured out yet. I will use the previous one for now. That function is only used in the frule of sparseproduct, which I think we don't want to focus on now.
Can't you just forwarddiff the gradient?
Sorry, I am a bit confused about what was implemented... Neither the frule of sparseproduct nor _code_prod_ed2 is tested, and I expect the second order derivatives are all zero?
I don't understand your last comment. I hadn't planned to implement frules for _static_prod at all. By "forwarddiff" I just meant to use ForwardDiff, or even directly evaluate _static_prod_ed using a chunk of duals.
But if this is not needed right now, then I would prefer we document, merge, and write an issue listing which methods are missing.
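For illustration, differentiating a product with ForwardDiff could look like this (a minimal sketch; the _static_prod below is a simplified stand-in, not the actual P4ML function):

```julia
using ForwardDiff, StaticArrays

# Simplified stand-in for the static product (an assumption, not the P4ML code).
_static_prod(x) = prod(x)

x = @SVector [1.0, 2.0, 3.0, 4.0, 5.0]
# ForwardDiff propagates a chunk of dual numbers through the product,
# producing the full gradient without any hand-written rule.
g = ForwardDiff.gradient(_static_prod, x)
```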
> I expect the second order derivatives are all zero?

why?
What _code_prod_ed2 computes, in my mind, is the second derivative of $x_1 x_2 \cdots x_N$ with respect to each $x_i$, and those are all zero? Please correct me if I am wrong...
And yes I did the renaming and we can just do the merge.
I see - no, I think it should be the Hessian.
That's why... Sorry for misunderstanding that. The previous implementation exploits the fact that the Hessian is symmetric and stores only the lower-triangular entries. I tried ForwardDiff and it is much slower than the current version.
But I think we don't need this now, so we can just merge as you suggested?
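To make the confusion above concrete: for $P = x_1 x_2 \cdots x_N$ the diagonal Hessian entries $\partial^2 P / \partial x_i^2$ are indeed zero, but the off-diagonal entries $\partial^2 P / \partial x_i \partial x_j = \prod_{k \neq i,j} x_k$ are not. A quick check (a sketch, not the package code):

```julia
using ForwardDiff

f(x) = prod(x)
H = ForwardDiff.hessian(f, [1.0, 2.0, 3.0])
# The diagonal is zero, while each off-diagonal entry is the product of
# the remaining coordinates: H[1,2] == 3.0, H[1,3] == 2.0, H[2,3] == 1.0.
```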
ok - will do.
merging now so I can finalize the next PR - if any problems remain we can catch them there.
This is a suggestion to replace the static product implementation with a recursive code. It is much simpler, because it avoids all the code generation, and it is only marginally slower. This is basically because the compiler will fully resolve and inline the recursion for us.
The old implementation has a lot of additional functionality that the new code doesn't yet provide. So I'm open to keeping both. But for the sake of cleanliness we should also consider replacing the old implementation entirely.
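A minimal sketch of the recursive idea (hypothetical code, not the actual PR implementation): the product and its gradient are evaluated in one recursion over a tuple, and for a statically known length the compiler resolves and inlines the recursion completely, so no generated functions are needed.

```julia
# Hypothetical recursive value-plus-gradient of x[1]*x[2]*...*x[N].
# Base case: a single factor has value x[1] and derivative 1.
_prod_ed(x::Tuple{T}) where {T<:Number} = x[1], (one(T),)

function _prod_ed(x::Tuple)
    p, g = _prod_ed(Base.tail(x))   # product and gradient of the tail
    # value: x[1] * p; gradient: (p, x[1] .* tail gradient)
    return x[1] * p, (p, map(t -> x[1] * t, g)...)
end

# _prod_ed((2.0, 3.0, 4.0)) returns (24.0, (12.0, 8.0, 6.0))
```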