ACEsuit / Polynomials4ML.jl

Polynomials for ML: fast evaluation, batching, differentiation
MIT License

Simpler Static Product #50

Closed cortner closed 1 year ago

cortner commented 1 year ago

This is a suggestion to replace the static product implementation with recursive code. It is much simpler because it avoids all the code generation, and it is only marginally slower; this is basically because the compiler will fully resolve and inline the recursion for us.
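
Roughly, the idea looks like the following (a minimal illustrative sketch with made-up names, not the exact code in this PR):

# value and gradient of x1 * x2 * ... * xN, computed recursively;
# for a fixed-length tuple the compiler unrolls the recursion completely
_prod_ed_sketch(b::Tuple{T}) where {T <: Number} = b[1], (one(T),)

function _prod_ed_sketch(b::Tuple)
   p1, g1 = _prod_ed_sketch(Base.tail(b))    # product and gradient of the tail
   return b[1] * p1, (p1, (b[1] .* g1)...)   # product rule, assembled as a tuple
end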

The old implementation has a lot of additional functionality that the new code doesn't yet provide. So I'm open to keeping both. But for the sake of cleanliness we should also consider replacing the old implementation entirely.

CheukHinHoJerry commented 1 year ago

I will take a closer look at it tomorrow.

CheukHinHoJerry commented 1 year ago

I did a quick @btime and it seems that the new code is even faster than the previous one?? Or did I do something wrong?

using BenchmarkTools # for @btime (b and P4ML come from the surrounding test setup)

ORD = 5
valNB = Val(ORD) # 100 ns
@btime P4ML._prod_ed($b.data, $valNB)  # 120.00 ns
@btime P4ML._grad_static_prod($b.data) # 116.54 ns

Both of them evaluate and differentiate at the same time.

cortner commented 1 year ago

Interesting - maybe my previous comparison was against the gradient only, or I changed something else. Not sure...

cortner commented 1 year ago

If you like this, we can label the PR as WIP, bring it to feature parity and remove the old code. But as I said above I'm also happy to keep both for the time being and only experiment with the new static product in the model components that I'm currently extending or rewriting.

CheukHinHoJerry commented 1 year ago

I like this more than the previous one, and I guess this would actually be even faster than before.

There is also a prod_ed2 functionality that the new implementation still needs. Once we work that out, I can do a quick @btime and replace all the old static prod code with the new one to see what we lose/gain.

I suggest we work everything out in this PR, if you are not in an urgent rush to merge it?

cortner commented 1 year ago

ok for me. Please feel free to push to it. I won't need anything else from it at the moment, so I won't edit it further.

CheukHinHoJerry commented 1 year ago

Sure. Thank you.

CheukHinHoJerry commented 1 year ago

@btime evaluate_ed!(bA, bdA, basis, bBB)  # 1.151 μs
@btime evaluate_ed!(bA, bdA, basis2, bBB) # 1.156 μs

This is the sparseproduct @btime. Basically no difference, I guess, since most of the time is spent on memory access/overwrites. I suggest simply replacing the previous product with the new one and keeping the old one as a backup in case we need it later. Sounds good? @cortner

cortner commented 1 year ago

I'm ok with that. Thanks for testing this.

cortner commented 1 year ago

I wonder whether _grad_static_prod should be renamed to _static_prod_ed -- that would be more consistent with evaluate_ed??

CheukHinHoJerry commented 1 year ago

Agreed. I will do that now. Another problem is that there is a _code_prod_ed2 function whose recursion I haven't figured out yet. I will use the previous one for now. That function is only used in the frule of sparseproduct, which I think we don't want to focus on now.

cortner commented 1 year ago

Can't you just forwarddiff the gradient?

CheukHinHoJerry commented 1 year ago

Sorry, I am a bit confused about what was implemented ... Neither the frule of sparseproduct nor _code_prod_ed2 is tested, and I expect the second-order derivatives are all zero?

cortner commented 1 year ago

I don't understand your last comment. I hadn't planned to implement frules for _static_prod at all. By "forwarddiff" I just meant to use ForwardDiff or even directly evaluate _static_prod_ed using a chunk of duals.
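
Something along these lines (just a sketch reusing the illustrative _prod_ed_sketch from above, not the actual implementation):

using ForwardDiff: Dual, value, partials

# seed each input with a unit partial (a "chunk" of duals); one pass
# through the gradient code then also yields the Hessian
x = (2.0, 3.0, 5.0)
N = length(x)
xd = ntuple(i -> Dual(x[i], ntuple(j -> (i == j ? 1.0 : 0.0), N)...), N)
p, g = _prod_ed_sketch(xd)
∇p = value.(g)                                  # (15.0, 10.0, 6.0)
H = [partials(g[i], j) for i in 1:N, j in 1:N]  # Hessian of x1*x2*x3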

But if this is not needed right now, then I would prefer that we document, merge, and open an issue listing which methods are still missing.

> I expect the second order derivatives are all zero?

why?

CheukHinHoJerry commented 1 year ago

In my mind, _code_prod_ed2 computes the second derivatives of $x_1 x_2 \cdots x_N$ with respect to each $x_i$, and those are all zero? Please correct me if I am wrong...

And yes, I did the renaming, so we can just do the merge.

cortner commented 1 year ago

I see - no, I think it should be the Hessian.
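
Concretely, for $p = x_1 x_2 \cdots x_N$ only the diagonal entries vanish, $\partial^2 p / \partial x_i^2 = 0$; the mixed entries are $\partial^2 p / \partial x_i \partial x_j = \prod_{k \neq i, j} x_k \neq 0$ for $i \neq j$.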

CheukHinHoJerry commented 1 year ago

That explains it... Sorry for misunderstanding. The previous implementation uses the fact that the Hessian is symmetric and stores only the lower-triangular entries. I tried ForwardDiff and it is much slower than the current version.
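
For reference, that storage scheme looks something like this (a hypothetical sketch, not the actual _code_prod_ed2):

# for p = x1*...*xN the entry H[i,j] with i != j is the product of all
# x[k] with k distinct from i and j, and the diagonal vanishes, so it
# suffices to store the strict lower triangle
function prod_hess_lower(x::NTuple{N, T}) where {N, T}
    H = T[]
    for i in 2:N, j in 1:i-1
        push!(H, prod((x[k] for k in 1:N if k != i && k != j); init = one(T)))
    end
    return H    # packed strict lower triangle, row by row
end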

But I think we don't need this now, so we can just merge as you suggested?

cortner commented 1 year ago

ok - will do.

cortner commented 1 year ago

Merging now so I can finalize the next PR - if any problems remain, we can catch them there.