JuliaDiff / TaylorDiff.jl

Taylor-mode automatic differentiation for higher-order derivatives
https://juliadiff.org/TaylorDiff.jl/
MIT License

Performance of `(+,-,*,/)(::TaylorScalar,::Number)`: easy short path ? #25

Closed lrnv closed 2 weeks ago

lrnv commented 1 year ago

Hey,

The current implementation of `*(::TaylorScalar, ::Number)` in your package promotes the `Number` to a `TaylorScalar`, like so:

for op in (:+, :-, :*, :/)
    @eval @inline $op(a::TaylorScalar, b::Number) = $op(promote(a, b)...)
    @eval @inline $op(a::Number, b::TaylorScalar) = $op(promote(a, b)...)
end

and then uses the full-blown multiplication of `TaylorScalar`s:

@generated function *(a::TaylorScalar{T, N}, b::TaylorScalar{T, N}) where {T, N}
    return quote
        va, vb = value(a), value(b)
        @inbounds TaylorScalar($([:(+($([:($(binomial(i - 1, j - 1)) * va[$j] *
                                           vb[$(i + 1 - j)]) for j in 1:i]...)))
                                  for i in 1:N]...))
    end
end
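
For intuition, here is a hand-written expansion of what that generated body computes for `N = 3`, with plain tuples standing in for the coefficient tuples (illustration only):

```julia
# Hand-expanded convolution for N = 3, with binomial(i - 1, j - 1) weights.
va = (1.0, 2.0, 3.0)   # coefficients of a
vb = (4.0, 0.0, 0.0)   # a Number b = 4.0 promoted to a TaylorScalar: higher coefficients are zero
result = (1 * va[1] * vb[1],                                          # order 0
          1 * va[1] * vb[2] + 1 * va[2] * vb[1],                      # order 1
          1 * va[1] * vb[3] + 2 * va[2] * vb[2] + 1 * va[3] * vb[1])  # order 2
# Every term involving vb[2] or vb[3] above is a multiplication by zero,
# yet it is still computed at runtime.
```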

Although very general and elegant, this seems super wasteful, as a bunch of multiplications by zero will occur and still take time at runtime.

using TaylorDiff, BenchmarkTools

t = TaylorScalar(rand(20)...)
x = rand(10)
@btime Ref(t) .* x; # 922 ns
@btime (TaylorScalar(t.value .* xi) for xi in x) # 58 ns

and so a specialized method would be nice. The same thing occurs for `+`, `-`, `/`, if I read your code correctly.
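
A minimal sketch of what such specialized methods could look like, assuming `value(t)` returns the coefficient tuple and that a `TaylorScalar` can be constructed directly from a tuple (illustration only, not the PR code):

```julia
# Hypothetical specializations: a plain Number only scales (or shifts) the
# coefficient tuple, so the full convolution is never needed.
import Base: +, -, *, /

@inline *(a::TaylorScalar, b::Number) = TaylorScalar(value(a) .* b)
@inline *(b::Number, a::TaylorScalar) = a * b
@inline /(a::TaylorScalar, b::Number) = TaylorScalar(value(a) ./ b)

# Adding a Number only touches the zeroth-order coefficient.
@inline +(a::TaylorScalar, b::Number) =
    TaylorScalar((value(a)[1] + b, Base.tail(value(a))...))
@inline +(b::Number, a::TaylorScalar) = a + b
@inline -(a::TaylorScalar, b::Number) = a + (-b)
```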

As I am not well versed in the `@eval` shenanigans, the code I produced is quite ugly and has unnecessary loops, but I could not fix that myself ;)

Please tell me what you think; I got a 40% speedup in my application with this.

codecov-commenter commented 1 year ago

Codecov Report

Base: 86.44% // Head: 86.81% // Increases project coverage by +0.36%.

Coverage data is based on head (280bb66) compared to base (7b2d2d6). Patch coverage: 100.00% of modified lines in pull request are covered.


Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   86.44%   86.81%   +0.36%
==========================================
  Files           5        5
  Lines         214      220       +6
==========================================
+ Hits          185      191       +6
  Misses         29       29
```

| Impacted Files | Coverage Δ |
|---|---|
| [src/scalar.jl](https://codecov.io/gh/JuliaDiff/TaylorDiff.jl/pull/25?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaDiff#diff-c3JjL3NjYWxhci5qbA==) | `100.00% <100.00%> (ø)` |

View the full report at Codecov.

tansongchen commented 1 year ago

Thanks for your contribution. Yes, I agree this will be a lot more efficient; I just haven't had a chance to address it. If you take a look at our continuous benchmark page (on the page, navigate to your branch first), you will notice that the performance for the single-variable Taylor expansion case improved a lot.

There is a caveat, though: the performance for PINN regressed. That's because Zygote is too weak to infer the type through your `[2:end]` semantics. But you probably don't want to dig into the Zygote compatibility layer (which is a huge mess 😱), so I will fix that part for you.

lrnv commented 1 year ago

I'm sorry, I cleaned up the work and force-pushed onto my branch, so if you want to pick up from there you can, of course.

Edit: Yes, indeed your benchmarks show a nice improvement! This is super cool stuff you have there with these benchmarks, btw; having them in real time on PRs and branches is neat!

tansongchen commented 1 year ago

Yeah, that is built on PkgBenchmark.jl and my own little frontend. It is still WIP and buggy, but feel free to take a look and play around.

lrnv commented 1 year ago

Hey

I added convenience functions to be able to multiply and/or divide `TaylorScalar{T,N}`s with different `T`s. This was missing and is very helpful for me.

Indeed, I need to differentiate through TaylorScalar's machinery, so I end up with types of the form `TaylorScalar{ForwardDiff.Dual{...}, N}`.

Btw, is there a gradient function planned or just the derivative?
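
For illustration, a hedged sketch of the kind of mixed-`T` convenience method described above might look like this (hypothetical, assuming `TaylorScalar` can be constructed from a coefficient tuple; the methods actually added in the PR may differ):

```julia
# Hypothetical sketch: multiply TaylorScalars whose element types differ
# (e.g. a ForwardDiff dual vs. Float64) by promoting the element types first,
# then reusing the existing same-T multiplication.
import Base: *

function *(a::TaylorScalar{T1, N}, b::TaylorScalar{T2, N}) where {T1, T2, N}
    T = promote_type(T1, T2)
    TaylorScalar(T.(value(a))) * TaylorScalar(T.(value(b)))
end
```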

ToucheSir commented 1 year ago

> That's because Zygote is too weak to infer the type through your `[2:end]` semantics

Instead of tuple slicing, you can use `Base.tail` or `ntuple(i -> mytup[i + 1], length(mytup) - 1)`. It's uglier, but more likely to be type stable. We use this pattern across a bunch of FluxML code. It should be possible to improve the type stability of tuple indexing in Zygote, but the reality is that anything which isn't a bugfix is low priority because of limited dev resources.
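
For concreteness, a small stand-alone illustration of the two suggested alternatives (hypothetical variable names):

```julia
# All three drop the first element of a tuple; Base.tail and ntuple keep the
# result a tuple whose length is easy for the compiler to see, which tends to
# be friendlier to Zygote's type inference than the [2:end] slice.
mytup = (1.0, 2.0, 3.0, 4.0)

rest_slice  = mytup[2:end]                                  # (2.0, 3.0, 4.0), the problematic form
rest_tail   = Base.tail(mytup)                              # (2.0, 3.0, 4.0)
rest_ntuple = ntuple(i -> mytup[i + 1], length(mytup) - 1)  # (2.0, 3.0, 4.0)
```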

tansongchen commented 1 year ago

Thanks @ToucheSir, that sounds like a quick fix; maybe @lrnv can try that?

tansongchen commented 1 year ago

> Hey
>
> I added convenience functions to be able to multiply and/or divide `TaylorScalar{T,N}`s with different `T`s. This was missing and is very helpful for me.
>
> Indeed, I need to differentiate through TaylorScalar's machinery, so I end up with types of the form `TaylorScalar{ForwardDiff.Dual{...}, N}`.
>
> Btw, is there a gradient function planned or just the derivative?

And regarding your gradient question, it depends on what you mean:

  1. `gradient(f, xs, n)` will compute the n-th partial derivative for each `x` in `xs`, e.g. compute f_xx and f_yy at once. This is easy to implement (like a convenience wrapper) and would indeed make some use cases easier; a sketch is shown after this list.
  2. `gradient(f, xs, n)` will compute the n-th Hessian-like tensor (which is an n-dimensional tensor); I didn't see a real use case for that, and it isn't really something that Taylor-mode is good at. Taylor-mode is very good for directional derivatives, and probably also good for nesting directional derivatives 2 to 3 times (like f_xxyyy), but not for the full derivative tensor.
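
A hypothetical sketch of interpretation 1 (not an existing TaylorDiff.jl function; it assumes a directional-derivative entry point `derivative(f, x, direction, order)`, which may differ from the package's actual API):

```julia
using TaylorDiff

# Hypothetical convenience wrapper: n-th derivative of f along each coordinate
# axis, e.g. [f_xx, f_yy] for order = 2 in two dimensions.
function axiswise_derivatives(f, xs::AbstractVector{T}, order::Integer) where {T}
    map(eachindex(xs)) do i
        e = zeros(T, length(xs))     # i-th standard basis vector
        e[i] = one(T)
        derivative(f, xs, e, order)  # assumed directional-derivative entry point
    end
end

# Hypothetical usage: axiswise_derivatives(x -> x[1]^2 * x[2]^3, [1.0, 2.0], 2)
```
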
tansongchen commented 1 year ago

BTW, you can use `julia -e 'using JuliaFormatter; format(".", verbose=true)'` to format your code, as shown in https://github.com/JuliaDiff/TaylorDiff.jl/blob/main/.github/workflows/FormatCheck.yml

tansongchen commented 1 year ago

Changing slicing to tail didn't fix the problem... I need to take a closer look next week

lrnv commented 1 year ago

Hey, sorry I did not catch up on your responses, I did not see them...

tansongchen commented 4 months ago

I will be working on this during the summer, though with a different approach. I am currently adding chunking support to this package, like ForwardDiff.jl, something like this:

struct TaylorScalar{V, N, K}
  primal::V
  partials::NTuple{K, NTuple{N, V}}
end

where `K` is the order and `N` is the chunk size, as in ForwardDiff.jl. `partials` will be a polynomial whose coefficients are tuples rather than numbers.

Based on this, we have easy access to the primal value via `getfield`, so we won't face the same issue we currently have.
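
Purely as an illustration (made-up numbers, and referring to the proposed struct above, not the current `TaylorScalar`), a value with order `K = 2` and chunk size `N = 2` might look like:

```julia
# Hypothetical instance of the proposed layout: two derivative orders (K = 2),
# each carrying partials with respect to a chunk of two inputs (N = 2).
ts = TaylorScalar(
    1.0,               # primal value, directly accessible via getfield
    ((1.0, 0.0),       # first-order partials for the two inputs in the chunk
     (0.0, 0.0)))      # second-order partials
```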

lrnv commented 4 months ago

Thanks for the feedback @tansongchen. For me the real issue is #61, and until that is dealt with I cannot switch from TaylorSeries.jl to TaylorDiff.jl to benefit from the (already better) performance, let alone think of the even better performance from this PR.

lrnv commented 2 weeks ago

Closed in favor of #81.