JuliaDiff / AbstractDifferentiation.jl

An abstract interface for automatic differentiation.
https://juliadiff.org/AbstractDifferentiation.jl/
MIT License

Include dedicated derivative functions for FiniteDifferences/ForwardDiff instead of relying on jacobians? #87

Closed. arthur-bizzi closed this issue 12 months ago

arthur-bizzi commented 1 year ago

Hey all.

As it stands, calling AD.derivative with the FiniteDifferences and ForwardDiff back-ends first computes the jacobian and then flattens it into the derivative. For some edge cases, such as a single-input function, this is significantly slower:

using FiniteDifferences, BenchmarkTools
import AbstractDifferentiation as AD

# Second-order central finite-difference method for the first derivative.
fdm = central_fdm(2, 1, adapt=0)

fd = AD.FiniteDifferencesBackend(fdm)

with_AD(x) = AD.derivative(fd, sin, x)         # goes through the jacobian
without_AD(x) = fdm(sin, x)                    # calls the method directly
blame_the_jacobian(x) = jacobian(fdm, sin, x)  # the jacobian call by itself

@benchmark with_AD(1.)
@benchmark without_AD(1.)
@benchmark blame_the_jacobian(1.)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.070 μs … 552.240 μs  ┊ GC (min … max): 0.00% … 99.30%
 Time  (median):     1.160 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.327 μs ±   5.524 μs  ┊ GC (mean ± σ):  4.13% ±  0.99%

    ▅█
  ▂███▆▄▃▃▃▂▂▂▂▃▂▂▂▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.07 μs         Histogram: frequency by time        2.42 μs <

 Memory estimate: 944 bytes, allocs estimate: 17.

BenchmarkTools.Trial: 10000 samples with 961 evaluations.
 Range (min … max):  85.640 ns …  2.145 μs  ┊ GC (min … max): 0.00% … 93.60%
 Time  (median):     88.658 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   97.445 ns ± 46.875 ns  ┊ GC (mean ± σ):  0.98% ±  2.08%

  ▂█▇    ▁▄▅▄▃    ▄     ▁▁▁▁                                  ▁
  ███▇██▆██████▆▆███▄▅▆▇█████▇▆▅▃▄▅▅▄▅▆▆▅▆▇▇█▇▃▄▄▂▅▄▅▄▅▃▄▄▅▃▄ █
  85.6 ns      Histogram: log(frequency) by time       173 ns <

 Memory estimate: 32 bytes, allocs estimate: 2.

BenchmarkTools.Trial: 10000 samples with 111 evaluations.
 Range (min … max):  774.775 ns … 47.623 μs  ┊ GC (min … max): 0.00% … 97.59%
 Time  (median):     819.820 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   950.669 ns ±  1.825 μs  ┊ GC (mean ± σ):  7.82% ±  4.01%

  ▄▇█▆▄▂▂▁▁▁▁  ▃▄    ▁▁                                        ▁
  ██████████████████████▇▆▆▆▆▆▆▆▆▅▆▅▅▆▆▅▆▄▅▅▅▅▄▄▄▄▆▅▄▆▄▄▄▅▅▄▄▅ █
  775 ns        Histogram: log(frequency) by time      1.83 μs <

 Memory estimate: 864 bytes, allocs estimate: 14.

Here the jacobian-based path is roughly an order of magnitude slower than calling fdm directly, and this is also the case for other, less silly examples like small neural networks with a single input. What are the reasons for not implementing the derivative directly? Something along the lines of:

# Call the backend's finite-difference method directly instead of going
# through the jacobian.
function AD.derivative(ba::AD.FiniteDifferencesBackend, f, xs...)
    return (ba.method(f, xs...),)
end
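
For the ForwardDiff back-end, the analogous shortcut would presumably be to dispatch straight to ForwardDiff.derivative. A minimal sketch for the single scalar-input case (just an illustration of the idea, not necessarily what an eventual fix would look like):

import ForwardDiff

# Single scalar input: hand the work to ForwardDiff.derivative and wrap the
# result in a one-element tuple, matching AD.derivative's return convention.
function AD.derivative(ba::AD.ForwardDiffBackend, f, x::Real)
    return (ForwardDiff.derivative(f, x),)
end

Both versions keep the tuple-per-argument return convention of AD.derivative, so callers would only see the speedup.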
oxinabox commented 1 year ago

I assume it wasn't done because it worked without it and no one ran those benchmarks. A PR to implement this would be appreciated.

mohamed82008 commented 1 year ago

Yes, please submit a PR.

devmotion commented 12 months ago

Fixed by #97.