JuliaDiff / SparseDiffTools.jl

Fast Jacobian computation through sparsity exploitation and matrix coloring
MIT License

Threaded Jacobian calls #174

Open ChrisRackauckas opened 2 years ago

ChrisRackauckas commented 2 years ago

It would be good to port some of the PolyesterForwardDiff threading support over to the loops here. It wouldn't be too hard, but the caches would need to be extended so they don't overlap.

Pinging @chriselrod for reference.

aarontrowbridge commented 10 months ago

So I'm multithreading the same Jacobian function with different inputs and I'm seeing incorrect behavior compared to running on a single thread. I'm preallocating a single cache for the Jacobian, so I'm assuming this is also due to cache overlap. Is there a better workaround than preallocating a different cache for each function call (possibly a different cache on each thread)?
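For what it's worth, the per-task-cache workaround can be sketched in plain Julia without any SparseDiffTools API. Everything below (`f!`, `FDCache`, `fd_jacobian!`) is an illustrative stand-in, not library code; the point is only the pattern: partition the work and give each spawned task its own private scratch cache, rather than indexing caches by `threadid()` (tasks can migrate across threads).

```julia
using Base.Threads

# Hypothetical in-place function: writes f(x) into y.
f!(y, x) = (y .= x .^ 2; nothing)

# Small scratch cache for forward-difference Jacobians
# (illustrative names, not SparseDiffTools API).
struct FDCache{T}
    y0::Vector{T}
    y1::Vector{T}
end
FDCache(n::Int) = FDCache(zeros(n), zeros(n))

# Mutates only `J`, `x` (restored afterwards), and its own cache,
# so distinct caches can run concurrently on distinct tasks.
function fd_jacobian!(J, f!, x, cache::FDCache; h = 1e-6)
    f!(cache.y0, x)
    for j in eachindex(x)
        xj = x[j]
        x[j] = xj + h
        f!(cache.y1, x)
        x[j] = xj
        @views J[:, j] .= (cache.y1 .- cache.y0) ./ h
    end
    return J
end

# One cache per *task*: partition the inputs, spawn a task per chunk,
# and let each task allocate its own scratch space.
xs = [rand(4) for _ in 1:16]
Js = [zeros(4, 4) for _ in 1:16]
chunks = Iterators.partition(eachindex(xs), cld(length(xs), nthreads()))
tasks = map(chunks) do idxs
    Threads.@spawn begin
        cache = FDCache(4)   # this task's private cache
        for i in idxs
            fd_jacobian!(Js[i], f!, xs[i], cache)
        end
    end
end
foreach(wait, tasks)
```

Since `f(x) = x.^2` here, each resulting Jacobian should be approximately `diagm(2 .* x)` regardless of how many threads the runtime actually uses.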

chriselrod commented 10 months ago

Maybe we should consider cacheless forward-mode AD. The idea: if you don't need to mutate `x` and want derivatives with respect to `x`, you can add a lazy wrapper whose `getindex` loads from `x` while also initializing the `Dual` correctly.

aarontrowbridge commented 10 months ago

This is indeed my situation and I'm very interested in this potential solution! I'm parsing compute_jacobian_ad.jl atm; if you can point me to the right place I'm down to work on implementing this :)

chriselrod commented 10 months ago

> This is indeed my situation and I'm very interested in this potential solution! I'm parsing compute_jacobian_ad.jl atm; if you can point me to the right place I'm down to work on implementing this :)

FWIW, I just implemented that approach in C++ (https://github.com/LoopModels/Math/blob/aab7313183efc5655a001866dc84de43ee73daab/include/Math/Dual.hpp#L190-L275), with a basic test here: https://github.com/LoopModels/Math/blob/aab7313183efc5655a001866dc84de43ee73daab/test/dual_test.cpp#L16-L44

I'd suggest ignoring ForwardDiff.jl's code, using only ForwardDiff.Dual. Write your own array wrapper like the DualVector there:

```julia
import ForwardDiff

# T is the Dual tag to use
# V is the eltype of the wrapped array
# N is the chunk size
struct DualVector{T,V,N,A<:AbstractVector{V}} <: AbstractVector{ForwardDiff.Dual{T,V,N}}
    data::A
    offset::Int
end
Base.@propagate_inbounds function Base.getindex(A::DualVector{T,V,N}, i::Int) where {T,V,N}
    x = A.data[i]
    # NOTE: if you modify this, avoid capturing `V` in a closure,
    # because closures do not specialize on types!
    p = ntuple(V ∘ ==(i - A.offset), Val(N))
    ForwardDiff.Dual{T}(x, p)
end
Base.size(A::DualVector) = size(A.data)
Base.IndexStyle(::Type{<:DualVector}) = IndexLinear()
# TODO: forward the rest of the array interface as needed (read-only is fine here).
```

Basically, you pick a chunk size `N`, then loop over blocks of size `N`, incrementing the offset (which starts at 0) and calling `f` each time. Each call gives you the partials for an `N × result_dimension` block of the Jacobian; copy those from the returned duals into your result matrix.
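The loop described above can be sketched end to end. To keep the sketch standard-library-only (so it runs without ForwardDiff installed), `MiniDual` and `ChunkVector` below are hypothetical, stripped-down stand-ins for `ForwardDiff.Dual` and the `DualVector` wrapper; only `+` and `*` are implemented.

```julia
# Minimal dual number carrying N partials (stand-in for ForwardDiff.Dual).
struct MiniDual{V,N} <: Real
    value::V
    partials::NTuple{N,V}
end
Base.:+(a::MiniDual{V,N}, b::MiniDual{V,N}) where {V,N} =
    MiniDual(a.value + b.value, ntuple(j -> a.partials[j] + b.partials[j], Val(N)))
Base.:*(a::MiniDual{V,N}, b::MiniDual{V,N}) where {V,N} =
    MiniDual(a.value * b.value,
             ntuple(j -> a.value * b.partials[j] + b.value * a.partials[j], Val(N)))

# Lazy wrapper: reading index i seeds partial slot `i - offset`.
# Indices outside the current chunk get all-zero partials (treated as constants).
struct ChunkVector{V,N,A<:AbstractVector{V}} <: AbstractVector{MiniDual{V,N}}
    data::A
    offset::Int
end
Base.size(A::ChunkVector) = size(A.data)
Base.getindex(A::ChunkVector{V,N}, i::Int) where {V,N} =
    MiniDual(A.data[i], ntuple(j -> V(j == i - A.offset), Val(N)))

# Cacheless chunked dense Jacobian: loop over column blocks of width N,
# incrementing the offset, and copy each block's partials into J.
function chunked_jacobian(f, x::AbstractVector{V}, ::Val{N}) where {V,N}
    y = f(x)                       # one plain call to size the result
    J = zeros(V, length(y), length(x))
    for offset in 0:N:length(x)-1
        yd = f(ChunkVector{V,N,typeof(x)}(x, offset))
        for (j, col) in enumerate(offset+1:min(offset + N, length(x)))
            for i in eachindex(yd)
                J[i, col] = yd[i].partials[j]
            end
        end
    end
    return J
end
```

No dual-valued work buffers are allocated per call, so nothing is shared between concurrent invocations. For example, `f(x) = [x[1] * x[2], x[1] + x[2]]` at `x = [3.0, 5.0]` yields the Jacobian `[5.0 3.0; 1.0 1.0]`.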