Closed · ExpandingMan closed this 1 month ago
As you say, chunk size = length does seem to be the present behaviour: `ForwardDiff.gradient(x -> (@show x[1]), @SArray rand(1,100))`.

If you explicitly make a `GradientConfig`, it seems to obey `DEFAULT_CHUNK_THRESHOLD = 12`, but this is ignored once you pass it to `gradient`? That's odd:
```julia
julia> f1(x) = @show x[1];

julia> ForwardDiff.Chunk(@SMatrix rand(1,100))
ForwardDiff.Chunk{12}()

julia> ForwardDiff.GradientConfig(f1, @SMatrix rand(1,100)).seeds
(Partials(1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0))

julia> ForwardDiff.gradient(f1, @SArray(rand(1,100)), ForwardDiff.GradientConfig(f1, @SArray rand(1,100)))
x[1] = Dual{ForwardDiff.Tag{typeof(f1), Float64}}(0.6424264074844214,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0)
1×100 SMatrix{1, 100, Float64, 100} with indices SOneTo(1)×SOneTo(100):
 1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
```
Using chunk size 100 might be sub-optimal. I would guess that whatever compiler limitations StaticArrays runs into for long arrays likely apply to long tuples of partials too, so this thing with 10^4 numbers might be awful? I haven't timed it, and perhaps picking a better heuristic isn't this PR's problem anyway.
I have attempted an alternate scheme based partly on @avik-pal's suggestion. It seems to work when I test it locally, though it is certainly rather precarious and scary-looking, as it assumes inference can follow a bunch of logic without breaking. I think what I have here now would be considered non-breaking, though.
For whatever it's worth, were I to do this from scratch, I'd probably split `Chunk` into two different objects. The first would be a mere `struct Chunk; size::Int; end`, which would be used for ordinary arrays; the latter would be like the current `Chunk{N}`, which could be used with static arrays or when a user desires. In the standard-array case it does not seem justified to think the type `Chunk{N}` should be inferrable, so I'm not sure what business it has having a type parameter.
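For illustration, the two-object design described above might look something like the following sketch. The names `RuntimeChunk` and `StaticChunk` are hypothetical and not part of ForwardDiff:

```julia
# Illustrative-only sketch of splitting Chunk into two types; these names
# are hypothetical and not ForwardDiff's actual API.

# For ordinary arrays: the chunk size is a plain runtime value, so
# constructing it is trivially type-stable.
struct RuntimeChunk
    size::Int
end

# For static arrays, or when the user opts in: the chunk size lives in the
# type (like ForwardDiff's current Chunk{N}), so code can specialize on it.
struct StaticChunk{N} end

chunksize(c::RuntimeChunk) = c.size
chunksize(::StaticChunk{N}) where {N} = N
```

The point of the split is that only `StaticChunk{N}` asks inference to track the size; ordinary arrays never pay for (or pretend to have) a compile-time chunk size.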
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.59%. Comparing base (c310fb5) to head (a4b3e4e). Report is 7 commits behind head on master.
For me, the following diff is enough:

```diff
-const CHUNKS = [Chunk{i}() for i in 1:DEFAULT_CHUNK_THRESHOLD]
-
 function Chunk(input_length::Integer, threshold::Integer = DEFAULT_CHUNK_THRESHOLD)
     N = pickchunksize(input_length, threshold)
-    0 < N <= DEFAULT_CHUNK_THRESHOLD && return CHUNKS[N]
+    0 < N <= DEFAULT_CHUNK_THRESHOLD && return Chunk{N}()
     return Chunk{N}()
 end
```

```julia
julia> @inferred ForwardDiff.GradientConfig(f, sx);
```
The reason the `const CHUNKS` table is there is historical: Julia was once bad at optimizing a certain pattern (https://github.com/JuliaLang/julia/issues/29887), but I don't think it should be needed now.
This should be benchmarked again with this change though: https://github.com/JuliaDiff/ForwardDiff.jl/pull/373
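For readers unfamiliar with the heuristic the diff above calls into: here is a simplified, self-contained sketch in the spirit of `pickchunksize` (this is *not* ForwardDiff's exact algorithm, and `CHUNK_THRESHOLD`/`pick_chunk` are made-up names):

```julia
# Simplified sketch of a chunk-size heuristic (NOT ForwardDiff's exact
# pickchunksize): inputs at or below the threshold get a single chunk;
# longer inputs are split into roughly equal chunks no larger than the
# threshold.
const CHUNK_THRESHOLD = 12  # stand-in for DEFAULT_CHUNK_THRESHOLD

function pick_chunk(input_length::Integer, threshold::Integer = CHUNK_THRESHOLD)
    input_length <= threshold && return input_length
    nchunks = cld(input_length, threshold)  # number of chunks needed
    return cld(input_length, nchunks)       # balance the chunk sizes
end

pick_chunk(8)    # 8: fits in one chunk
pick_chunk(100)  # 12: nine chunks of at most 12
pick_chunk(13)   # 7: two chunks of sizes 7 and 6
```

Because the result depends on the runtime value `input_length`, `Chunk{pick_chunk(n)}()` is inherently type-unstable for ordinary arrays, which is the instability this whole thread is about.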
Made an alternative PR at https://github.com/JuliaDiff/ForwardDiff.jl/pull/708
FYI @ExpandingMan, the idea of DifferentiationInterface is to prepare the operator once and then reuse the preparation several times. For gradient and Jacobian, the preparation phase does involve a type-unstable call to `Chunk`, but the resulting extras object then allows for type-stable application of the operator. In this spirit, it's okay for preparation to be type-unstable (and currently it has to be with ForwardDiff), but the code you put in hot loops should be the prepared version of the operator. See the tutorial at https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface/stable/tutorial1/#Preparing-for-multiple-gradients for an example.
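As a sketch of that prepare-once / reuse-many-times workflow (the exact DifferentiationInterface call signatures have varied between versions, so treat this as indicative and check the linked tutorial for the current API):

```julia
# Sketch of the preparation pattern described above. Signatures have
# changed across DifferentiationInterface versions; this follows the
# prepare_gradient / gradient(f, prep, backend, x) style.
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()
x = rand(100)

# Preparation may be type-unstable (it calls Chunk), but it runs once.
prep = prepare_gradient(f, backend, x)

# In a hot loop, reuse the preparation for type-stable gradient calls.
grad = gradient(f, prep, backend, x)
```

Here `grad` should equal `2 .* x`, since the gradient of `sum(abs2, x)` is `2x`.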
Thanks @KristofferC , I have confirmed that #708 works, so that seems obviously preferable to this, I will close this.
The function `Chunk(x)` is manifestly type-unstable. In many cases, when the value is not directly used, the compiler is clever enough to elide this, but in some cases its mere presence can cause significant performance penalties. In particular, I encountered this when trying to get DifferentiationInterface.jl to behave nicely with static arrays. That package calls `GradientConfig` (or `JacobianConfig`, etc.), and for whatever reason the compiler gets upset about the calls to `Chunk` and the whole thing is wildly inefficient.

This PR attempts to address that issue by defining a `StaticArray` method for `Chunk` and setting the size to the array's length. It is not entirely clear to me that this is the desired behavior, but `dualize` already seems to do this. Also, this ignores the global chunk-size limit setting, but again, `dualize` appears to already do this.

Note that I have explicitly confirmed that the `@inferred` tests I have added fail without this PR but succeed with it.

(btw I am getting a failure in one of the tests for allocations during `seed!` when I run on my end, but it looks totally unrelated to this, so I'm not sure what's going on there. Maybe it'll pass in CI/CD, who knows)
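The idea of a `StaticArray` method for `Chunk` can be sketched in a self-contained way with a toy `Chunk` type (this is not ForwardDiff's implementation or the PR's exact code):

```julia
# Hedged, self-contained sketch of the approach described above, using a
# toy Chunk type rather than ForwardDiff's.
using StaticArrays

struct Chunk{N} end

# Ordinary arrays: length is a runtime value, so the Chunk{N} returned
# here cannot be inferred. This is the instability the PR works around.
Chunk(x::AbstractArray) = Chunk{length(x)}()

# Static arrays: the length is encoded in the type, so inference can
# resolve Chunk{N} at compile time.
Chunk(x::StaticArray) = Chunk{length(typeof(x))}()
```

With this, `Chunk(@SVector rand(3))` produces `Chunk{3}()` from type information alone, which is what lets the added `@inferred` tests pass.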