JuliaDiff / DifferentiationInterface.jl

An interface to various automatic differentiation backends in Julia.
https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface
MIT License

Error in ForwardDiff tagging #594

Open aml5600 opened 1 week ago

aml5600 commented 1 week ago

I have an NLP that I am solving through Optimization.jl, using either Ipopt or MadNLP as the solver. Both previously worked, but now I get an error from ForwardDiff about mismatched tags.

I noticed that custom tagging was introduced in 0.6.11. If I pin to 0.6.10, I do not see the issue. I do not have the full stacktrace on this machine, but I can type out the top line, where the tagged function and the provided args differ. Also, my underlying functions are a mix of custom transcription on top of MTK models, so the type information is incredibly long; I have grouped it into ....

The error is here, where

FT = DifferentiationInterface.FixTail{OptimizationBase.var"#lagrangian#28"{...}, Tuple{Float64, Vector{Float64}, SciMLBase.NullParameters}}

but the provided function f::F is

f::DifferentiationInterface.FixTail{OptimizationBase.var"#lagrangian#28"{...}, Tuple{Float64, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, SciMLBase.NullParameters}}

where we can see the differing Vector vs SubArray in the Tuples.

The tagged input type and the one provided to checktag are identical, and begin like this:

ForwardDiff.Dual{ForwardDiff.Tag{DifferentiationInterfaceForwardDiffExt.ForwardDiffOverSomethingHVPWrapper{...}}

I apologize for the limited information here. If the issue is obvious, wonderful! If not, I can try to work on extracting a MWE as this issue pops up in a much larger process. For now I will pin to 0.6.10. Thanks!

gdalle commented 1 week ago

Hi, thanks for reporting this! It's hard to debug without an MWE, but my best guess here is that Optimization.jl might be slightly abusing DI's preparation mechanism. Preparation is only valid if you end up calling the differentiation operator with inputs of the same type later on. This also applies to additional arguments like constants. In this case, it looks like preparation is performed with a Constant containing a vector (which shows up in the tag because the function becomes a closure), but then the Lagrangian function is differentiated with a Constant containing a vector view, while reusing the wrongly tagged preparation result. Does that sound plausible @Vaibhavdixit02?
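For illustration, here is a minimal sketch (assuming DI ≥ 0.6 with the ForwardDiff extension loaded; the function and values are made up, not Optimization.jl's actual code) of how preparing with a plain Vector constant and then calling with a view-backed constant can trip ForwardDiff's tag check:

using DifferentiationInterface
import ForwardDiff

f(x, p) = sum(abs2, x) * sum(p)   # toy objective with a constant context p
backend = AutoForwardDiff()
x = rand(3)

p_vec = ones(3)
prep = prepare_gradient(f, backend, x, Constant(p_vec))
gradient(f, prep, backend, x, Constant(p_vec))      # fine: same context type as during preparation

p_view = view(ones(4), 1:3)
gradient(f, prep, backend, x, Constant(p_view))     # context type changed: expect a tag (or preparation) mismatch error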

aml5600 commented 1 week ago

I am definitely using views in my problem, so I can see how that would be entering into the call/evaluation of the prepared differentiation operator.

I haven't fully familiarized myself with all the layers/abstractions of Optimization.jl, but it does feel like something in the prep is assuming a Vector type, then?

For my own knowledge, pinning to 0.6.10 works because there is no tagging at all? I assume tagging provides performance/type stability benefits?

Thanks again!

gdalle commented 1 week ago

I haven't fully familiarized myself with all the layers/abstractions of Optimization.jl, but it does feel like something in the prep is assuming a Vector type, then?

I don't think there is such an explicit constraint, but it could be a more subtle bug. For instance, when preparing some differentiation operator, Optimization.jl could make a copy of the provided arguments, to avoid unwanted aliasing. But a copy of a view does not preserve the wrapper.

julia> x = view(ones(10), 1:2)
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 1.0
 1.0

julia> copy(x)
2-element Vector{Float64}:
 1.0
 1.0

We would need @Vaibhavdixit02's help and/or an MWE from you to find the root cause.

For my own knowledge, pinning to 0.6.10 works because there is no tagging at all?

More precisely, because I handled tagging incorrectly before #571, accidentally omitting custom tags provided by the backend.

I assume tagging provides performance/type stability benefits?

Not really; it's mostly useful for untangling derivatives in higher-order autodiff, when you have dual numbers containing dual numbers. See this section of the docs for more information.
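For intuition, here is a rough sketch of the nested dual numbers that show up in second-order autodiff; the functions and tag types below are made-up placeholders, not DI's actual wrappers:

using ForwardDiff: Dual, Tag

outer(x) = x   # placeholder functions, only used to build two distinct tag types
inner(x) = x

TO = Tag{typeof(outer), Float64}
TI = Tag{typeof(inner), Float64}

# a dual-of-dual, as appears when differentiating a gradient (e.g. in an HVP):
xdd = Dual{TO}(Dual{TI}(1.0, 1.0), Dual{TI}(1.0, 0.0))
# the distinct tags TO and TI let ForwardDiff attribute each partial derivative
# to the correct differentiation level instead of mixing them up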

Normally, users should never have to set tags manually or even be aware of their existence. The only reason you're seeing this is because of a hacky fix I put together for ForwardDiff's hvp implementation. Essentially, tinkering with the tag allows me to know the full type of the Dual numbers that will be passed to the inner gradient, which in turn enables preparation of said gradient.

aml5600 commented 1 week ago

Thanks for all the details! I can work on pulling out a representative example of what I am seeing. @Vaibhavdixit02 any hints on where to focus digging would definitely be appreciated.

On the tagging side, given the example you have above, is it possible to tag on a higher-level/more abstract type than Vector?

ElOceanografo commented 1 week ago

I've also run into an issue with ForwardDiff tagging in MarginalLogDensities.jl. Here's an example extracted from my test suite that produces the error:

using Distributions
using MarginalLogDensities
using ComponentArrays
using LinearAlgebra
using Optimization, OptimizationOptimJL

N = 3
μ = ones(N)
σ = 1.5
d = MvNormal(μ, σ^2 * I)
ld(u, p) = logpdf(d, u)
iw = [1, 3]
iv = [2]
dmarginal = Normal(1.0, σ)
u = randn(N)
v = u[iv]
w = u[iw]
u_component = ComponentArray(v = v, w = w)

x = 1.0:3.0
x_component = ComponentVector(v = x[iv], w = x[iw])
mld = MarginalLogDensity(ld, u, iw, (), LaplaceApprox())
mldc = MarginalLogDensity(ld, u_component, [:w], (), LaplaceApprox())
mld(x[iv], ())
mldc(x_component[[:v]], ())
Error message:

ERROR: Invalid Tag object:
  Expected ForwardDiff.Tag{DifferentiationInterface.FixTail{MarginalLogDensities.var"#f#4"{typeof(ld), ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(v = 1:1, w = 2:3)}}}, Vector{Symbol}, Vector{Symbol}}, Tuple{@NamedTuple{p::Tuple{}, v::Vector{Float64}}}}, ForwardDiff.Dual{ForwardDiff.Tag{DifferentiationInterfaceForwardDiffExt.ForwardDiffOverSomethingHVPWrapper{MarginalLogDensities.var"#f#4"{typeof(ld), ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(v = 1:1, w = 2:3)}}}, Vector{Symbol}, Vector{Symbol}}}, Float64}, Float64, 1}},
  Observed ForwardDiff.Tag{DifferentiationInterface.FixTail{MarginalLogDensities.var"#f#4"{typeof(ld), ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(v = 1:1, w = 2:3)}}}, Vector{Symbol}, Vector{Symbol}}, Tuple{@NamedTuple{p::Tuple{}, v::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(v = 1:1,)}}}}}}, ForwardDiff.Dual{ForwardDiff.Tag{DifferentiationInterfaceForwardDiffExt.ForwardDiffOverSomethingHVPWrapper{MarginalLogDensities.var"#f#4"{typeof(ld), ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(v = 1:1, w = 2:3)}}}, Vector{Symbol}, Vector{Symbol}}}, Float64}, Float64, 1}}.
Stacktrace:
  [1] checktag(::Type{…}, f::DifferentiationInterface.FixTail{…}, x::ComponentVector{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/config.jl:34
  [2] gradient(f::DifferentiationInterface.FixTail{…}, x::ComponentVector{…}, cfg::ForwardDiff.GradientConfig{…}, ::Val{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:17
  [3] gradient(f::DifferentiationInterface.FixTail{…}, x::ComponentVector{…}, cfg::ForwardDiff.GradientConfig{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:17
  [4] gradient(f::MarginalLogDensities.var"#f#4"{…}, prep::DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{…}, ::AutoForwardDiff{…}, x::ComponentVector{…}, contexts::Constant{…})
    @ DifferentiationInterfaceForwardDiffExt ~/.julia/packages/DifferentiationInterface/tBGja/ext/DifferentiationInterfaceForwardDiffExt/onearg.jl:337
  [5] (::DifferentiationInterfaceForwardDiffExt.var"#inner_gradient#9"{…})(x::ComponentVector{…}, unannotated_contexts::@NamedTuple{…})
    @ DifferentiationInterfaceForwardDiffExt ~/.julia/packages/DifferentiationInterface/tBGja/ext/DifferentiationInterfaceForwardDiffExt/secondorder.jl:38
  [6] compute_ydual_onearg(f::DifferentiationInterfaceForwardDiffExt.var"#inner_gradient#9"{…}, prep::DifferentiationInterfaceForwardDiffExt.ForwardDiffOneArgPushforwardPrep{…}, x::ComponentVector{…}, tx::Tuple{…}, contexts::Constant{…})
    @ DifferentiationInterfaceForwardDiffExt ~/.julia/packages/DifferentiationInterface/tBGja/ext/DifferentiationInterfaceForwardDiffExt/onearg.jl:98
  [7] pushforward!(f::DifferentiationInterfaceForwardDiffExt.var"#inner_gradient#9"{…}, ty::Tuple{…}, prep::DifferentiationInterfaceForwardDiffExt.ForwardDiffOneArgPushforwardPrep{…}, ::AutoForwardDiff{…}, x::ComponentVector{…}, tx::Tuple{…}, contexts::Constant{…})
    @ DifferentiationInterfaceForwardDiffExt ~/.julia/packages/DifferentiationInterface/tBGja/ext/DifferentiationInterfaceForwardDiffExt/onearg.jl:153
  [8] hvp!(f::OptimizationFunction{…}, tg::Tuple{…}, prep::DifferentiationInterfaceForwardDiffExt.ForwardDiffOverSomethingHVPPrep{…}, ::SecondOrder{…}, x::ComponentVector{…}, tx::Tuple{…}, contexts::Constant{…})
    @ DifferentiationInterfaceForwardDiffExt ~/.julia/packages/DifferentiationInterface/tBGja/ext/DifferentiationInterfaceForwardDiffExt/secondorder.jl:72
  [9] hessian!
    @ ~/.julia/packages/DifferentiationInterface/tBGja/ext/DifferentiationInterfaceSparseMatrixColoringsExt/hessian.jl:97 [inlined]
 [10] modal_hessian!
    @ ~/.julia/dev/MarginalLogDensities/src/MarginalLogDensities.jl:288 [inlined]
 [11] _marginalize(mld::MarginalLogDensity{…}, v::ComponentVector{…}, data::Tuple{}, method::LaplaceApprox{…}, verbose::Bool)
    @ MarginalLogDensities ~/.julia/dev/MarginalLogDensities/src/MarginalLogDensities.jl:297
 [12] #_#5
    @ ~/.julia/dev/MarginalLogDensities/src/MarginalLogDensities.jl:226 [inlined]
 [13] (::MarginalLogDensity{…})(v::ComponentVector{…}, data::Tuple{})
    @ MarginalLogDensities ~/.julia/dev/MarginalLogDensities/src/MarginalLogDensities.jl:225
 [14] top-level scope
    @ Untitled-1:24
Some type information was truncated. Use `show(err)` to see complete types.

I have recently been making changes to support ComponentArrays as inputs for MarginalLogDensities, and that is part of the problem: the last two lines call the same model, but the one with an ordinary vector as input runs fine, while the one with a ComponentVector errors. This works with DI v0.6.13-v0.6.15, but breaks with v0.6.16.

I will try to simplify this and make a more minimal example, but hopefully it's a starting point...

gdalle commented 1 week ago

Thanks for the MWE!

The rule of DI preparation is: you can only reuse a preparation result if all the arguments still have the same types and sizes. This holds for contexts as well, even though I don't say it explicitly in the docs yet (PR incoming).
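As a hedged illustration of that rule (a toy function, assuming DI ≥ 0.6 with ForwardDiff loaded): reuse is fine as long as types and sizes match, and a change of context type calls for a fresh preparation:

using DifferentiationInterface
import ForwardDiff

g(x, p) = sum(abs2, x) * sum(p)
backend = AutoForwardDiff()
x = rand(3)

prep_vec = prepare_gradient(g, backend, x, Constant(ones(3)))
gradient(g, prep_vec, backend, x, Constant(2 .* ones(3)))      # ok: new values, same types and sizes

p_view = view(ones(4), 1:3)
prep_view = prepare_gradient(g, backend, x, Constant(p_view))  # context type changed: prepare again
gradient(g, prep_view, backend, x, Constant(p_view))           # ok: preparation matches the call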

In both cases, either your code or Optimization.jl must be breaking these rules due to subtle array conversions, which imply that the preparation result is no longer valid. Not sure what happens for @aml5600, but at least I figured it out for @ElOceanografo. You're accidentally converting a ComponentVector into a Vector when you call collect on line 293 of MarginalLogDensities.jl:

julia> v = ComponentVector(a=[1, 2])
ComponentVector{Int64}(a = [1, 2])

julia> collect(v)
2-element Vector{Int64}:
 1
 2

Your hessian operator was prepared with a Constant(p2) where p2 = (; p, v::ComponentVector), but then you try to reuse the preparation result with p2 = (; p, v::Vector). In general this should not be expected to work with DI: ForwardDiff just happens to throw an exception, whereas other backends might have let it slide. This behavior only appeared in v0.6.16 because I relaxed the rewrapping mechanism in #590: it used to perform a more stringent conversion which implicitly fixed your misuse. See https://github.com/ElOceanografo/MarginalLogDensities.jl/pull/33 to discuss fixes.

As for the question of @aml5600, more details below

On the tagging side, given the example you have above, is it possible to tag on a higher-level/more abstract type than Vector?

ForwardDiff is probably the backend most sensitive to the "same type" rule, because its tag type contains the type of the function f and the element type of x. But when the function takes contexts, I replace f with ft = FixTail(f, contexts...) under the hood, which looks a bit like this:

FixTail(f, contexts...) = x -> f(x, contexts...)

This function wrapper ft has a type which contains the type of the contexts. When I create a ForwardDiff tag for ft, the tag type T contains the type of ft, which itself contains the type of the contexts.
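As a rough sketch (using a simplified closure in place of DI's actual FixTail struct), the closure type, and therefore the tag type, changes as soon as the context's array type changes:

fixtail(f, contexts...) = x -> f(x, contexts...)   # simplified stand-in for DI's FixTail

h(x, p) = sum(abs2, x) + sum(p)

ft_vec  = fixtail(h, ones(3))               # context is a Vector
ft_view = fixtail(h, view(ones(3), 1:2))    # context is a SubArray (view)

typeof(ft_vec) == typeof(ft_view)           # false: the captured context type differs
# hence ForwardDiff.Tag{typeof(ft_vec), Float64} and ForwardDiff.Tag{typeof(ft_view), Float64}
# are different types, and a config prepared with one rejects the other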

So the answer is that, at least with the current implementation, the tag will contain concrete types for all the context arguments. We could make a case for relaxing this constraint, but then it would prevent us from catching bugs such as @ElOceanografo's implicit conversion.

Vaibhavdixit02 commented 3 days ago

Hey, sorry for being MIA here, I was traveling etc. This is happening because Ipopt sends a view to the Hessian call of the Lagrangian, whereas in preparation I use a regular vector. One workaround would be to instantiate the view, which might not be too terrible; another would be to create a separate preparation with a view-typed argument and then add a dispatch for lag_h! based on that type. The first might be more expensive than it needs to be and the second would be very tedious, so I am leaning towards the first, but I'm happy to take suggestions.

gdalle commented 3 days ago

Is it the only solver for which this happens? Maybe you could instantiate the Lagrange multiplier once and for all, then copy the view into it at every iteration? I tried relaxing the constraints on tagging, but it's harder than it looks, and it wouldn't change the fact that preparing with one type and then calling with another is fundamentally incorrect from DI's perspective. Even if we stop one backend from complaining, that doesn't mean another one won't.
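A hedged sketch of that workaround, with made-up names standing in for the solver-side quantities (not Optimization.jl's actual internals):

n = 4
solver_storage = rand(2n)               # pretend this is solver-owned memory
mult_view = view(solver_storage, 1:n)   # the view Ipopt would pass for the multipliers

mult_buffer = zeros(n)                  # plain Vector, allocated once, matching what preparation saw

# at every iteration: refresh the buffer from the view, then call the prepared
# DI operator with the buffer so the argument types match the preparation
copyto!(mult_buffer, mult_view)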

Vaibhavdixit02 commented 1 day ago

Yeah, among the frequently used optimizers, that seems to be the only one. Though just to be clear, it's a well-motivated use of @view there.

Maybe you could instantiate the Lagrange multiplier once and for all then copy the view into it at every iteration

Yeah this is a good idea, I'll do this.

gdalle commented 22 hours ago

Also note that ForwardDiff over ReverseDiff is very slow once you add constants, because preparation becomes impossible (I can't allow the constants to be absorbed into the tape, since their values might change afterwards).