abavoil opened this issue 3 months ago
Hi! Can you provide a complete MWE with all necessary packages, problem definition and error stacktrace?
@abavoil I can quickly add an option `backend = :dense` so that you can create an ADNLPModel with the previous default behavior:

```julia
model = ADNLPModel!(x -> obj(x, p), x, lb, ub, (c, x) -> cons!(c, x, p), lc, uc, backend = :dense)
```
Here is a MWE for OrdinaryDiffEq. I use the optimisation variable in different places, and I get a few different errors as a result.
```julia
using Pkg
Pkg.activate(; temp=true)
Pkg.add(["OrdinaryDiffEq", "ADNLPModels"])

using OrdinaryDiffEq, ADNLPModels

"""x[1] used as a global variable in the dynamics"""
function cons1!(out, x)
    solve(ODEProblem((u, p, t) -> [x[1]], [0.], 1))
    return out .= zero(x)
end

"""x[1] used in the initial condition"""
function cons2!(out, x)
    solve(ODEProblem((u, p, t) -> zero(u), [x[1]], 1))
    return out .= zero(x)
end

"""x[1] used in the tspan"""
function cons3!(out, x)
    solve(ODEProblem((u, p, t) -> zero(u), [0.], x[1]))
    return out .= zero(x)
end

"""x[1] used in the parameters"""
function cons4!(out, x)
    solve(ODEProblem((u, p, t) -> zero(u), [0.], 1, x[1]))
    return out .= zero(x)
end

for c! in [cons1!, cons2!, cons3!, cons4!]
    c!(zeros(1), zeros(1)) # OK
    try ADNLPModel!(x -> zeros(1), zeros(1), c!, zeros(1), zeros(1)) catch e println(e) end
end

"""
All at once:
- x[1] used as a global variable in the dynamics
- x[2] used in the initial condition
- x[3] used in the tspan
- x[4] used in the parameters
"""
function cons!(out, x)
    solve(ODEProblem((u, p, t) -> p .* x[1] .* one.(u), [x[2]], x[3], x[4]))
    return out .= zero(x)
end

cons!(zeros(4), zeros(4)) # OK
try ADNLPModel!(x -> zeros(1), zeros(4), cons!, zeros(4), zeros(4)) catch e println(e) end
```
Prior to 0.8, these examples raised no error and behaved as expected.
Thanks!
First of all, the objective function in `ADNLPModel!` should be scalar-valued, not vector-valued. Fixing that makes the `cons4!` case work. As for the others, here are the stacktraces:
For context, the new sparsity detection mechanism uses SparseConnectivityTracer.jl, developed by @adrhill and myself. It works by operator overloading using a new `Number` subtype called `Tracer`, and this type "remembers" which inputs have influenced it. Our `Tracer` is a bit more restrictive than usual numbers, because it doesn't support comparisons or things like `isnan`, which explains the errors we get for `cons2!` and `cons3!` (replacing `[0.]` with `[zero(eltype(x))]` makes `cons1!` behave the same as `cons2!`), unless you switch to a local tracer.
If (and only if) you are 100% confident that the sparsity pattern of your constraints and objective is independent of the input value, you can try passing `TracerLocalSparsityDetector()` instead of `TracerSparsityDetector()` to the ADNLPModels.jl sparsity detection pipeline (see this doc page). In that case, I suggest picking a random `x0` instead of `zeros(n)` to avoid accidental zeros.
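As a minimal sketch of the difference between the two detectors (the function `f` here is made up for illustration; SparseConnectivityTracer.jl exports `jacobian_sparsity` alongside the detectors):

```julia
using SparseConnectivityTracer  # exports jacobian_sparsity and the detectors

f(x) = [x[1]^2, x[2] * x[3]]

# Global pattern: ignores the actual input values, valid for every input
S_global = jacobian_sparsity(f, rand(3), TracerSparsityDetector())

# Local pattern: computed at this specific point; a random x0 avoids
# accidental zeros that could hide nonzero entries
S_local = jacobian_sparsity(f, rand(3), TracerLocalSparsityDetector())
```

For a branch-free function like this one, both detectors return the same pattern.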
In the long run, some fixes might be possible on our end, like:

- defining `isnan(::Tracer)`
- handling the case where `reltol` has type `::Tracer` in the diffeq

@abavoil It seems that what I wanted to do is already implemented:
```julia
model = ADNLPModel!(x -> obj(x, p), x, lb, ub, (c, x) -> cons!(c, x, p), lc, uc, backend = :generic)
```
Can you try it?
Building the model works and solving it gives the numerical result I had in v0.7.3! In predefined_backends.jl I saw those `GenericForwardDiffAD...` when I was looking for `ForwardDiffAD...` and thought `generic` was some kind of abstract backend, without even trying it... Thank you!
I agree that the name is misleading. I will simplify that with the next major release and the support of DifferentiationInterface.jl.
I'm glad that `backend = :generic` solved your issue. I should add a note about it in the documentation.
Hi @gdalle @amontoison, many thanks for the feedback. Regarding this point:
> It works by operator overloading using a new `Number` subtype called `Tracer`, and this type "remembers" which inputs have influenced it. Our `Tracer` is a bit more restrictive than usual numbers, because it doesn't support comparisons or things like `isnan`, which explains the errors we get for `cons2!` and `cons3!` (replacing `[0.]` with `[zero(eltype(x))]` makes `cons1!` behave the same as `cons2!`), unless you switch to a local tracer.
I think that the use case raised by @abavoil is rather common: an optimisation model that calls a solver (here an ODE solver) to evaluate the objective and / or the constraints. Dual numbers typically pass through this seamlessly using mere overloading [^1], which does not seem to be the case for your specific type. Worth looking at it (although @abavoil's case is low-dimensional, dense, and solved by `:generic`).
[^1]: even if it is not the proper way to compute the sensitivity for an ODE, but that's another story
> Dual numbers typically pass through this seamlessly using mere overloading
Well, that's not entirely true: OrdinaryDiffEq and plenty of other packages are designed specifically to work with ForwardDiff. You can deduce it from the number of occurrences of `ForwardDiff.something` in the code base, or the existence of auxiliary packages like PreallocationTools which are mentioned in the FAQ. So there are indeed places in which overloading alone does not suffice. In such places, people have usually gone out of their way to support `Dual` numbers, but less so for our very recent `Tracer`s.
> Worth looking at it
We are looking at it, but some of the limitations I mentioned are intrinsic to sparsity detection. We want to deduce a sparsity pattern that is "global", meaning "independent of the point at which you evaluate the function". From there, it follows that a `Tracer` number should not carry a specific value. And in turn this means comparisons like `x < y` or checks like `isnan(x)` are likely to error.
There are sometimes workarounds, like using `ifelse(x < y, a, b)` instead of `if x < y; a; else; b; end` (because this is more compatible with operator overloading), and we could probably define `isnan(::Tracer) = false` without the sky falling down on us. But fundamentally, global `Tracer`s will always be limited in what they can do, because they represent a function over its whole input domain.
On the other hand, local `Tracer`s do carry a value, but they yield a sparsity pattern that is only valid at the current point. In most cases that's not what we want, but if you can prove (through insider knowledge of your function) that this local pattern is in fact valid globally, then you can go ahead and use that.
Does this make things clearer?
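To illustrate how a local pattern depends on the evaluation point, here is a small sketch (the branching function `h` and the inputs are made up for illustration):

```julia
using SparseConnectivityTracer

# The branch taken depends on the sign of x[1]
h(x) = [x[1] > 0 ? x[1] : x[2]]

# Local patterns differ between evaluation points:
S_pos = jacobian_sparsity(h, [1.0, 2.0], TracerLocalSparsityDetector())   # output depends on x[1]
S_neg = jacobian_sparsity(h, [-1.0, 2.0], TracerLocalSparsityDetector())  # output depends on x[2]
```

Neither local pattern alone covers the whole domain, which is exactly why a local pattern must not be reused globally without insider knowledge of the function.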
@gdalle Crystal clear: thanks for the input 🙏🏽. Naive point: if a workaround like `isnan(::Tracer) = false` allows overloading at the price of missing some sparsity features, that's probably acceptable.

Was not aware of OrdinaryDiffEq (and others) having tailored (thus not extensible) features specific to ForwardDiff. Another incentive for a higher-level / uniform AD backend 🤞🏾.
I'll let @adrhill weigh in on `isnan` and co, but my gut says that even if we define `isnan(::Tracer)`, we'll eventually run into the same error for every one of the four scenarios above. This error happens in the choice of the algorithm, when the specified tolerance is compared with a fixed threshold:
https://github.com/SciML/OrdinaryDiffEq.jl/blob/6774e5d159af328b2f52e1e5ee64b09dd03229d7/lib/OrdinaryDiffEqDefault/src/default_alg.jl#L57-L64
I think that at some point, the user-provided tolerance gets converted to a `Tracer` through promotion, and that's why the comparison fails. That's hard to get around, but maybe if you specify the solving algorithm yourself, you can avoid this comparison?
Of course I suspect solving this bug will only lead us to another place further down the road where comparisons happen. If that's the case, maybe local sparsity detection is a solution.
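A sketch of what passing an explicit algorithm looks like (here `Tsit5`, a standard OrdinaryDiffEq solver); this bypasses the default-algorithm selection quoted above, though whether it avoids every downstream tolerance comparison is an open question:

```julia
using OrdinaryDiffEq

# Exponential decay u' = -u with u(0) = 1 on t in (0, 1)
prob = ODEProblem((u, p, t) -> -u, [1.0], (0.0, 1.0))

# Passing Tsit5() explicitly skips the default-algorithm heuristics
sol = solve(prob, Tsit5())
```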
Regardless of everything I have said above, if you don't like the new system for sparsity detection, you can always use the old one! Alexis and I have made this whole thing parametrizable, so just use `Symbolics.SymbolicsSparsityDetector()` instead of `SparseConnectivityTracer.TracerSparsityDetector()` in your sparse model creation, and you'll be back to the old ways. 100x slower, sure, but with a slightly different set of abilities that might suit you better.
To add some more information, the original idea of `backend = :generic` is to provide backends that are not type-dependent, so it skips most of the computations done at model instantiation.

@abavoil I am not 100% sure I understand the use case here. Do you solve this problem with Hessian evaluations?

If you use no Jacobian or Hessian matrices, because you either rely on matrix-vector products only or just skip them, there is also the keyword `matrix_free = true`. If you do use Hessian evaluations, then computing the dense Hessian will be problematic for large problems.
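For reference, the matrix-vector products are available through the NLPModels.jl API; a minimal sketch (the tiny two-variable model here is made up for illustration):

```julia
using ADNLPModels, NLPModels

# A small model just to illustrate the products
nlp = ADNLPModel(x -> sum(x .^ 2), ones(2), x -> [x[1] * x[2]], zeros(1), zeros(1))

v = ones(2)
Jv = jprod(nlp, nlp.meta.x0, v)  # Jacobian-vector product, no matrix formed
Hv = hprod(nlp, nlp.meta.x0, v)  # Hessian-vector product of the objective
```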
@tmigot Is the option `matrix_free = true` documented?
Hi @tmigot, thanks for the feedback on this option. Regarding the problem solved by @abavoil:
@abavoil any further comments?
To complement what @jbcaillau said: I use automatic differentiation (`ForwardDiff.jl` for now) in the evolution function of an ODE (solved with `OrdinaryDiffEq.jl`), from which the non-linear constraints of an NLP are computed. To solve this NLP, Ipopt (`NLPModelsIpopt.jl`) differentiates its constraints, applying another layer of automatic differentiation.

Between `ForwardDiff.jl` and `OrdinaryDiffEq.jl`, I can't expect much from custom number types that infer the structure of my problem. For now, I rely solely on `ForwardDiff.jl` because it handles multiple layers of differentiation reliably. I hope this clarifies my use case 😄
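The nested-differentiation pattern described above can be sketched in isolation (toy function, not the actual model):

```julia
using ForwardDiff

# Inner layer: differentiation inside the "dynamics"
inner(x) = ForwardDiff.derivative(t -> sin(t * x), 1.0)  # = x * cos(x)

# Outer layer: Ipopt-style differentiation through the inner layer,
# which ForwardDiff handles via nested Dual numbers with distinct tags
outer = ForwardDiff.derivative(inner, 2.0)               # = cos(2) - 2*sin(2)
```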
Ok, thank you for the additional information.
SparseConnectivityTracer implements two tracer types:

- `AbstractTracer`
- a `Dual` type that wraps a primal value and an `AbstractTracer`

These have two different applications:

- `AbstractTracer` (used via `TracerSparsityDetector`) computes the sparsity pattern of `f(x)` over the entire input domain `x`. These tracers support very little control flow (`ifelse`, but not much more), since they have to compute a conservative global sparsity pattern.
- `Dual` (used via `TracerLocalSparsityDetector`) computes the sparsity pattern of `f(x)` at a specific input `x`, and supports control flow based on the primal value of `x`.
> Dual numbers typically pass through this seamlessly using mere overloading, which does not seem to be the case for your specific type.
Using `TracerLocalSparsityDetector`, our `Dual` number type seamlessly passes through `isnan` using mere overloading:
https://github.com/adrhill/SparseConnectivityTracer.jl/blob/ac94586c854726370a0e464ee62074774295a929/src/overloads/dual.jl#L9
However, if we don't have a primal value, as is the case with the non-dual `TracerSparsityDetector`, we can't return either `true` or `false`: we don't know whether the primal is `NaN` or not.
This is by design: if we made `isnan` return `true` or `false`, we would enter branches of an if-else block and not return the global pattern anymore. This is the case for all `is*` functions, e.g. `iseven`, `iszero`, etc.
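A tiny sketch of that difference with `isnan` (the example function is made up):

```julia
using SparseConnectivityTracer

f(x) = [isnan(x[1]) ? zero(x[1]) : x[1]]

# The Dual tracer forwards isnan to its primal value, so this works:
S = jacobian_sparsity(f, [1.0], TracerLocalSparsityDetector())

# The same call with TracerSparsityDetector() errors, since there is
# no primal value with which to decide the branch.
```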
The exception where non-dual tracers support some control flow is the aforementioned `ifelse` function. There, we trace through both expressions and compute a union of both output patterns.
TLDR: the limitation of global sparsity detection (`TracerSparsityDetector`) is that we can't guarantee a correct, conservative sparsity pattern if your function has control flow. We therefore prefer to throw an error rather than to return a wrong result. The alternative is `TracerLocalSparsityDetector`, which supports control flow via a `Dual` number type.
Hello @amontoison @gdalle @tmigot,
First, thank you for your work on this project! I recently updated to v0.8.3 and encountered an issue with my code. I use OrdinaryDiffEq to solve an ODE within the constraint function of my problem. Previously, before v0.8, no sparsity analysis was done by default, and everything worked smoothly. However, since v0.8, the default sparsity analysis seems to cause OrdinaryDiffEq's solve function to throw an error. This behaviour seems to cause #247 as well.
Here is a comparison of my code before and after the update:
Before:
After:
I understand that enabling sparsity by default can improve performance in many cases, but it seems to cause issues in this particular scenario. Would it be possible to provide an option to disable sparsity analysis, or is there another recommended approach to maintain compatibility?
Thank you for your time and assistance!
ping @jbcaillau