Oh sorry, I fixed my suggestion. I liked ii) better, as it allows having a model and still evaluating it in another precision.
Is there a particular reason, @amontoison?
I feel it should really be

```julia
grad(nlp::AbstractNLPModel{T, S}, x::S) where {T, S}  # -> return a value of type S
```

and there should be conversion methods if the type of `x` is not `S`.

EDIT: No, never mind. Now that I think about it, we allowed `x::V` so that views, among other things, work efficiently. However, if the user calls `grad()` and not `grad!()`, we have to decide what the return type should be. It should be `S`, because `x` could be a view, and it doesn't make sense to return a new view.
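As a side note on the view case, a quick Base-only check (nothing NLPModels-specific) shows that `similar` on a view of a `Vector` already returns a plain `Vector`, not a new view, so the question is really about which type the output should follow:

```julia
v = ones(5)
xv = view(v, 1:4)    # a SubArray into a Vector{Float64}
typeof(similar(xv))  # Vector{Float64}: similar materializes a plain array, not a view
```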
The views are a good point. The current implementation of `grad` does this:

```julia
function grad(nlp::AbstractNLPModel, x::AbstractVector)
  @lencheck nlp.meta.nvar x
  g = similar(x)
  return grad!(nlp, x, g)
end
```

to keep the type of `x` and avoid the potential issue with views. This is actually a third option, iii).
However, I suspect the compiler will prefer what you both suggested:

```julia
function grad(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  @lencheck nlp.meta.nvar x
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, x, g)
end
```
What makes you say that the compiler would prefer `S` as return type?
Just to think things through, let's say `S` is a CUDA array and consider the following cases:

1. `x` is a view into an object of type `S`: then I presume `similar(x)` will return another object of type `S`, and that is good.
2. `x` is a `Vector{Float64}` (for some reason). What should we return then and why?

> What makes you say that the compiler would prefer `S` as return type?

From https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing I think using `similar(x)` (iii) will not specialize, while using `S` (ii) does. This could probably be tested, though.
I think that, in general, choosing `S` as the return type sort of "fixes" things, so the return type can be known in advance, while relying on `x` is more flexible but also less predictable in a sense.
This example sort of illustrates it. The `@btime` results are exactly the same; it is just compilation that changes.
```julia
using BenchmarkTools, NLPModels, NLPModelsTest

function grad1(nlp::AbstractNLPModel, x::AbstractVector)
  @lencheck nlp.meta.nvar x
  g = similar(x)
  return grad!(nlp, x, g)
end

function grad2(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  @lencheck nlp.meta.nvar x
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, x, g)
end

nlp = BROWNDEN()
g = similar(nlp.meta.x0)
xview = view(fill!(Vector{Float64}(undef, 5), 1), 1:4)
grad!(nlp, nlp.meta.x0, g)  # pre-compile grad!
grad!(nlp, xview, g)        # pre-compile grad!

@time grad1(nlp, nlp.meta.x0)  # 0.009147 seconds (1.99 k allocations: 136.719 KiB, 99.43% compilation time)
@btime grad1(nlp, nlp.meta.x0)
@time grad1(nlp, xview)        # 0.012386 seconds (5.15 k allocations: 346.609 KiB, 99.33% compilation time)
@time grad2(nlp, nlp.meta.x0)  # 0.007110 seconds (2.25 k allocations: 155.812 KiB, 99.45% compilation time)
@btime grad2(nlp, nlp.meta.x0)
@time grad2(nlp, xview)        # 0.006264 seconds (2.69 k allocations: 177.188 KiB, 99.41% compilation time)
```
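Coming back to the "this could probably be tested" remark above, one rough way to look at it, reusing the definitions just above, might be to count the specializations the compiler generates for each variant. This is only a sketch: `Base.specializations` requires a fairly recent Julia version, and `@which` comes from InteractiveUtils (loaded by default in the REPL).

```julia
# Call both variants with a plain Vector and with a view, then count how many
# specializations the compiler generated for each method.
grad1(nlp, nlp.meta.x0); grad1(nlp, xview)
grad2(nlp, nlp.meta.x0); grad2(nlp, xview)

m1 = @which grad1(nlp, nlp.meta.x0)
m2 = @which grad2(nlp, nlp.meta.x0)
length(collect(Base.specializations(m1))), length(collect(Base.specializations(m2)))
```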
I think the conclusion I am trying to reach is that `grad2` is more "robust" to unexpected types of `x`.
Beyond that, I think it's a choice. The following table shows the type returned when testing `nlp = BROWNDEN(S)` and `x::V` with the functions `grad1` and `grad2`:
| S | V | grad1 (similar(x)) | grad2 (S(undef, n)) |
|---|---|---|---|
| Vector{Float64} | Vector{Float32} | Vector{Float32} | Vector{Float64} |
| Vector{Float32} | Vector{Float64} | Vector{Float64} | Vector{Float32} |
| CuArray{Float64} | Vector{Float64} | Vector{Float64} | CuArray{Float64} |
| Vector{Float64} | CuArray{Float64} | CuArray{Float64} | Vector{Float64} |
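For what it's worth, the CPU rows of that table are easy to check. A minimal sketch, assuming as above that `BROWNDEN(S)` builds the problem with storage type `S`, and reusing `grad1`/`grad2` from the earlier snippet:

```julia
using NLPModels, NLPModelsTest

nlp = BROWNDEN(Vector{Float64})     # S = Vector{Float64}
x32 = Vector{Float32}(nlp.meta.x0)  # V = Vector{Float32}

typeof(grad1(nlp, x32))  # expected from the table: Vector{Float32} (follows x via similar)
typeof(grad2(nlp, x32))  # expected from the table: Vector{Float64} (follows the storage type S)
```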
But it's definitely worth having this discussion and documenting the solution.
This seems related: https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/issues/135 (@paraynaud, in case you have some feedback too).
From what I read, option iii) should satisfy everyone.
Additionally, there is no risk of interfering with the partially separable models while revamping `grad`, as PartiallySeparableNLPModels.jl defines its own `grad` and `grad!` methods.
The PR https://github.com/JuliaSmoothOptimizers/NLPModels.jl/pull/455 applies what we are discussing here.
I think we should clarify the returned type in the API.
Essentially, when calling

```julia
grad(nlp::AbstractNLPModel{T, S}, x::V) where {T, S, V}
```

we have two options: i) require (or assume) that `V == S`, or ii) accept any `V` but always return a vector of the storage type `S`.

In general, I think assuming `S == V` is too restrictive. Personally, I prefer option ii), as it makes multi-precision easier to implement.
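For concreteness, a rough sketch of what the two options could look like (hypothetical names `grad_i` / `grad_ii`, not the actual implementation):

```julia
using NLPModels

# i) require the input to have the model storage type, i.e., assume V == S
function grad_i(nlp::AbstractNLPModel{T, S}, x::S) where {T, S}
  g = similar(x)  # typically of type S here
  return grad!(nlp, x, g)
end

# ii) accept any AbstractVector, but allocate the result with the storage type S
function grad_ii(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  g = S(undef, nlp.meta.nvar)  # the return type follows the model, not x
  return grad!(nlp, x, g)
end
```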
@amontoison @abelsiqueira @dpo What do you think? i) or ii) or ... ?
**Illustration**

There are some inconsistencies right now; for instance, the functions `grad(nlp, x)` and `residual(nls, x)` do not have the same behavior.

This is connected to https://github.com/JuliaSmoothOptimizers/NLPModelsTest.jl/pull/105
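If it helps, the inconsistency can be observed directly from the REPL. A minimal sketch, assuming the BROWNDEN and LLS test problems from NLPModelsTest, comparing the returned types for an input whose type differs from the model storage type:

```julia
using NLPModels, NLPModelsTest

nlp = BROWNDEN()  # an NLP test problem (Float64 storage)
nls = LLS()       # an NLS test problem (Float64 storage)

x32_nlp = Vector{Float32}(nlp.meta.x0)
x32_nls = Vector{Float32}(nls.meta.x0)

typeof(grad(nlp, x32_nlp))      # does the output follow x or the storage type S?
typeof(residual(nls, x32_nls))  # same question; the two answers may currently differ
```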