JuliaSmoothOptimizers / NLPModels.jl

Data Structures for Optimization Models

Uniform returned type in the NLPModel API #454

Closed by tmigot 7 months ago

tmigot commented 8 months ago

I think we should clarify the returned type in the API.

Essentially, when calling grad(nlp::AbstractNLPModel{T, S}, x::V) where {T, S, V}, we have two options (sketched below):

i) restrict the input to x::S, i.e., assume S == V, and return a value of type S;
ii) accept any AbstractVector x::V and return a value of type S, the model's storage type.

In general, I think assuming S == V is too restrictive. Personally, I prefer option ii), as it makes multi-precision easier to implement.
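In signature form, the two options would look roughly like this (a sketch with illustrative names grad_i and grad_ii, not actual NLPModels.jl code):

using NLPModels

# i) require the input to have the model's storage type S (so S == V)
function grad_i(nlp::AbstractNLPModel{T, S}, x::S) where {T, S}
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, x, g)  # returns a value of type S
end

# ii) accept any AbstractVector, but still allocate and return a value of type S
function grad_ii(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, x, g)  # returns a value of type S
end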

@amontoison @abelsiqueira @dpo What do you think? i) or ii) or ... ?

Illustration

There are some inconsistencies right now; for instance, the functions grad(nlp, x) and residual(nls, x) do not have the same behavior.

using CUDA, NLPModels, NLPModelsTest # https://github.com/JuliaSmoothOptimizers/NLPModelsTest.jl#use-gpu

nls = LLS(CuArray{Float32})  # model with storage type CuArray{Float32}
x32 = nls.meta.x0            # x0 has the model's storage type
CUDA.allowscalar() do 
  residual(nls, x32)
end
x64 = fill!(CuArray{Float64}(undef, 2), 0)
CUDA.allowscalar() do 
  residual(nls, x64) # ERROR: MethodError: no method matching residual(::LLS{Float32, CuArray{Float32}}, ::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer})
end

CUDA.allowscalar() do 
  grad(nls, x64) # returns a CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, unlike residual above
end

This is connected to https://github.com/JuliaSmoothOptimizers/NLPModelsTest.jl/pull/105

amontoison commented 8 months ago
tmigot commented 8 months ago

Oh sorry, I fixed my suggestion. I liked ii) better, as it allows having a model in one storage type and still evaluating it in another precision.

Is there a particular reason @amontoison ?

dpo commented 8 months ago

I feel it should really be

grad(nlp::AbstractNLPModel{T, S}, x::S) where {T, S}  # -> return a value of type S

and there should be conversion methods if the type of x is not S.
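A minimal sketch of what such a conversion fallback could look like (hypothetical; grad_converted is an illustrative name, not the actual NLPModels.jl API):

using NLPModels

# Hypothetical fallback: if x does not already have the model's storage type S,
# convert it first, so the evaluation and the returned gradient both use type S.
function grad_converted(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  xs = x isa S ? x : convert(S, x)  # assumes a convert/constructor from typeof(x) to S exists
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, xs, g)
end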

EDIT: No, never mind. Now that I think about it, we allowed x::V so that views, among other types, work efficiently. However, if the user calls grad() and not grad!(), we have to decide what the return type should be. It should be S, because x could be a view, and it doesn't make sense to return a new view.

tmigot commented 8 months ago

The views are a good point. The current implementation of grad does this:

function grad(nlp::AbstractNLPModel, x::AbstractVector)
  @lencheck nlp.meta.nvar x
  g = similar(x)
  return grad!(nlp, x, g)
end

to keep the type of x and avoid the potential issue with views. This is actually a third option, iii).
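For reference, similar on a view of a standard Vector allocates a fresh Vector with the same element type rather than another view, which is why iii) also sidesteps the view concern:

x = view(ones(5), 1:4)  # a SubArray wrapping a Vector{Float64}
g = similar(x)          # allocates a new array, not another view
typeof(g)               # Vector{Float64}: the element type of x is kept, the view wrapper is not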

However, I suspect the compiler will prefer what you both suggested:

function grad(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  @lencheck nlp.meta.nvar x
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, x, g)
end

dpo commented 8 months ago

What makes you say that the compiler would prefer S as return type?

Just to think things through, let’s say S is a CUDA array and consider the following cases:

  1. x is a view into an object of type S: then I presume similar(x) will return another object of type S, and that is good (see the check sketched after this list).
  2. x is a Vector{Float64} (for some reason). What should we return then and why?
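A quick empirical check for case 1 could look like this (a sketch, assuming CUDA.jl and a GPU are available; the output is exactly what the question asks, so it is not asserted here):

using CUDA

xS = CUDA.zeros(Float64, 4)  # plays the role of an object of type S
xv = view(xS, 1:3)           # a view into it
typeof(similar(xv))          # check whether this is again a CuArray
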
tmigot commented 8 months ago

> What makes you say that the compiler would prefer S as return type?

From https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing I think using similar(x) (iii) will not specialize, while using S (ii) does. This could probably be tested, though.


I think that, in general, choosing S as the return type sort of "fixes" things, so the return type can be known in advance, while relying on x is more flexible but also less predictable.

This example sort of illustrates it. The @btime results are exactly the same; it is just the compilation that changes.

using BenchmarkTools, NLPModels, NLPModelsTest

function grad1(nlp::AbstractNLPModel, x::AbstractVector)
    @lencheck nlp.meta.nvar x
    g = similar(x)
    return grad!(nlp, x, g)
end

function grad2(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
    @lencheck nlp.meta.nvar x
    g = S(undef, nlp.meta.nvar)
    return grad!(nlp, x, g)
end

nlp = BROWNDEN()
g = similar(nlp.meta.x0)
xview = view(fill!(Vector{Float64}(undef, 5), 1), 1:4)
grad!(nlp, nlp.meta.x0, g) # pre-compile grad!
grad!(nlp, xview, g) # pre-compile grad!

@time grad1(nlp, nlp.meta.x0) # 0.009147 seconds (1.99 k allocations: 136.719 KiB, 99.43% compilation time)
@btime grad1(nlp, nlp.meta.x0)
@time grad1(nlp, xview) # 0.012386 seconds (5.15 k allocations: 346.609 KiB, 99.33% compilation time)
@time grad2(nlp, nlp.meta.x0) # 0.007110 seconds (2.25 k allocations: 155.812 KiB, 99.45% compilation time)
@btime grad2(nlp, nlp.meta.x0)
@time grad2(nlp, xview) # 0.006264 seconds (2.69 k allocations: 177.188 KiB, 99.41% compilation time)

I think the conclusion I am trying to reach is that grad2 is more "robust" to unexpected types of x.


Beyond that, I think it's a choice. The following table tests nlp = BROWNDEN(S) and x::V with the functions grad1 and grad2; each entry is the returned type (the first row is reproduced in the sketch after the table).

| S | V | grad1 (similar) | grad2 (S) |
| --- | --- | --- | --- |
| Vector{Float64} | Vector{Float32} | Vector{Float32} | Vector{Float64} |
| Vector{Float32} | Vector{Float64} | Vector{Float64} | Vector{Float32} |
| CuArray{Float64} | Vector{Float64} | Vector{Float64} | CuArray{Float64} |
| Vector{Float64} | CuArray{Float64} | CuArray{Float64} | Vector{Float64} |
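For instance, the first row can be reproduced on CPU by reusing grad1 and grad2 from above (a sketch; BROWNDEN(S) takes the storage type, as in the table):

using NLPModels, NLPModelsTest

S = Vector{Float64}
V = Vector{Float32}
nlp = BROWNDEN(S)      # model stored with S = Vector{Float64}
x = V(nlp.meta.x0)     # evaluation point of type V = Vector{Float32}
typeof(grad1(nlp, x))  # Vector{Float32}: follows the type of x
typeof(grad2(nlp, x))  # Vector{Float64}: follows the storage type S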

But it's definitely worth having this discussion and documenting the solution.

tmigot commented 8 months ago

This seems related https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/issues/135 @paraynaud (in case you have some feedback too)

paraynaud commented 8 months ago

From what I read, option iii) should satisfy everyone. Additionally, there is no risk of interfering with the partially separable models while revamping grad, since PartiallySeparableNLPModels.jl defines its own grad and grad! methods.

tmigot commented 8 months ago

The PR https://github.com/JuliaSmoothOptimizers/NLPModels.jl/pull/455 applies what we are discussing here.