JuliaNLSolvers / Optim.jl

Optimization functions for Julia
Other
1.13k stars 218 forks source link

Multidimensional arrays #399

Closed pkofod closed 4 years ago

pkofod commented 7 years ago

Fix up (L-)BFGS NelderMead, and PSO code such that we can pass multidimensional arrays as input.

cortner commented 7 years ago

How about making the input an abstract type? At the end of the day the state of the optimisation just needs to be an element of a hilbert space

cortner commented 7 years ago

Also see @ChrisRackauckas comments in #326. I really see no barrier to admitting a general abstract type as the state of optimisation - only advantages.

pkofod commented 7 years ago

How abstract? AbstractArray?

cortner commented 7 years ago

Not even an AbstractArray. You actually need elements of a Hilbert space over some type T, i.e., you need x + y, t * x, dot(x, y) and I think that is it already. (I'm not even sure it needs to be a real Hilbert space?)

This is really not a necessity, BUT: it might actually make the code more readable, and it could have some unexpected applications. E.g., if I minimise over a space of functions, then I don't have to represent functions in subsequent optimisation steps w.r.t. the same basis.

ChrisRackauckas commented 7 years ago

This is really not a necessity, BUT: it might actually make the code more readable, and it could have some unexpected applications. E.g., if I minimise over a space of functions, then I don't have to represent functions in subsequent optimisation steps w.r.t. the same basis.

Yes, ApproxFun would be a very interesting application which takes the generality to the max here.

Not even an AbstractArray. You actually need elements of a Hilbert space, i.e., you need x + y, t * x, dot(x, y) and I think that is it already.

From BFGS, it looks like you need A_mul_B!, scale!, indexing, *, +, and vecdot. So if the indexing operation was replaced with a broadcast, then this all boils down to *, +, vecdot, and broadcast overloads for the type (guaranteed by indexing, but necessarily needing indexing). I believe Fun types may already satisfy this.

cortner commented 7 years ago

everything in BFGS-type algorithms can in principle be reduced to the operations I mentioned - even if the current implementation possibly doesn't. The problem here is that that LBFGS stores the outer product x * y' instead of the operator it defines

ChrisRackauckas commented 7 years ago

everything in BFGS-type algorithms can in principle be reduced to the operations I mentioned - even if the current implementation possibly doesn't.

Mathematically yes. But in reality, what I was trying to point out is that you're missing that either indexing or broadcast needs to be defined. Right now that's using indexing, hence an AbstractArray subtype. But that definitely has an easy change to broadcasting, in which case an AbstractArray isn't needed. Other than that we agree, since A_mul_B! and vecdot have generic fallbacks that use *, +, and hopefully broadcast.

cortner commented 7 years ago

and in addition CG of L-BFGS or a hessian-free TR or the various accelerated steepest descent methods really just need the small set I mentioned.

pkofod commented 7 years ago

Okay, I'm sold. If we need to do something that sacrifices performance for arrays, we can just have tightly and loosely typed methods

cortner commented 7 years ago

is there a performance benchmark suite? I'd be quite interested to do a prototype (say, CG or LBFGS) and confirm that the strict typing we use at the moment is not needed at all.

pkofod commented 7 years ago

is there a performance benchmark suite? I'd be quite interested to do a prototype (say, CG or LBFGS) and confirm that the strict typing we use at the moment is not needed at all.

Not tagged, but the code can be found at OptimTests.jl ..I should really get that fixed up and merged (will do after Optim is tagged)

ChrisRackauckas commented 7 years ago

I would be heavily surprised if, when you did this, the machine code changed one bit. It would only change if you change the indexing to broadcasting, in which case it's already pretty well documented what the (current) cost is there.

cortner commented 7 years ago

I agree, but there is lots of stuff like this in the code:

        @simd for i in 1:n
            @inbounds state.y[i] = gradient(d, i) - state.g_previous[i]
        end

which I think is appalling, so this will also be a good opportunity to get rid of it.

pkofod commented 7 years ago

which I think is appalling, so this will also be a good opportunity to get rid of it.

Agree, and I the time to calculate y .= gradient(d, i) .- state.g_previous would never dominate ldivs or matrix multiplications anyway.

schlichtanders commented 7 years ago

Stumbled upon the following (is this related?)

when trying to optimize a sum over multidimensional Rosenbrock function

f(x) = sum((1.0 .- x[1]).^2 .+ 100.0 .* (x[2] .- x[1].^2).^2)

I get the following error

optimize(f, [[12.0,1.0],[1.0,2.0]])
ERROR: MethodError: no method matching zero(::Type{Array{Float64,1}})
Closest candidates are:
  zero(::Type{Base.LibGit2.Oid}) at libgit2\oid.jl:88
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:80
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:120
  ...
 in initial_state(::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Void}, ::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\nelder_mead.jl:132
 in optimize(::Function, ::Array{Array{Float64,1},1}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Void}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:172
 in #optimize#74(::Array{Any,1}, ::Function, ::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:24
 in optimize(::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:23

My final goal is to regard the subcomponents x[1], x[2], ... as completely independent, i.e. for instance x::Array{Array{T,1},1}, is this possible with Optim.jl in general? And is this possible with autodiff in particular?

thanks a lot

pkofod commented 7 years ago

I don't quite get what you're trying to do. For example, do you realize that your provided point cannot be evaluated by f?

julia> f( [[12.0,1.0],[1.0,2.0]])
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
 [1] power_by_squaring(::Array{Float64,1}, ::Int64) at ./intfuncs.jl:166
 [2] f(::Array{Array{Float64,1},1}) at ./REPL[161]:1
ChrisRackauckas commented 7 years ago

Supporting arrays of arrays is a whole different ballgame. You need to make sure most internal calls will automatically be recursive. The repo

https://github.com/JuliaDiffEq/RecursiveArrayTools.jl

is the tools that are used in DiffEq to support this kind of stuff, but there's a substantial amount of "being careful with types" that has to be done to get recursive arrays working.

pkofod commented 7 years ago

You need to make sure most internal calls will automatically be recursive.

You also need to make sure you agree what it's supposed to mean... I mean, in the example above I'm a bit confused.

schlichtanders commented 7 years ago

@pkofod I am sorry for the confusing, it should be f(x) = sum((1.0 .- x[1]).^2 .+ 100.0 .* (x[2] .- x[1].^2).^2) and I now also changed it above. The error stays the same.

I would like to have an argument of type x::Array{Array{T,1},1} supported, because when using the alternative of a single flat vector, I would need to know the shapes of my single arguments in advance. If I define a custom function, I can have multiple parameters and auto-differentiate the complete function with respect to all parameters. Such multiple independend parameters are kind of what I wanted to regain with x::Array{Array{T,1},1}. Such a setting allows for easily changing sizes of parameters during iteration for instance.

It would be really helpful if I can have something like multiple parameters (i.e. arguments) to my optimizable function. Is this still this issue, or rather something else? Looking forward to your thoughts.

timholy commented 7 years ago

If each array in the "outer" container is of the same size (as in this example), you almost surely want to use StaticArrays and make it an Vector{SVector{L,T}}. At least the call to zero will succeed if you do that; you might have to make some additional changes, though.

pkofod commented 7 years ago

It would be really helpful if I can have something like multiple parameters (i.e. arguments) to my optimizable function. Is this still this issue, or rather something else? Looking forward to your thoughts.

I think it is this issue, yes.

cortner commented 7 years ago

CC @ettersi

pkofod commented 7 years ago

Does @ettersi have a use case here?

ChrisRackauckas commented 7 years ago

I see I never answered about the recursive arrays. Let me be a little more clear about that.

If you do something like a .= b with arrays of arrays, then it's like a[i] = b[i] and thus the elements of the two arrays of arrays will now share references. The same happens with copy!. So you have to be very careful here. recursivecopy! from RecursiveArrayTools.jl and the VectorOfArrays broadcast implementation take care of these kinds of issues. What @timholy suggests is another way of handling it since SArrays are immutable and thus you don't need recursion because a[i] = b[i] will set the value of a[i] to the static array value of b[i] and not copy a reference. So in short, if everything is generic enough, Vector{SVector{L,T}} should already work without any changes, while Vector{Vector{T}} would need a few extra changes but the tools to do this are already created an well-tested.

I'm not sure if the full mutability makes too much sense for an optimization problem, diffeq does it because things "change over time" but I'm not sure that has much meaning here. But it would be something nice to handle correctly down the road.

As for other array types, things like GPUArrays (and support for internal multithreading via JLArrays) comes free if you A_mul_B! and change your loops to broadcasting as above. In DiffEq we found this has essentially no cost as long as you avoid https://github.com/JuliaLang/julia/issues/22255 , so I would recommend just changing every element-wise loop whenever you have time. Looking at the code, it shouldn't be too difficult to get this all working with GPUArrays, which is nice because it's different than what other packages have. The "marketing" is like this. Most optimization packages allow you to "use" GPUs because you can GPU the call to the objective function. However, when setting this up the user has to always write a step that updates the parameters on the GPU given the update from the optimizer, i.e. there is a GPU load/unload required. But if you setup everything to work with GPUArrays, then internally everything important from the optimizer can stay on the GPU, leading to an optimizer which compiles itself to be "copy-free" and require no extra transferring arrays to/from the GPU. I think this is a huge win and something that can be really hammered home in a blog post, and I could see that getting a lot of interest.

One thing that is missing from the last point though is that GPUArrays don't fully support matrix factorizations and \ yet, so you may want to do all of these changes but then only test it out for now on gradient decent.

I can start over the weekend helping get this all setup. Maybe if you guys are also setup with GPUArrays my "please help me get factorizations working with GPUArrays!" call will be stronger.

ettersi commented 7 years ago

My use-case is optimising an (approximate) lattice of atoms, which I store as a Matrix{SVector{2,Float64}}. Currently, when I call

optimize(
         total_energyy, 
         forces!, 
         y, 
         ConjugateGradient()
        )

I get the following error:

ERROR: AssertionError: typeof(f_x) == T
 in initial_state(::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}, ::Optim.Options{Void}, ::Optim.OnceDifferentiable, ::Array{StaticArrays.SVector{2,Float64},2}) at /home/simon/.julia/v0.5/Optim/src/cg.jl:105
 in optimize(::Optim.OnceDifferentiable, ::Array{StaticArrays.SVector{2,Float64},2}, ::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}, ::Optim.Options{Void}) at /home/simon/.julia/v0.5/Optim/src/optimize.jl:172
 in optimize(::T.##6#9{CartesianIndex{2}}, ::T.##7#10{CartesianIndex{2}}, ::Array{StaticArrays.SVector{2,Float64},2}, ::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}) at /home/simon/.julia/v0.5/Optim/src/optimize.jl:58

I haven't really dug into the details of this error yet, though. It could well be that there is some subtle issue in my code, or that it's an issue that I am still on 0.5.

pkofod commented 7 years ago

Being on v05 will at least mean that you're not getting the latest updates, but supporting static arrray types are high on my priority list

ChrisRackauckas commented 7 years ago

Vectors of Static Arrays will have other issues though, since indexing gets you one static array each time, so defining a Jacobian and factorizing it won't work very well making it work with some algorithms but not too many. You'll want to wrap the Vector{SArray} in something like RecursiveArrayTools' VectorOfArray to really fix that (because then it will have proper indexing, so you can vec to get a view of the whole thing as a Vector and treat it likewise), unless Optim does a bunch of special handling.

ettersi commented 7 years ago

Solved the problem using reinterpret. Works kind of all right, although a solution without this extra typing would of course be nice. But I understand that this is open source code and if I want a feature I would have to do it myself.

pkofod commented 7 years ago

You don't necessarily have to do it yourself. A precise description of what you want goes a long way :)

pkofod commented 4 years ago

Fixed the original issue, if anything we can do comes up we can open a new issue.