Closed pkofod closed 4 years ago
How about making the input an abstract type? At the end of the day the state of the optimisation just needs to be an element of a hilbert space
Also see @ChrisRackauckas comments in #326. I really see no barrier to admitting a general abstract type as the state of optimisation - only advantages.
How abstract? AbstractArray
?
Not even an AbstractArray. You actually need elements of a Hilbert space over some type T
, i.e., you need x + y
, t * x
, dot(x, y)
and I think that is it already. (I'm not even sure it needs to be a real Hilbert space?)
This is really not a necessity, BUT: it might actually make the code more readable, and it could have some unexpected applications. E.g., if I minimise over a space of functions, then I don't have to represent functions in subsequent optimisation steps w.r.t. the same basis.
This is really not a necessity, BUT: it might actually make the code more readable, and it could have some unexpected applications. E.g., if I minimise over a space of functions, then I don't have to represent functions in subsequent optimisation steps w.r.t. the same basis.
Yes, ApproxFun would be a very interesting application which takes the generality to the max here.
Not even an AbstractArray. You actually need elements of a Hilbert space, i.e., you need x + y, t * x, dot(x, y) and I think that is it already.
From BFGS, it looks like you need A_mul_B!
, scale!
, indexing, *
, +
, and vecdot
. So if the indexing operation was replaced with a broadcast, then this all boils down to *
, +
, vecdot
, and broadcast overloads for the type (guaranteed by indexing, but necessarily needing indexing). I believe Fun
types may already satisfy this.
everything in BFGS
-type algorithms can in principle be reduced to the operations I mentioned - even if the current implementation possibly doesn't. The problem here is that that LBFGS stores the outer product x * y'
instead of the operator it defines
everything in BFGS-type algorithms can in principle be reduced to the operations I mentioned - even if the current implementation possibly doesn't.
Mathematically yes. But in reality, what I was trying to point out is that you're missing that either indexing or broadcast needs to be defined. Right now that's using indexing, hence an AbstractArray
subtype. But that definitely has an easy change to broadcasting, in which case an AbstractArray
isn't needed. Other than that we agree, since A_mul_B!
and vecdot
have generic fallbacks that use *
, +
, and hopefully broadcast.
and in addition CG of L-BFGS or a hessian-free TR or the various accelerated steepest descent methods really just need the small set I mentioned.
Okay, I'm sold. If we need to do something that sacrifices performance for arrays, we can just have tightly and loosely typed methods
is there a performance benchmark suite? I'd be quite interested to do a prototype (say, CG or LBFGS) and confirm that the strict typing we use at the moment is not needed at all.
is there a performance benchmark suite? I'd be quite interested to do a prototype (say, CG or LBFGS) and confirm that the strict typing we use at the moment is not needed at all.
Not tagged, but the code can be found at OptimTests.jl ..I should really get that fixed up and merged (will do after Optim is tagged)
I would be heavily surprised if, when you did this, the machine code changed one bit. It would only change if you change the indexing to broadcasting, in which case it's already pretty well documented what the (current) cost is there.
I agree, but there is lots of stuff like this in the code:
@simd for i in 1:n
@inbounds state.y[i] = gradient(d, i) - state.g_previous[i]
end
which I think is appalling, so this will also be a good opportunity to get rid of it.
which I think is appalling, so this will also be a good opportunity to get rid of it.
Agree, and I the time to calculate y .= gradient(d, i) .- state.g_previous
would never dominate ldivs or matrix multiplications anyway.
Stumbled upon the following (is this related?)
when trying to optimize a sum over multidimensional Rosenbrock function
f(x) = sum((1.0 .- x[1]).^2 .+ 100.0 .* (x[2] .- x[1].^2).^2)
I get the following error
optimize(f, [[12.0,1.0],[1.0,2.0]])
ERROR: MethodError: no method matching zero(::Type{Array{Float64,1}})
Closest candidates are:
zero(::Type{Base.LibGit2.Oid}) at libgit2\oid.jl:88
zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:80
zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:120
...
in initial_state(::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Void}, ::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\nelder_mead.jl:132
in optimize(::Function, ::Array{Array{Float64,1},1}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Void}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:172
in #optimize#74(::Array{Any,1}, ::Function, ::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:24
in optimize(::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:23
My final goal is to regard the subcomponents x[1], x[2], ...
as completely independent, i.e. for instance x::Array{Array{T,1},1}
, is this possible with Optim.jl in general?
And is this possible with autodiff
in particular?
thanks a lot
I don't quite get what you're trying to do. For example, do you realize that your provided point cannot be evaluated by f
?
julia> f( [[12.0,1.0],[1.0,2.0]])
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
[1] power_by_squaring(::Array{Float64,1}, ::Int64) at ./intfuncs.jl:166
[2] f(::Array{Array{Float64,1},1}) at ./REPL[161]:1
Supporting arrays of arrays is a whole different ballgame. You need to make sure most internal calls will automatically be recursive. The repo
https://github.com/JuliaDiffEq/RecursiveArrayTools.jl
is the tools that are used in DiffEq to support this kind of stuff, but there's a substantial amount of "being careful with types" that has to be done to get recursive arrays working.
You need to make sure most internal calls will automatically be recursive.
You also need to make sure you agree what it's supposed to mean... I mean, in the example above I'm a bit confused.
@pkofod I am sorry for the confusing, it should be
f(x) = sum((1.0 .- x[1]).^2 .+ 100.0 .* (x[2] .- x[1].^2).^2)
and I now also changed it above.
The error stays the same.
I would like to have an argument of type x::Array{Array{T,1},1}
supported, because when using the alternative of a single flat vector, I would need to know the shapes of my single arguments in advance. If I define a custom function, I can have multiple parameters and auto-differentiate the complete function with respect to all parameters. Such multiple independend parameters are kind of what I wanted to regain with x::Array{Array{T,1},1}
.
Such a setting allows for easily changing sizes of parameters during iteration for instance.
It would be really helpful if I can have something like multiple parameters (i.e. arguments) to my optimizable function. Is this still this issue, or rather something else? Looking forward to your thoughts.
If each array in the "outer" container is of the same size (as in this example), you almost surely want to use StaticArrays and make it an Vector{SVector{L,T}}
. At least the call to zero
will succeed if you do that; you might have to make some additional changes, though.
It would be really helpful if I can have something like multiple parameters (i.e. arguments) to my optimizable function. Is this still this issue, or rather something else? Looking forward to your thoughts.
I think it is this issue, yes.
CC @ettersi
Does @ettersi have a use case here?
I see I never answered about the recursive arrays. Let me be a little more clear about that.
If you do something like a .= b
with arrays of arrays, then it's like a[i] = b[i]
and thus the elements of the two arrays of arrays will now share references. The same happens with copy!
. So you have to be very careful here. recursivecopy!
from RecursiveArrayTools.jl and the VectorOfArrays
broadcast implementation take care of these kinds of issues. What @timholy suggests is another way of handling it since SArray
s are immutable and thus you don't need recursion because a[i] = b[i]
will set the value of a[i]
to the static array value of b[i]
and not copy a reference. So in short, if everything is generic enough, Vector{SVector{L,T}}
should already work without any changes, while Vector{Vector{T}}
would need a few extra changes but the tools to do this are already created an well-tested.
I'm not sure if the full mutability makes too much sense for an optimization problem, diffeq does it because things "change over time" but I'm not sure that has much meaning here. But it would be something nice to handle correctly down the road.
As for other array types, things like GPUArray
s (and support for internal multithreading via JLArray
s) comes free if you A_mul_B!
and change your loops to broadcasting as above. In DiffEq we found this has essentially no cost as long as you avoid https://github.com/JuliaLang/julia/issues/22255 , so I would recommend just changing every element-wise loop whenever you have time. Looking at the code, it shouldn't be too difficult to get this all working with GPUArrays, which is nice because it's different than what other packages have. The "marketing" is like this. Most optimization packages allow you to "use" GPUs because you can GPU the call to the objective function. However, when setting this up the user has to always write a step that updates the parameters on the GPU given the update from the optimizer, i.e. there is a GPU load/unload required. But if you setup everything to work with GPUArrays, then internally everything important from the optimizer can stay on the GPU, leading to an optimizer which compiles itself to be "copy-free" and require no extra transferring arrays to/from the GPU. I think this is a huge win and something that can be really hammered home in a blog post, and I could see that getting a lot of interest.
One thing that is missing from the last point though is that GPUArray
s don't fully support matrix factorizations and \
yet, so you may want to do all of these changes but then only test it out for now on gradient decent.
I can start over the weekend helping get this all setup. Maybe if you guys are also setup with GPUArray
s my "please help me get factorizations working with GPUArray
s!" call will be stronger.
My use-case is optimising an (approximate) lattice of atoms, which I store as a Matrix{SVector{2,Float64}}
. Currently, when I call
optimize(
total_energyy,
forces!,
y,
ConjugateGradient()
)
I get the following error:
ERROR: AssertionError: typeof(f_x) == T
in initial_state(::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}, ::Optim.Options{Void}, ::Optim.OnceDifferentiable, ::Array{StaticArrays.SVector{2,Float64},2}) at /home/simon/.julia/v0.5/Optim/src/cg.jl:105
in optimize(::Optim.OnceDifferentiable, ::Array{StaticArrays.SVector{2,Float64},2}, ::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}, ::Optim.Options{Void}) at /home/simon/.julia/v0.5/Optim/src/optimize.jl:172
in optimize(::T.##6#9{CartesianIndex{2}}, ::T.##7#10{CartesianIndex{2}}, ::Array{StaticArrays.SVector{2,Float64},2}, ::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}) at /home/simon/.julia/v0.5/Optim/src/optimize.jl:58
I haven't really dug into the details of this error yet, though. It could well be that there is some subtle issue in my code, or that it's an issue that I am still on 0.5.
Being on v05 will at least mean that you're not getting the latest updates, but supporting static arrray types are high on my priority list
Vectors of Static Arrays will have other issues though, since indexing gets you one static array each time, so defining a Jacobian and factorizing it won't work very well making it work with some algorithms but not too many. You'll want to wrap the Vector{SArray} in something like RecursiveArrayTools' VectorOfArray
to really fix that (because then it will have proper indexing, so you can vec
to get a view of the whole thing as a Vector and treat it likewise), unless Optim does a bunch of special handling.
Solved the problem using reinterpret
. Works kind of all right, although a solution without this extra typing would of course be nice. But I understand that this is open source code and if I want a feature I would have to do it myself.
You don't necessarily have to do it yourself. A precise description of what you want goes a long way :)
Fixed the original issue, if anything we can do comes up we can open a new issue.
Fix up (L-)BFGS NelderMead, and PSO code such that we can pass multidimensional arrays as input.