Sorry if I missed something, but what was wrong with just using Anderson acceleration with no history?
The key is to be able to have a fixedpoint as a function and then use different algorithms in the background, starting with just iterating on the map.
For the Newton and Anderson methods, the interface would basically just do a little bookkeeping and then call nlsolve with the anderson or newton tags. In the case of the Newton, it might need to fudge any user-defined Jacobian by subtracting an identity matrix or something... But we can just get simple iteration working first.
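(To make the Jacobian fudge concrete, a minimal sketch with hypothetical names, nothing from the package itself: if the user supplies the Jacobian of the map g, the root problem g(x) - x = 0 just has Jacobian J_g(x) - I.)
# Hypothetical example: the Jacobian of a linear map g(x) = B*x + b is just B,
# so the Jacobian of f(x) = g(x) - x is B - I.
B = [0.5 0.0; 0.0 0.25]
g_jacobian!(J, x) = (J .= B)            # stand-in for a user-supplied Jacobian of the map g
function f_jacobian!(J, x)
    g_jacobian!(J, x)
    for i in 1:size(J, 1)
        J[i, i] -= 1                    # subtract the identity
    end
    return J
end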
So you mean have something like fixedpoint(f) = nlsolve(x -> f(x) - x) (but with all the bells and whistles of in-place/out-of-place, Jacobians, etc.)? Sure, that makes sense, but there's no need to reimplement any algorithm, right?
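(For concreteness, a minimal sketch of that kind of wrapper on a toy contraction; fixedpoint_sketch and this exact keyword usage are purely illustrative, not an existing API:)
using NLsolve
# Sketch: turn the fixed-point problem g(x) = x into the root problem g(x) - x = 0.
fixedpoint_sketch(g, x0; kwargs...) = nlsolve(x -> g(x) .- x, x0; inplace = false, kwargs...)

g(x) = 0.5 .* x .+ 1.0                  # contraction with fixed point x = 2
res = fixedpoint_sketch(g, [0.0])
res.zero                                # ≈ [2.0]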
@ArnavSood I think the basic approach is solid. A few other things:
Keep the if logic in the function. This is how it will end up being hooked into the function (see the inplace=true in https://github.com/JuliaNLSolvers/NLsolve.jl/blob/master/src/nlsolve/nlsolve.jl#L49).
Part of the testing should be to try it with different types. A few comments:
- Try an SVector or an array of length = 1. nlsolve is designed to work with vectors, not univariate functions.
- You have

function f!(x, fx)
    fx[:] = A * x + b
end

for the inplace, but I think it is supposed to be fx .= A * x + b instead? You can test.
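(For what it's worth, a small self-contained version of that in-place form with the broadcast assignment; the 3-dimensional A and b here are just for illustration:)
using LinearAlgebra
A = Diagonal(fill(0.5, 3))
b = ones(3)
function f!(x, fx)                      # same argument order as the snippet above
    fx .= A * x .+ b                    # writes into fx in place (A * x still makes a temporary)
    return fx
end
fx = similar(b)
f!(zeros(3), fx)                        # fx is now [1.0, 1.0, 1.0]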
So you mean have something like fixedpoint(f) = nlsolve(x -> f(x) - x) (but with all the bells and whistles of in-place/out-of-place, Jacobians, etc.)? Sure, that makes sense, but there's no need to reimplement any algorithm, right?
@antoine-levitt The only algorithm that is missing is the simple iteration one (which is the default fixed point algorithm in many cases). Everything else is just bells and whistles, exactly as you are saying. The simple-iteration is an essential algorithm because of convergence properties - i.e. if you can prove the mapping is a contraction, then it is basically executing Banach's fixed point theorem. For better or worse, it is usually the one economists start with - though I hope they use fancier algorithms if we can show them how to swap things out!
Again, just do anderson with m=0 and you're set. This thread (and others) just made me afraid that a lot of code is going to be duplicated with no real motivation, complicating further developments and increasing technical debt, but as long as it's just the API and no algorithm actually gets implemented that's perfectly fine!
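(Concretely, something like this; m is the Anderson history length in nlsolve, and m = 0 keeps no history, which is plain fixed-point iteration up to damping:)
using NLsolve
g(x) = 0.5 .* x .+ 1.0                                   # simple contraction, fixed point at 2
res = nlsolve(x -> g(x) .- x, [0.0]; inplace = false, method = :anderson, m = 0)
res.zero                                                 # ≈ [2.0]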
The simple-iteration is an essential algorithm because of convergence properties - i.e. if you can prove the mapping is a contraction, then it is basically executing Banach's fixed point theorem
Anderson acceleration actually preserves this property under relatively mild assumptions, see https://epubs.siam.org/doi/abs/10.1137/130919398
Again, just do anderson with m=0 and you're set. This thread (and others) just made me afraid that a lot of code is going to be duplicated with no real motivation, complicating further developments and increasing technical debt, but as long as it's just the API and no algorithm actually gets implemented that's perfectly fine!
We don't want to duplicate code at all, just get the API working with overhead-free performance. The only code that needs to be duplicated is the nlsolve to fixedpoint interface.
For using Anderson with the m=0 case, the thought was to get the minimal simple-iteration code up and running, but if you think the performance will be nearly identical, then they could try that route instead? We can use the naive, simple fixed-point iteration in the tests for benchmark comparison, and only implement a separate algorithm if there is significant overhead?
Anderson acceleration actually preserves this property under relatively mild assumptions, see https://epubs.siam.org/doi/abs/10.1137/130919398
Thanks! One of my hopes with getting this convenience wrapper is that we can convince economists on the convergence and speed of better fixed point algorithms they can test by just swapping out an algorithm flag.
As mentioned in https://github.com/JuliaNLSolvers/NLsolve.jl/issues/152, I don't think there will be any significant performance difference. The only thing you can exploit is shortcutting x + g(x) - x into g(x), which is unlikely to matter unless g is very cheap (in which case you're probably better off with StaticArrays, and in that case the compiler might even be smart enough to compile the redundant operations away). But benchmarks would certainly be interesting!
Thanks! One of my hopes with getting this convenience wrapper is that we can convince economists on the convergence and speed of better fixed point algorithms they can test by just swapping out an algorithm flag.
Even more reason to use anderson as the fixed-point algorithm: a better algorithm is just an m=10 away!
OK, talked with @jlperla. So we will write these tests and use :simple_iteration (or however we pass it in) as a call to Anderson with no memory.
@AliKarimirad, if you pull the git repo now (branch fixed-point-clean, as of commit b0a1fac4fe44d510fabbf201e7fb91ac88cc341e or later), it should be a working version with one method in there for the fixed point of a function (in place or out of place), without the Jacobian, located in src/nlsolve/fixedpoint.jl.
Here's what we need to do:
[ ] Test the function for different numeric types (Int64, Float64, etc.), and for different kinds of containers (e.g., a normal Vector vs a StaticArray). Since Julia types are objects, we can use them in loops, as follows (a rough sketch adapted to fixedpoint is below this list):
@testset "PreMetric, SemiMetric, Metric on $T" for T in (Float64, Int64, BigFloat, BigInt) begin ... end
[ ] Test that the function is "failing gracefully," or that it's returning the right kinds of errors for (say) dimension mismatches, argument errors, method errors, etc. The key is not to repeat the tests that apply to the nlsolve(...) part of the function. So, looking for the f(x) - x behavior, the out .-= x behavior, and any kind of issues with the closures, are probably the big ones.
[ ] Test whether the function is inferring types correctly. We can do this by playing around with the functions as written, using the @code_warntype macro. The key is to make sure that we're avoiding things of type Any or Box.
[ ] Ideally (and @jlperla can give input) benchmark the way we wrote the function (which uses Anderson acceleration) against the "naive" simple iteration code found in test/fixedpoint.jl.
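Here is a rough sketch of what the type-loop and inference pieces could look like; the fixedpoint keywords follow the usage elsewhere in this thread, so treat it as a starting point rather than the final tests:
using NLsolve, StaticArrays, Test

@testset "fixedpoint out-of-place on $T" for T in (Float64, BigFloat)
    g(x) = T(0.5) .* x .+ one(T)                         # contraction with fixed point 2
    res = fixedpoint(g, T[0.0], inplace = false)
    @test maximum(abs.(res.zero .- T[2.0])) < 1e-6
end

# container check with a StaticArray, plus a quick inference check
h(x) = 0.5 .* x .+ 1.0
@inferred fixedpoint(h, SVector(0.0), inplace = false)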
Once these are done, we can try looking at methods with Jacobians, like Newton.
Pushed some placeholders to the file.
@antoine-levitt The overhead in a simple linear map appears to be about a factor of 3. Here is the code (using @ArnavSood's current implementation in the fixed-point-clean branch):
using NamedTuples, NLsolve, BenchmarkTools
#I think the tolerance matches the default with NLsolve
function iterate(f, x0; residualnorm = (x -> norm(x, Inf)), tol = 1e-8, maxiter = 1000)
    residual = Inf
    iter = 1
    xold = x0
    xnew = copy(x0)
    while residual > tol && iter < maxiter
        xnew = f(xold)
        residual = residualnorm(xold - xnew)
        xold = copy(xnew)
        iter += 1
    end
    return (xold, iter)
end
#Simple linear map
N = 50
maxiter = 10000
A = Diagonal(rand(N)) #I think this is always a contraction?
b = rand(N,1)
f(x) = A * x + b
x_0 = rand(N,1)
f(x_0)
#Can see it is the same number of iterations/etc.
@show iterate(f, x_0, maxiter=maxiter)
@show fixedpoint(f, x_0, inplace=false, iterations=maxiter)
Then the benchmarking is
@btime iterate($f, $x_0, maxiter=$maxiter)
@btime fixedpoint($f, $x_0, inplace=false, iterations=$maxiter)
The first takes about 164 microseconds while the second takes about 411. When I change N, it always seems to be around 2-3X overhead (i.e., not just a fixed overhead cost).
Any thoughts on how to speed things up, looking at https://github.com/JuliaNLSolvers/NLsolve.jl/blob/fixed-point-clean/src/nlsolve/fixedpoint.jl?
Run the profiler to identify the bottleneck
There is so little code here prior to calling nlsolve that I suspect it would just be optimizing NLsolve (i.e. WAY past our paygrade). The only thing I can think of is the use of a closure in https://github.com/JuliaNLSolvers/NLsolve.jl/blob/fixed-point-clean/src/nlsolve/fixedpoint.jl#L29, but we tried to ensure that it was type-stable (though it would be useful as a sanity check if you see anything obviously wrong).
That said, if you can point @ArnavSood to hints on how to do profiling in Julia, it might be useful for the future.
The closure should inline. I meant profile NLsolve.jl and optimize it.
Profiling in Juno is quite simple: https://stackoverflow.com/questions/49719076/macos-python-with-numpy-faster-than-julia-in-training-neural-network/49724611#49724611
Or use ProfileView.jl: https://github.com/timholy/ProfileView.jl .
If you identify bad lines, then it would help everyone else make it faster.
There is so little code here prior to calling nlsolve that I suspect it would just be optimizing NLsolve (i.e. WAY past our paygrade)
Sorry but... why? This is a NLsolve issue, I thought the point was to improve NLsolve. The code isn't very complicated. It's very likely that there are some inefficiencies in the code that can be fixed (it was not developed for the case where the function evaluation is very cheap, so there might be a few more copy calls than necessary, for instance). Also, anderson quite heavily uses views, which are faster in 0.7 than in 0.6.
As for profiling, it's indeed very simple: @profile somecode() and ProfileView.view() (though that has some bugs you have to work around, see the issues in ProfileView).
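e.g., a minimal session along these lines (a sketch on a toy contraction; Profile ships with Julia, and mincount just trims the printout):
using Profile, NLsolve
g(x) = 0.5 .* x .+ 1.0
solve_once() = nlsolve(x -> g(x) .- x, zeros(1000); inplace = false, method = :anderson, m = 0)
solve_once()                       # run once first so the profile isn't dominated by compilation
Profile.clear()
@profile for _ in 1:100
    solve_once()
end
Profile.print(mincount = 50)       # or: using ProfileView; ProfileView.view()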
So I played around with it a bit, and the biggest offenders are convergence assessment (computing the norm of the residual, checking it's not NaN, etc.). Removing the "+b" (ie, saving an addition per iteration) gained about 10%. I would venture a guess that no realistic problem is simple enough that computing the infinity norm of the residual is the bottleneck of the algorithm, and so I wouldn't worry too much about the potential overhead of NLsolve.
using NLsolve
using BenchmarkTools
srand(0)
N = 1000
A = Diagonal(rand(N) .- 2) # make it negative
b = rand(N)
f(x) = A * x + b
x_0 = rand(N)
@btime nlsolve(f, x_0; inplace=false, iterations=1000, method=:anderson)
@antoine-levitt I think Jesse's point wasn't that improving nlsolve() is a bad idea, but rather that it's just difficult/not the kind of thing he and I want to mess with (at least for me). And since our fixedpoint() is just a "thin" wrapper around nlsolve()...
Those profiling results are useful; thanks for obtaining them. I'm still pretty new to writing and testing "production" Julia code, FWIW. I wonder if there's a valid economic use case for bifurcating the function into "cheap" vs "expensive" function calls (i.e., if the optimizations that minimize function calls eat up more time than they're worth, for cheap functions). But I'll leave that to you and @jlperla.
Sorry but... why? This is a NLsolve issue, I thought the point was to improve NLsolve. The code isn't very complicated.
Not a very helpful way to encourage a community to help support a package... I have been trying to sponsor people to contribute to the package over the last 6-ish months, and I can promise you that the package is much more complicated than you realize. This has been discussed with @pkofod, but I think it is worth keeping in mind. Not to say that there are better ways to write the package, just that not everyone is capable of understanding (let alone optimizing) it.
Moving past that comment: of course the goal is to improve NLsolve, but not everyone is capable of optimizing complicated, generic Julia code (which has very rudimentary tooling at this point for non-experts).
I played around with it a bit, and the biggest offenders are convergence assessment (computing the norm of the residual, checking it's not NaN, etc.). Removing the "+b" (ie, saving an addition per iteration) gained about 10%.
Thanks for taking a look at it! Of course, 10% is not a factor of 2-3 though... did you try @ArnavSood's fixed-point-clean branch?
I wonder if there's a valid economic use case for bifurcating the function into "cheap" vs "expensive" function calls (i.e., if the optimizations that minimize function calls eat up more time than they're worth, for cheap functions).
@ArnavSood: If you mean that there could be special logic in the Anderson implementation to deal with the m=0, beta=1 case better, then that can be considered... but I don't think that bifurcating is generally a good idea.
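(For reference, the m=0 case reduces to damped fixed-point iteration, so any special path would just be computing something like the following; this is a sketch, not NLsolve's actual internals:)
# x_{k+1} = x_k + beta * (g(x_k) - x_k); beta = 1 is plain fixed-point iteration
function damped_iteration(g, x0; beta = 1.0, tol = 1e-8, maxiter = 1000)
    x = copy(x0)
    for k in 1:maxiter
        step = g(x) .- x
        maximum(abs.(step)) < tol && return (x, k)
        x = x .+ beta .* step
    end
    return (x, maxiter)
end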
Apologies if that seemed aggressive: it's just that I genuinely don't understand what problem you're trying to solve here. Since this is the issue tracker I assumed that we were supposed to understand and discuss this, but this looks more like an internal TODO-list, so I'll leave you to it. My only points are that 1) the overhead of converting a fixed-point problem to a zero-finding one is negligible and 2) adding code complicates further maintenance (as you mention, the more layers there are in a code, the harder it is to work on it) and should only be done if there's a real need.
Any misperceived aggression is already forgotten. And thank you so much for looking at this; it is very helpful. The toughest thing to figure out in the implementations is the Cache. I wonder if a few special cases for m=0 on the cache might help out.
To give you the simple story: There are a huge number of algorithms in economics that have a fixed-point solution at the heart of it. Often with very simple functions and low dimensions for an individual iteration, but where the nested fixed point is run an enormous number of times. The way people tend to do this right now is to just write the fixed point iteration directly in the code (e.g. https://lectures.quantecon.org/jl/mccall_model.html )
Now, my hope is to teach students that using orthogonal algorithms is the best approach for: (1) code clarity; (2) (ultimately) performance given the same algorithm; (3) the ability to swap out more advanced algorithms (e.g. anderson and derivative-based approaches rather than naive fixed point iteration). I would love to just be able to replace those sorts of things in lectures with fixedpoint as a higher-order function. However... if the performance is off by a factor of 2-3 AND the compilation feels like it takes an extra 10+ seconds for students, it is a hard sell.
Let me see if one of the RAs can do a modified version of one of these lectures to use the library and see if the performance is only off by 10-20%, which I think is perfectly acceptable.
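To make the "swap an algorithm flag" idea concrete, this is the sort of replacement I have in mind in the lectures (T is a stand-in for a Bellman-style operator, and the method/m keywords are illustrative, following nlsolve's interface rather than necessarily the final fixedpoint API):
using NLsolve
T(v) = 0.5 .* v .+ 1.0                  # stand-in for a Bellman-style operator (a contraction)
v_init = zeros(5)

# same operator, algorithm chosen by a flag:
sol_plain    = fixedpoint(T, v_init, inplace = false, method = :anderson, m = 0)  # plain iteration
sol_anderson = fixedpoint(T, v_init, inplace = false, method = :anderson, m = 2)  # accelerated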
Often with very simple functions and low dimensions for an individual iteration, but where the nested fixed point is run an enormous number of times
This is typically the case where anderson acceleration is very efficient, because it inherits the nice global properties of fixed-point iterations (which might be lost in Newton-based approach) while maintaining good convergence rates. However these Bellman equations look very non-differentiable, so I don't know if it will be very effective.
The overhead of converting from fixed-point to zero-finding should be negligible in most cases, as should be the fact of using the NLsolve library (which does have some overhead for residual computation, bookkeeping, NaN checking, etc.) rather than writing out the code by hand. I think it's perfectly fine to have slightly worse performance for trivial test problems (we are talking ms here), in order to get robust/efficient/flexible code in the complicated cases. The compilation time problem is indeed annoying, but an unavoidable (in the current state of julia at least) side-effect of using a complete library.
@antoine-levitt
OK, so we implemented https://lectures.quantecon.org/jl/mccall_model_with_separation.html with the fixedpoint branch and kept the code in https://github.com/JuliaNLSolvers/NLsolve.jl/tree/fixed-point-clean as a test.
The summary is that:
- The roll-your-own fixed point iteration is about 2X faster than nlsolve with m=0. This is consistent with the 100% overhead I was seeing before in the simpler example.
- However, the encouraging part is that anderson acceleration with m=2 is 20X faster than that (i.e. 10X faster than the roll-your-own fixed point). This helps the case for why using the library is a good idea in principle.
cc: @Nosferican @sglyon For quantecon lecture notes, you can take a look at the branch and the test.
Squashed a bug, updated REQUIRE, and added some more tests. Should be good to look at as of latest commit (66af0c4).
However, the encouraging part is that anderson acceleration with m=2 is 20X faster than that (i.e. 10X faster than the roll-your-own fixed point). This helps the case for why using the library is a good idea in principle.
Cool, especially since Anderson is very much not optimized for the case of a cheap function evaluation. Did you try increasing m? Usual values are around 10.
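e.g., a quick sweep to check (a sketch on a toy contraction, not the McCall model itself):
using NLsolve, BenchmarkTools
g(x) = 0.5 .* x .+ 1.0
x0 = zeros(1000)
for m in (0, 2, 5, 10, 20)
    print("m = $m: ")
    @btime nlsolve(x -> g(x) .- x, $x0; inplace = false, method = :anderson, m = $m)
end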
Just wanted to say that I'm here in the background and will take part in the discussion as soon as my vacation ends!
I believe this issue is good to close, since what's on the tin (simple iteration for out-of-place functions) is in the PR. There's a lot of off-label thought here, but my feeling is that we can open separate, new issues for those.
So I'll close it, but anyone can re-open if they wish.
I added in a shell for out-of-place execution of the simple fixed point iteration.
- I added /src/nlsolve/fixedpoint.jl, which has the exported function for users to call. I copied/pasted the nlsolve.jl code, so this may be imperfect. The fixedpoint(...) function has placeholders to call various algorithms, which I commented out.
- I put in a /src/solves/simple_iteration.jl file which we can hopefully use to get the basic fixed point iteration up and running.
- I copied the code from test/fixed_points.jl to this new /src/solves/simple_iteration.jl and made a few tweaks. You will see that I (1) removed the fixed point of a univariate function (which won't work the way nlsolve is defined); (2) left in code for the inplace fixedpointOLD! function, which is not implemented; and (3) changed the code for calling the function to use the df.f() notation (as this is the general way to wrap things in this library).
What I want to do is finish the implementation of simple_iteration for these fixed points. Some of the obvious things to do (see the sketch below this list for a starting point):
- Look at what the nlsolve solvers return; the simple_iteration should return identical structures as the solver.
- Figure out which fixedpoint function arguments to use in the simple_iteration algorithm.
- Add Cache and trace options for this if possible (looking at something like the newton solver), but if it is difficult then it can be added later.
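A minimal sketch of the core loop that /src/solves/simple_iteration.jl could grow into; the in-place signature and the returned fields here are placeholders (the real version should return the same results structure as the other nlsolve solvers):
# Sketch: in-place simple iteration on the map g!(out, x), mirroring the
# out-of-place iterate() posted earlier in the thread.
function simple_iteration!(g!, x0; tol = 1e-8, maxiter = 1000)
    x = copy(x0)
    gx = similar(x0)
    for iter in 1:maxiter
        g!(gx, x)                                  # gx .= g(x)
        if maximum(abs.(gx .- x)) < tol
            return (zero = copy(gx), iterations = iter, converged = true)
        end
        copyto!(x, gx)
    end
    return (zero = x, iterations = maxiter, converged = false)
end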