JuliaNLSolvers / NLsolve.jl

Julia solvers for systems of nonlinear equations and mixed complementarity problems

Implementation of "simple iteration" for the `fixedpoint` with out of place functions. #148

Closed jlperla closed 6 years ago

jlperla commented 6 years ago

I added a shell for how out-of-place execution of the simple fixed-point iteration could work.

  1. In the committed files, you will see /src/nlsolve/fixedpoint.jl, which has the exported function for users to call. I copied/pasted the nlsolve.jl code, so this may be imperfect.
  2. That fixedpoint(...) function has placeholders to call various algorithms, which I commented out. I put in a /src/solves/simple_iteration.jl file which we can hopefully use to get the basic fixed point iteration up and running.
  3. I moved the code from our test/fixed_points.jl to this new /src/solves/simple_iteration.jl and made a few tweaks. You will see that I (1) removed the fixed point of a univariate function (which won't work the way nlsolve is defined); (2) left in code for the in-place fixedpointOLD! function, which is not implemented; and (3) changed the code for calling the function to use the df.f() notation (as this is the general way to wrap things in this library). A rough sketch of that out-of-place iteration is just below.
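
To be concrete, here is a minimal sketch of the out-of-place simple iteration, assuming df.f is the wrapped user map; the function name, keywords, and return value are placeholders, not the final API:

function simple_iteration(df, x0; tol = 1e-8, maxiter = 1000)
    x = copy(x0)
    for iter in 1:maxiter
        xnew = df.f(x)                                          # out-of-place call through the wrapper
        maximum(abs.(xnew .- x)) < tol && return (xnew, iter)   # infinity-norm residual check
        x = xnew
    end
    return (x, maxiter)                                         # hit the iteration cap
end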

What I want to do is finish the implementation of simple_iteration for these fixed points. Some of the obvious things to do:

antoine-levitt commented 6 years ago

Sorry if I missed something, but what was wrong with just using Anderson acceleration with no history?

jlperla commented 6 years ago

The key is to be able to have a fixedpoint as a function and then use different algorithms in the background, starting with just iterating on the map.

For Newton and Anderson, the interface would basically just do a little bookkeeping and then call nlsolve with the anderson or newton tags. In the case of Newton, it might need to fudge any user-defined Jacobian by subtracting an identity matrix or something like that... But we can just get simple iteration working first.
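
To make that bookkeeping concrete, here is a rough sketch (the name fixedpoint_newton and the out-of-place function/Jacobian call into nlsolve are assumptions, not the current API):

using NLsolve
using LinearAlgebra   # for the identity I on Julia >= 0.7 (I is in Base on 0.6)

# Sketch only: a fixed point of f is a root of g(x) = f(x) - x, and if the user supplies a
# Jacobian j for f, the Jacobian of g is just j(x) with the identity subtracted.
function fixedpoint_newton(f, j, x0; kwargs...)
    g(x)  = f(x) - x
    gj(x) = j(x) - I
    # assuming nlsolve accepts an out-of-place function/Jacobian pair:
    nlsolve(g, gj, x0; method = :newton, inplace = false, kwargs...)
end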

antoine-levitt commented 6 years ago

So you mean have something like fixedpoint(f) = nlsolve(x -> f(x) - x) (but with all the bells and whistles of in-place/out-of-place, Jacobians, etc.)? Sure, that makes sense, but there's no need to reimplement any algorithm, right?

jlperla commented 6 years ago

@ArnavSood I think the basic approach is solid. A few other things:

Part of the testing should be to try it with different types. A few comments:

jlperla commented 6 years ago

So you mean have something like fixedpoint(f) = nlsolve(x -> f(x) - x) (but with all the bells and whistles of in-place/out-of-place, Jacobians, etc.)? Sure, that makes sense, but there's no need to reimplement any algorithm, right?

@antoine-levitt The only algorithm that is missing is the simple iteration one (which is the default fixed point algorithm in many cases). Everything else is just bells and whistles, exactly as you are saying. The simple-iteration is an essential algorithm because of convergence properties - i.e. if you can prove the mapping is a contraction, then it is basically executing Banach's fixed point theorem. For better or worse, it is usually the one economists start with - though I hope they use fancier algorithms if we can show them how to swap things out!
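
For reference, the bound behind Banach's theorem: if the map $f$ is a contraction with modulus $L < 1$, the iterates $x_{k+1} = f(x_k)$ satisfy

$$\|x_k - x^*\| \le \frac{L^k}{1 - L}\,\|x_1 - x_0\|,$$

so convergence to the unique fixed point $x^*$ is geometric at rate $L$.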

antoine-levitt commented 6 years ago

Again, just do anderson with m=0 and you're set. This thread (and others) just made me afraid that a lot of code is going to be duplicated with no real motivation, complicating further developments and increasing technical debt, but as long as it's just the API and no algorithm actually gets implemented that's perfectly fine!

The simple-iteration is an essential algorithm because of convergence properties - i.e. if you can prove the mapping is a contraction, then it is basically executing Banach's fixed point theorem

Anderson acceleration actually preserves this property under relatively mild assumptions, see https://epubs.siam.org/doi/abs/10.1137/130919398

jlperla commented 6 years ago

Again, just do anderson with m=0 and you're set. This thread (and others) just made me afraid that a lot of code is going to be duplicated with no real motivation, complicating further developments and increasing technical debt, but as long as it's just the API and no algorithm actually gets implemented that's perfectly fine!

We don't want to duplicate code at all, just get the API working with overhead-free performance. The only code that needs to be duplicated is the nlsolve-to-fixedpoint interface.

As for using Anderson for the m=0 case: the thought was to get minimal simple-iteration code up and running first, but if you think the performance will be nearly identical, then we could go that route instead. We can use a naive, hand-written fixed-point iteration in the tests as a benchmark comparison, and only implement a separate algorithm if there is significant overhead.

jlperla commented 6 years ago

Anderson acceleration actually preserves this property under relatively mild assumptions, see https://epubs.siam.org/doi/abs/10.1137/130919398

Thanks! One of my hopes for this convenience wrapper is that we can convince economists of the convergence and speed of better fixed-point algorithms, which they can test by just swapping out an algorithm flag.

antoine-levitt commented 6 years ago

As mentioned in https://github.com/JuliaNLSolvers/NLsolve.jl/issues/152, I don't think there will be any significant performance difference. The only thing you can exploit is shortcutting x + g(x) - x into g(x), which is unlikely to matter unless g is very cheap (in which case you're probably better off with StaticArrays, and in that case the compiler might even be smart enough to compile the redundant operations away). But benchmarks would certainly be interesting!

Thanks! One of my hopes for this convenience wrapper is that we can convince economists of the convergence and speed of better fixed-point algorithms, which they can test by just swapping out an algorithm flag.

Even more reason to use anderson as the fixed-point algorithm: a better algorithm is just an m=10 away!

arnavs commented 6 years ago

OK, talked with @jlperla. So we will write these tests and use :simple_iteration (or however we pass it in) as a call to Anderson with no memory.
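
Roughly, the plumbing would be as thin as this (a sketch only; the exact keyword handling is still to be worked out):

using NLsolve

# Sketch: :simple_iteration just forwards to Anderson with no memory (m = 0), after
# converting the fixed-point problem f(x) = x into the root problem x -> f(x) - x.
function fixedpoint(f, x0; method = :simple_iteration, inplace = false, kwargs...)
    if method == :simple_iteration
        return nlsolve(x -> f(x) - x, x0; method = :anderson, m = 0,
                       inplace = inplace, kwargs...)
    end
    error("method $method is not wired up in this sketch")
end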

arnavs commented 6 years ago

@AliKarimirad, if you pull the git repo now (branch fixed-point-clean, as of commit b0a1fac4fe44d510fabbf201e7fb91ac88cc341e or later), it should be a working version with one method in there for the fixed point of a function (in place or out of place), without the Jacobian, located in src/nlsolve/fixedpoint.jl.

Here's what we need to do:

@testset "PreMetric, SemiMetric, Metric on $T" for T in (Float64, Int64, BigFloat, BigInt) begin ... end 

Once these are done, we can try looking at methods with Jacobians, like Newton.

arnavs commented 6 years ago

Pushed some placeholders to the file.

jlperla commented 6 years ago

@antoine-levitt The overhead on a simple linear map appears to be about a factor of 3. Here is the code (using @ArnavSood's current implementation in the fixed-point-clean branch):

using NamedTuples, NLsolve, BenchmarkTools
# Note: on Julia >= 0.7, `norm` and `Diagonal` below also require `using LinearAlgebra`.
# I think the tolerance matches the default in NLsolve.
function iterate(f, x0; residualnorm = (x -> norm(x, Inf)), tol = 1e-8, maxiter = 1000)
    residual = Inf
    iter = 1
    xold = x0
    xnew = copy(x0)
    while residual > tol && iter < maxiter
        xnew = f(xold)                        # out-of-place application of the map
        residual = residualnorm(xold - xnew)
        xold = copy(xnew)
        iter += 1
    end
    return (xold, iter)
end

#Simple linear map
N = 50
maxiter = 10000
A = Diagonal(rand(N)) # diagonal entries lie in [0, 1), so the affine map below is (almost surely) a contraction
b = rand(N,1)
f(x) = A * x + b
x_0 = rand(N,1)
f(x_0)

#Can see it is the same number of iterations/etc.
@show iterate(f, x_0, maxiter=maxiter)
@show fixedpoint(f, x_0, inplace=false, iterations=maxiter)

Then the benchmarking is:

@btime iterate($f, $x_0, maxiter=$maxiter)
@btime fixedpoint($f, $x_0, inplace=false, iterations=$maxiter)

The first takes about 164 microseconds while the second takes about 411. When I change N, the overhead stays in the 2-3X range (i.e., it is not just a fixed cost).

Any thoughts on how to speed things up? The wrapper is at https://github.com/JuliaNLSolvers/NLsolve.jl/blob/fixed-point-clean/src/nlsolve/fixedpoint.jl

ChrisRackauckas commented 6 years ago

Run the profiler to identify the bottleneck

jlperla commented 6 years ago

There is so little code here prior to calling nlsolve that I suspect it would just be optimizing NLsolve itself (i.e. WAY past our paygrade). The only thing I can think of is the use of a closure in https://github.com/JuliaNLSolvers/NLsolve.jl/blob/fixed-point-clean/src/nlsolve/fixedpoint.jl#L29 but we tried to ensure that it was type-stable (though a sanity check would be useful, if you see anything obviously wrong).

That said, if you can point @ArnavSood to some hints on how to do profiling in Julia, it would be useful for the future.

ChrisRackauckas commented 6 years ago

The closure should inline. I meant profile NLsolve.jl and optimize it.

Profiling in Juno is quite simple: https://stackoverflow.com/questions/49719076/macos-python-with-numpy-faster-than-julia-in-training-neural-network/49724611#49724611

Or use ProfileView.jl: https://github.com/timholy/ProfileView.jl .

If you identify bad lines, then that would help everyone else make it faster.

antoine-levitt commented 6 years ago

There is so little code here prior to calling nlsolve that I suspect it would just be optimizing NLsolve (i.e. WAY past our paygrade)

Sorry, but... why? This is an NLsolve issue; I thought the point was to improve NLsolve. The code isn't very complicated. It's very likely that there are some inefficiencies that can be fixed (the code was not developed for the case where the function evaluation is very cheap, so there might be a few more copies than necessary, for instance). Also, anderson quite heavily uses views, which are faster in 0.7 than in 0.6.

As for profiling, it's indeed very simple: @profile somecode() and ProfileView.view() (though that has some bugs you have to work around, see the issues in ProfileView)
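
For example (the fixedpoint call and the toy problem are just placeholders; on Julia >= 0.7 the Profile stdlib has to be loaded explicitly):

using Profile, ProfileView   # on 0.6, @profile lives in Base and only ProfileView is needed
using NLsolve

f(x) = 0.5 .* x .+ 1.0                      # cheap placeholder map
x0 = zeros(50)
fixedpoint(f, x0, inplace = false)          # run once so compilation is not profiled
Profile.clear()
@profile for _ in 1:1000
    fixedpoint(f, x0, inplace = false)
end
ProfileView.view()                          # flame graph of where the time goes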

antoine-levitt commented 6 years ago

So I played around with it a bit, and the biggest offenders are convergence assessment (computing the norm of the residual, checking it's not NaN, etc.). Removing the "+b" (ie, saving an addition per iteration) gained about 10%. I would venture a guess that no realistic problem is simple enough that computing the infinity norm of the residual is the bottleneck of the algorithm, and so I wouldn't worry too much about the potential overhead of NLsolve.

using NLsolve
using BenchmarkTools
srand(0)
N = 1000
A = Diagonal(rand(N) .- 2) # make it negative
b = rand(N)
f(x) = A * x + b
x_0 = rand(N)
@btime nlsolve(f, x_0; inplace=false, iterations=1000, method=:anderson)

arnavs commented 6 years ago

@antoine-levitt I think Jesse's point wasn't that improving nlsolve() is a bad idea, but rather that it's just difficult/not the kind of thing he and I want to mess with (at least for me). And since our fixedpoint() is just a "thin" wrapper around nlsolve()...

Those profiling results are useful; thanks for obtaining them. I'm still pretty new to writing and testing "production" Julia code, FWIW. I wonder if there's a valid economic use case for bifurcating the function into "cheap" vs "expensive" function calls (i.e., if the optimizations that minimize function calls eat up more time than they're worth, for cheap functions). But I'll leave that to you and @jlperla.

jlperla commented 6 years ago

Sorry but... why? This is a NLsolve issue, I thought the point was to improve NLsolve. The code isn't very complicated.

Not a very helpful way to encourage a community to help support a package... I have been trying to sponsor people to contribute to the package over the last 6-ish months, and I can promise you that the package is much more complicated than you realize. This has been discussed with @pkofod, but I think it is worth keeping in mind. That is not to say there are better ways to write the package, just that not everyone is capable of understanding (let alone optimizing) it.

Moving past that comment: of course the goal is to improve NLsolve, but not everyone is capable of optimizing complicated, generic Julia code (which has very rudimentary tooling for non-experts at this point).

I played around with it a bit, and the biggest offenders are convergence assessment (computing the norm of the residual, checking it's not NaN, etc.). Removing the "+b" (ie, saving an addition per iteration) gained about 10%.

Thanks for taking a look at it! Of course, 10% is not a factor of 2-3 though... did you try @ArnavSood's fixed-point-clean branch?

I wonder if there's a valid economic use case for bifurcating the function into "cheap" vs "expensive" function calls (i.e., if the optimizations that minimize function calls eat up more time than they're worth, for cheap functions).

@ArnavSood: If you mean that there could be special logic in the Anderson implementation to handle the m=0, beta=1 case better, then that can be considered... but I don't think that bifurcating is generally a good idea.

antoine-levitt commented 6 years ago

Apologies if that seemed aggressive: it's just that I genuinely don't understand what problem you're trying to solve here. Since this is the issue tracker I assumed that we were supposed to understand and discuss this, but this looks more like an internal TODO-list, so I'll leave you to it. My only points are that 1) the overhead of converting a fixed-point problem to a zero-finding one is negligible and 2) adding code complicates further maintenance (as you mention, the more layers there are in a code, the harder it is to work on it) and should only be done if there's a real need.

jlperla commented 6 years ago

Any misperceived aggression is already forgotten. And thank you so much for looking at this, it is very helpful. The toughest thing to figure out in the implementation is the cache. I wonder if a few special cases for m=0 on the cache might help out.

To give you the simple story: there are a huge number of algorithms in economics that have a fixed-point solution at their heart. Often with very simple functions and low dimensions for an individual iteration, but where the nested fixed point is run an enormous number of times. The way people tend to do this right now is to just write the fixed-point iteration directly in the code (e.g. https://lectures.quantecon.org/jl/mccall_model.html).

Now, my hope is to teach students that keeping the algorithm orthogonal to the model code is the best approach for: (1) code clarity; (2) (ultimately) performance given the same algorithm; and (3) the ability to swap in more advanced algorithms (e.g. anderson and derivative-based approaches rather than naive fixed-point iteration). I would love to just be able to replace those sorts of things in lectures with fixedpoint as a higher-order function. However... if the performance is off by a factor of 2-3 AND compilation feels like it takes an extra 10+ seconds for students, it is a hard sell.

Let me see if one of the RAs can do a modified version of one of these lectures to use the library and see if the performance is only off by 10-20%, which I think is perfectly acceptable.

antoine-levitt commented 6 years ago

Often with very simple functions and low dimensions for an individual iteration, but where the nested fixed point is run an enormous number of times

This is typically the case where anderson acceleration is very efficient, because it inherits the nice global properties of fixed-point iterations (which might be lost in Newton-based approaches) while maintaining good convergence rates. However, these Bellman equations look very non-differentiable, so I don't know if it will be very effective.

The overhead of converting from fixed-point to zero-finding should be negligible in most cases, as should the overhead of using the NLsolve library (which does have some overhead for residual computation, bookkeeping, NaN checking, etc.) rather than writing out the code by hand. I think it's perfectly fine to have slightly worse performance for trivial test problems (we are talking ms here) in order to get robust/efficient/flexible code in the complicated cases. The compilation-time problem is indeed annoying, but it is an unavoidable (in the current state of Julia, at least) side effect of using a complete library.

jlperla commented 6 years ago

@antoine-levitt

OK, so we implemented https://lectures.quantecon.org/jl/mccall_model_with_separation.html with the fixedpoint branch and kept the code in https://github.com/JuliaNLSolvers/NLsolve.jl/tree/fixed-point-clean as a test.

The summary is that:

cc: @Nosferican @sglyon For quantecon lecture notes, you can take a look at the branch and the test.

arnavs commented 6 years ago

Squashed a bug, updated REQUIRE, and added some more tests. Should be good to look at as of latest commit (66af0c4).

antoine-levitt commented 6 years ago

However, the encouraging part is that anderson acceleration with m=2 is 20X faster than that (i.e. 10X faster than the roll-your-own fixed point). This helps the case for why using the library is a good idea in principle.

Cool, especially since Anderson is very much not optimized for the case of a cheap function evaluation. Did you try increasing m? Usual values are around 10.
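
e.g., however m ends up being exposed through the wrapper, something like (a sketch, assuming the m and beta keywords mentioned earlier in this thread):

nlsolve(x -> f(x) - x, x_0; method = :anderson, m = 10, beta = 1.0, inplace = false)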

pkofod commented 6 years ago

Just wanted to say that I'm here in the background and will take part in the discussion as soon as my vacation ends!

arnavs commented 6 years ago

I believe this issue is good to close, since what's on the tin (simple iteration for out-of-place functions) is in the PR. There's a lot of off-label thought here, but my feeling is that we can open separate, new issues for those.

So I'll close it, but anyone can re-open if they wish.