davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0
13.53k stars 3.37k forks source link

Particle Swarm Optimization #1923

Closed davemers0160 closed 4 years ago

davemers0160 commented 4 years ago

Davis - Any interest in adding PSO to dlib?

davisking commented 4 years ago

No. I’m pretty down on PSO. My opinion is that it should never be used by anyone :)

davemers0160 commented 4 years ago

Ouch :) That's fine. Have you had issues with it in the past?

davisking commented 4 years ago

I haven't used it in a long time. Maybe since I was an undergrad, I forget when exactly. The reason PSO is popular is because it's very very simple to implement. We are talking about something that doesn't even require a highschool level of math to implement.

The problem with it is just that what it does is totally silly. It has essentially no ability to converge to any sort of solution, even on trivial problems. I just looked to see if people are still publishing papers on it, and I'm honestly saddened to see that they are. A very common pattern in these papers is to test PSO on problems where the optimal solution is x=0, and then initialize the algorithm with a swarm of particles with mean 0. That is obviously cheating. And even in that case, the sample complexity of PSO is still incredibly high. I clicked on a few papers just now and found that their evaluations are still doing exactly this. These papers are fringe science, and you can't find them published in reputable venues for good reason.

Like honestly, I would expect many (or even all, depending on what exactly is counted as a PSO method) PSO implementations to be worse than pure random search. The papers never compare to pure random search though. You should before using PSO.

But more importantly, there are much better algorithms, like the MaxLIPO method in dlib::find_max_global(). That particular code is optimized for functions that are very expensive to evaluate, where you really won't ever call the objective function more than a few thousand times. In terms of how many objective function calls it takes to reach a high accuracy solution, dlib::find_max_global() should be billions of times faster than PSO on normal problems. If you wanted something that has less overhead in the solver (because your objective function is very cheap to evaluate), dlib::find_max_boboyqa() is a good method. For some problems, where you can easily do many parallel objective function evaluations, I'm also a fan of CMA-ES type approaches, which basically boil down to a kind of random subspace search plus bisectioning.

Or if you are just looking for something that isn't super complicated that you want to implement yourself take a look at this method: https://en.wikipedia.org/wiki/Powell%27s_method. It's a classic method (from 1964), is very simple, and provably works and gives pretty reasonable accuracy with modest compute requirements.

davemers0160 commented 4 years ago

I agree, PSO is stupidly simple. But, I've had a lot of success with it. I tend to only use it on problems where I don't have a differentiable function. The Powell method looks interesting. I hadn't heard of this one, and I'm going to have to dig more into it. The only drawback I can see right now is that you are limited by the resolution of the vectors.

I think that a lot of the problem with PSO is that people see it as a "fire and forget" method and expect it to work without really understanding its limitations/failure modes. I think that the basic algorithm is sound, but there is room for improvement to reduce the complexity especially when it comes to possibility of evaluating the same objective function parameters more than once.

Thanks for the ideas. Looks like I've got some more research to do :)

davisking commented 4 years ago

Have you compared PSO to random search in those applications?

More importantly though, have you tried MaxLIPO or BOBYQA? Those should be much better. These methods do not require derivatives. I really think there is no situation where PSO is the right thing to do.

I’m also not suggesting to use the Powell method in real applications. Although there is no limited resolution issue with it. It will find a solution to full floating point precision. However, anywhere it might be applicable BOBYQA or one of its variants is almost certainly more appropriate.

On Nov 19, 2019, at 7:31 AM, davemers0160 notifications@github.com wrote:

 I agree, PSO is stupidly simple. But, I've had a lot of success with it. I tend to only use it on problems where I don't have a differentiable function. The Powell method looks interesting. I hadn't heard of this one, and I'm going to have to dig more into it. The only drawback I can see right now is that you are limited by the resolution of the vectors.

I think that a lot of the problem with PSO is that people see it as a "fire and forget" method and expect it to work without really understanding its limitations/failure modes. I think that the basic algorithm is sound, but there is room for improvement to reduce the complexity especially when it comes to possibility of evaluating the same objective function parameters more than once.

Thanks for the ideas. Looks like I've got some more research to do :)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

davemers0160 commented 4 years ago

I have not compared PSO to random search. I don't know that random search would get to the same minimum that the PSO did. I was able to get 100% particle convergence with 20 particles and 142 iterations (2840 objective functions calls) and I was dealing with over 1.3x10^15 parameter combinations.

I started looking at the find_min_global() function and seeing how it does with the schwefel function:

schwefel

I did n=3 and had to play around a little with the right number of max_function_calls() (settled on 2000) to get it to converge to the right answer. I'll have to try some more complex examples :)

davisking commented 4 years ago

It's been bothering me for a while now that find_min_global() is setup to optimize really expensive objective functions, but users sometimes use it to optimize really inexpensive ones. That is, ones where they could make a huge number of calls to the objective function in a short amount of time. The settings of find_min_global() tell it to use MaxLIPO modeling all the time, which can take several orders of magnitude more time to execute than a simple objective function like the Schwefel function. If your objective function takes minutes to execute (or even hours) then MaxLIPO modeling is hugely valuable. But it's a waste of time if the objective function is faster than the MaxLIPO code.

So I just pushed a change to dlib's code that causes find_min_global() to measure the relative execution speeds of the MaxLIPO modeling and the objective function. If the objective function is faster it does less LIPO modeling, ensuring we never spend more time running LIPO modeling than running the actual objective function.

All the local trust region stuff is still there just like it always was. That stuff is fast and good all the time, regardless of objective function speed.

TLDR: find_min_global() is now much faster (in terms of wall clock time) for extremely inexpensive objective functions.

davemers0160 commented 4 years ago

Yeah, I figured that was probably the case.

For very complex problems where the objective function might take hours/days to evaluate is there an option to save checkpoints along the way in the find_min_global() function?

Reason:

davisking commented 4 years ago

find_min_global() is just a convenience wrapper around this: http://dlib.net/optimization.html#global_function_search. global_function_search gives you a full API that can do this kind of stuff like saving things and resuming later, running things in complicated parallel ways, or whatever else you might want to do.

davemers0160 commented 4 years ago

Perfect! Thanks.