EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Performance comparison: Genetic programming vs Bayesian Optimization #335

Open TheodoreGalanos opened 7 years ago

TheodoreGalanos commented 7 years ago

Hello everyone,

I was wondering if anyone has worked on this subject. I will soon go ahead and try to test this out myself, but before that I thought I'd ask about past experience.

I think such a comparison could be valuable, since it might also point to areas where TPOT can be improved.

Since I am a beginner, I would welcome any thoughts from more experienced users on how to properly structure this comparison, types of problems, datasets, etc.

Thank you in advance.

Kind regards, Theodore.

mfeurer commented 7 years ago

One key point: don't make up new benchmarks, but instead reuse the setup from previously published papers on the same topic. As one of the authors of this paper I'm a little biased about which datasets to use, but I think that either the ones used in the paper on Auto-WEKA or the 13 datasets presented by us are a good starting point. You would of course also need to decide on a metric.

Next, you would have to make sure that both have a similar or equal search space, e.g. that they search over the same classifiers, preprocessors, and their hyperparameters.

Last, you need to find an implementation of Bayesian optimization which scales to such configuration spaces; to the best of my knowledge, these are only hyperopt and SMAC.
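
To make the shared-search-space requirement concrete, here is a minimal sketch of a conditional space in hyperopt, where the classifier choice is itself a hyperparameter. The dataset, classifiers, and ranges below are illustrative placeholders, not a proposed benchmark setup.

```python
# Minimal sketch of a conditional hyperopt search space: the classifier
# choice is a hyperparameter, and each branch carries its own parameters.
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # placeholder dataset

search_space = hp.choice("classifier", [
    {
        "type": "random_forest",
        "n_estimators": hp.choice("rf_n_estimators", [100, 300, 500]),
        "max_features": hp.uniform("rf_max_features", 0.1, 1.0),
    },
    {
        "type": "logistic_regression",
        "C": hp.loguniform("lr_C", -4, 4),
    },
])

def objective(params):
    if params["type"] == "random_forest":
        clf = RandomForestClassifier(n_estimators=params["n_estimators"],
                                     max_features=params["max_features"],
                                     random_state=0)
    else:
        clf = LogisticRegression(C=params["C"], max_iter=1000)
    # hyperopt minimizes, so return the negated cross-validated accuracy
    return -cross_val_score(clf, X, y, cv=5).mean()

trials = Trials()
best = fmin(objective, search_space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```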

TheodoreGalanos commented 7 years ago

Hi Matthias,

This is exactly why I posted here in the first place! Thank you for all the input, it is really invaluable.

My main goal in this exercise is learning, not really proving a point. I'm a beginner in the field, looking to learn and apply it in another domain like so many others, so I'm sure I cannot, and probably should not, try to redo things that more adept people in ML have already done.

Thank you very much as well for the paper you linked; I will go over it and other relevant literature before I start.

One last question, if I may. I've already downloaded hyperopt and SMAC. Do you think the non-commercial version of Spearmint would be able to handle these spaces as well?

Once again, thanks for the input.

Kind regards, Theodore.

mfeurer commented 7 years ago

I don't think that Spearmint is able to handle these spaces at all. My experience stems from an experiment in which we tried to run Spearmint to optimize Auto-WEKA. You could try the code from this group, which is linked in one of their latest papers; they claim that their GP is able to handle such spaces.

rhiever commented 7 years ago

I think such a comparison would indeed be valuable. Another important aspect to control for in this kind of comparison is the set of ML operators and parameters that you optimize over with GenProg and BO. We've made several attempts to compare auto-sklearn and TPOT internally in the past, but keep running into issues where the two use different operators and different parameters (likely due to historical reasons).

@mfeurer, maybe you'd be interested in doing a larger comparison? We have a large, curated repo of supervised learning data sets at PMLB and if we could align TPOT and auto-sklearn to have the same operators---even if it's a simpler subset of operators for the purposes of the experiment---that could make an interesting comparison between GenProg and BO.

mfeurer commented 7 years ago

@rhiever Such a comparison would indeed be valuable, but I don't think I'll be able to do anything in this direction before April. A potential step in this direction might be our group's effort to create a hyperparameter optimization benchmark set, into which I'm about to include auto-sklearn (see PR). Then we'd only need to run the different optimization algorithms (BO vs GP) on that problem.

setuc commented 7 years ago

We have done some work on this internally and would be happy to repeat it on the public datasets.

@rhiever Do you have suggestions on which ones to run, or what some good test sets would be?

rhiever commented 7 years ago

All of the PMLB data sets are a good starting point. Most of them are small and should run pretty quickly.
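
For anyone gearing up to run this, a minimal sketch of pulling the PMLB classification sets (assuming the `pmlb` package, `pip install pmlb`) could look like the following; the slice of five datasets is just for illustration.

```python
# Sketch: iterate over PMLB classification benchmarks and hand each one to
# the optimizers under comparison with an identical budget and split.
from pmlb import classification_dataset_names, fetch_data
from sklearn.model_selection import train_test_split

for name in classification_dataset_names[:5]:  # first few, for illustration
    X, y = fetch_data(name, return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=42, stratify=y)
    print(name, X_train.shape, X_test.shape)
    # ...run TPOT, auto-sklearn, hyperopt, etc. on (X_train, y_train) with
    # the same time/evaluation budget and score them on (X_test, y_test)...
```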

setuc commented 7 years ago

I have started work on it and will share the findings somewhere. Is there a better place to collaborate than just GitHub? Email, perhaps?

sy-zygy commented 7 years ago

@setuc hope you collaborate here so everyone can observe ;)

rhiever commented 7 years ago

@setuc, I think this issue would be a good place to record your findings for folks following it here. If your analysis is changing quite a bit, maybe share a link to a Jupyter Notebook?

TheodoreGalanos commented 7 years ago

Hi everyone,

First of all, I am very glad that this conversation has drawn interest from people more knowledgeable than me. I haven't been able to follow this up in practice, but I have been keeping an eye on the discussion.

@setuc I will just echo what the others have said: I would also love to see your results, or even better the comparison process itself. I think that could be extremely educational for both approaches, as well as for their comparison.

Kind regards, Theodore.

setuc commented 7 years ago

Sure, I will be happy to share the results. Here is what I have planned.

Here are the optimizers that I plan to include:

  1. Hyperopt (http://github.com/hyperopt/hyperopt)
  2. SMAC (http://www.cs.ubc.ca/labs/beta/Projects/SMAC/)
  3. Spearmint (https://github.com/HIPS/Spearmint)
  4. Osprey (https://github.com/msmbuilder/osprey)
  5. MOE (https://github.com/Yelp/MOE)
  6. TPOT (https://github.com/rhiever/tpot)
  7. Optunity (http://optunity.readthedocs.io/en/latest/)

If there are any other particular ones, do let me know. Currently I am building up some wrapper code so that I don't have to keep configuring each of them for every specific dataset (a rough sketch of what I mean is below). I will share the code or a Jupyter notebook so that everyone can collaborate, and I will keep posting my progress here. I think there is a real need to understand the performance of these algorithms. Thanks for a wonderful question, @TheodoreGalanos. I'll keep you guys posted.
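
As a rough, hypothetical sketch of such a wrapper (the class names and the TPOT settings below are illustrative only, not anyone's actual benchmark code):

```python
# Hypothetical wrapper sketch: every optimizer is hidden behind the same
# fit/score interface so each dataset only has to be configured once.
from abc import ABC, abstractmethod

class OptimizerWrapper(ABC):
    """Common interface: search on (X, y) under a fixed evaluation budget."""

    def __init__(self, max_evals=100, cv=5, random_state=0):
        self.max_evals = max_evals
        self.cv = cv
        self.random_state = random_state

    @abstractmethod
    def fit(self, X, y):
        """Run the hyperparameter/pipeline search on the training data."""

    @abstractmethod
    def score(self, X, y):
        """Score the best model found on held-out data."""

class TPOTWrapper(OptimizerWrapper):
    def fit(self, X, y):
        from tpot import TPOTClassifier
        self._tpot = TPOTClassifier(
            generations=5,
            population_size=max(self.max_evals // 5, 10),
            cv=self.cv,
            random_state=self.random_state,
            verbosity=0,
        )
        self._tpot.fit(X, y)
        return self

    def score(self, X, y):
        return self._tpot.score(X, y)

# Other wrappers (HyperoptWrapper, SMACWrapper, ...) would follow the same
# interface, so the benchmark loop stays identical for every optimizer.
```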

mfeurer commented 7 years ago

You could also try our latest implementation of SMAC in Python; it could simplify the setup.
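
For anyone unfamiliar with it, SMAC's Python interface declares its (possibly conditional) search space through the ConfigSpace package; a minimal sketch using ConfigSpace's classic API (the hyperparameters and ranges here are illustrative) might look like:

```python
# Illustrative conditional configuration space of the kind SMAC consumes:
# the classifier choice is a hyperparameter, and each classifier's own
# hyperparameters are only active when that classifier is selected.
from ConfigSpace import ConfigurationSpace
from ConfigSpace.conditions import EqualsCondition
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformFloatHyperparameter,
    UniformIntegerHyperparameter,
)

cs = ConfigurationSpace()
classifier = CategoricalHyperparameter(
    "classifier", ["random_forest", "logistic_regression"])
n_estimators = UniformIntegerHyperparameter("n_estimators", 10, 500)
C = UniformFloatHyperparameter("C", 1e-3, 1e3, log=True)

cs.add_hyperparameters([classifier, n_estimators, C])
cs.add_condition(EqualsCondition(n_estimators, classifier, "random_forest"))
cs.add_condition(EqualsCondition(C, classifier, "logistic_regression"))

print(cs.sample_configuration())
```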

rhiever commented 7 years ago

Great to see a Python version of SMAC, @mfeurer!

msdw commented 7 years ago

Any updates? Would it be possible to collaborate with us? If so, please share your first tests.

sashml commented 7 years ago

If there are any other particular ones, do let me know.

  1. Scikit-optimize (https://github.com/scikit-optimize/scikit-optimize)
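
For reference, a toy sketch of how scikit-optimize's gp_minimize could be slotted into the same protocol (the estimator, ranges, and call budget below are placeholders):

```python
# Toy scikit-optimize example: Bayesian optimization over a small, fixed
# set of named dimensions for a single classifier.
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # placeholder dataset
space = [Integer(10, 500, name="n_estimators"),
         Real(0.1, 1.0, name="max_features")]

def objective(params):
    n_estimators, max_features = params
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 max_features=max_features,
                                 random_state=0)
    return -cross_val_score(clf, X, y, cv=5).mean()  # gp_minimize minimizes

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(result.x, -result.fun)
```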

setuc commented 7 years ago

Hey guys, sorry for disappearing for a while; I got tied up with some personal work. I will restart this work in a couple of weeks and start posting regularly.

setuc commented 7 years ago

As I am starting to gear up, one of the things I noticed is that not all of the optimizers tune the whole pipeline. For the sake of comparison, I am going to disable feature extraction across all of them, so we will just look at the hyperparameter space across various algorithms, or within a single algorithm.
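
For TPOT specifically, restricting the search to hyperparameters only can be done with a custom config_dict; a minimal sketch (the operators and value grids are illustrative choices, not a recommended benchmark configuration):

```python
# Sketch: restrict TPOT to classifier hyperparameter search only (no
# feature-extraction/preprocessing operators) via a custom config_dict.
from tpot import TPOTClassifier

classifier_only_config = {
    "sklearn.ensemble.RandomForestClassifier": {
        "n_estimators": [100, 300, 500],
        "max_features": [0.25, 0.5, 0.75, 1.0],
        "min_samples_leaf": range(1, 21),
    },
    "sklearn.linear_model.LogisticRegression": {
        "C": [1e-3, 1e-2, 1e-1, 1.0, 10.0],
        "penalty": ["l2"],
    },
}

tpot = TPOTClassifier(generations=5, population_size=50,
                      config_dict=classifier_only_config,
                      random_state=0, verbosity=2)
# tpot.fit(X_train, y_train)
```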

rhiever commented 7 years ago

In the meantime, there has been one research paper published that compares TPOT, auto-sklearn, and a variant of TPOT using grammar-based GP on a handful of problems: https://link.springer.com/chapter/10.1007/978-3-319-55696-3_16

Overall, TPOT performed best on a majority of the problems, but most of the differences were fairly small (1-2% accuracy difference). The authors attributed TPOT's success to its smaller search space, which has been a focus of TPOT development: culling operators and parameters that are consistently not useful (or less useful than other operators).

setuc commented 7 years ago

@rhiever Good read. I guess this is more motivation for me to finish the test work, and the inclusion of Bayesian methodologies will be just perfect :). I hadn't looked at the code until now; there's been a ton of improvement and it's more elegant. Good stuff.

TheodoreGalanos commented 7 years ago

I know academic publishing is like a business nowadays (I mean the publishers), but 25 euros for one paper is crazy :)

Man I miss educational access.

I am glad you have taken this up again @setuc. It'll be very interesting to see your results.

sashml commented 7 years ago

Take it for free if you really need it - http://link.springer.com.secure.sci-hub.cc/chapter/10.1007/978-3-319-55696-3_16

rhiever commented 7 years ago

You can also email the authors if you'd like a legal copy of the paper. :-)

lesshaste commented 7 years ago

https://www.researchgate.net/profile/Gisele_Pappa/publication/315065572_RECIPE_A_Grammar-Based_Framework_for_Automatically_Evolving_Classification_Pipelines/links/58f6199ca6fdcc55b6c2d2c3/RECIPE-A-Grammar-Based-Framework-for-Automatically-Evolving-Classification-Pipelines.pdf

That seems legal and working.

mfeurer commented 7 years ago

Indeed a very interesting paper to read, and we definitely need to improve Auto-sklearn to perform comparably to TPOT and RECIPE on those datasets :) @walterjgsp are the bioinformatics datasets available online so that we can perform runs on them?

Basically, my two take-home messages from this paper with respect to a comparison are:

  1. Datasets need to be representative. Without claiming that the results in the paper are false or bad, I wonder what happens when the datasets become larger and whether it will still be possible to run up to 40 generations of 100 individuals.
  2. There should be some evaluation of how these methods perform with respect to runtime and the number of function evaluations, in order to understand how much time or how many evaluations a method needs to perform well (see the sketch of a best-so-far curve below).
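
As a minimal illustration of point 2, the per-evaluation trace of any of the optimizers can be reduced to a best-so-far (incumbent) curve, so methods can be compared at any evaluation budget; the scores below are made-up placeholders:

```python
# Reduce a trace of per-evaluation scores to an incumbent (best-so-far)
# curve, so anytime performance can be compared across optimizers.
import numpy as np

def best_so_far(scores):
    """Return the running maximum of a sequence of scores."""
    return np.maximum.accumulate(np.asarray(scores, dtype=float))

scores = [0.71, 0.69, 0.74, 0.73, 0.78, 0.77]   # e.g. CV accuracy per evaluation
print(best_so_far(scores))  # [0.71 0.71 0.74 0.74 0.78 0.78]
```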

rhiever commented 7 years ago

Yes, I wouldn't consider the RECIPE paper to be a definitive comparison by any means. It's simply one step toward making a fairer comparison between the underlying methodologies. I think your point 1) is especially poignant: we need better benchmarks. PMLB and OpenML are good steps toward that end at least.

walterjgsp commented 7 years ago

Hi @mfeurer, the datasets used in the RECIPE paper can be found at the following link:

https://github.com/RecipeML/Recipe/blob/master/datasets/NewDataset.zip

I uploaded them in the format used for our tests. If you need the .arff file of a dataset and can't find it at this link, just send me an email and I will prepare it and send it to you.

mi7plus commented 7 years ago

Hello all! Is there any progress on this topic? I am very interested.

ck37 commented 6 years ago

It would be nice to eventually incorporate Hyperband as well:

https://github.com/zygmuntz/hyperband https://people.eecs.berkeley.edu/~kjamieson/hyperband.html
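
For context, Hyperband layers random configuration sampling on top of repeated successive halving; a toy sketch of that inner subroutine, assuming a placeholder evaluate(config, budget) that returns a score where higher is better:

```python
# Toy successive-halving subroutine (the core of Hyperband): evaluate all
# candidates on a small budget, keep the top 1/eta, and repeat with an
# eta-times larger budget until one configuration remains.
import numpy as np

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        scores = [evaluate(cfg, budget) for cfg in survivors]
        keep = max(1, len(survivors) // eta)
        order = np.argsort(scores)[::-1]          # best scores first
        survivors = [survivors[i] for i in order[:keep]]
        budget *= eta
    return survivors[0]
```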

shahlaebrahimi commented 6 years ago

Also, this one: BayesianOptimization

lawrencekiba commented 5 years ago

Just looking into this topic and found this thread. May I know if there has been any progress since?

rhiever commented 5 years ago

Not that I know of. I have seen some papers that compare TPOT and auto-sklearn to other algorithms on a handful of datasets and most of the time they come out even, which has been my experience too. IMO the search algorithm matters much less than the pipeline components and the AutoML tool's ability to represent complex pipelines.

I think there has been more focus on developing the offerings of the individual tools than on comparing existing algorithms. I encourage anyone who is interested to pursue this project.