Open Alex-Lekov opened 3 years ago
I had a quick look at your benchmark. I saw that the time limit is 1 hour, but some of the datasets have over 40k instances, so TPOT may not get past the initial generation (which consists of randomly generated pipelines) and therefore never gets to optimize pipelines via genetic programming. I suggest increasing the time limit to 1 day for each large dataset if n_jobs=1, or using parallel training with dask.
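To make the time-budget argument concrete, here is a rough back-of-the-envelope sketch. It assumes TPOT's default population size of 100 and 5-fold cross-validation; the per-fit time of 0.5 minutes and the helper name `initial_generation_minutes` are hypothetical, chosen only for illustration, and the model ignores scheduling overhead and pipelines that time out.

```python
import math


def initial_generation_minutes(population_size, cv_folds, minutes_per_fit, n_jobs=1):
    """Rough wall-clock estimate for evaluating the first (random) generation.

    Hypothetical helper, not part of TPOT. Assumes every pipeline is fitted
    once per CV fold and that fits are spread evenly over n_jobs workers.
    """
    total_fits = population_size * cv_folds
    # Fits run in batches of n_jobs; the last batch may be partially filled.
    batches = math.ceil(total_fits / n_jobs)
    return batches * minutes_per_fit


budget_minutes = 60  # the 1-hour limit used in the benchmark

# Assumed numbers: 100 pipelines, 5 folds, 0.5 minutes per fit on a large dataset.
single = initial_generation_minutes(100, 5, 0.5, n_jobs=1)
parallel = initial_generation_minutes(100, 5, 0.5, n_jobs=8)

print(f"n_jobs=1: {single} min, fits in budget: {single <= budget_minutes}")
print(f"n_jobs=8: {parallel} min, fits in budget: {parallel <= budget_minutes}")
```

Under these assumed numbers, a single worker needs far longer than the 1-hour limit just to score the random initial population, while eight workers finish it within the limit but leave little time for later generations.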
I made a benchmark of AutoML libraries, and TPOT showed very poor results, even worse than plain CatBoost with default parameters! https://github.com/Alex-Lekov/AutoML-Benchmark/ I run the benchmark in Docker, so you can easily reproduce it.
Here is the code from the benchmark:
Is the code correct? (I did not adjust the advanced parameters, since AutoML should, in theory, pick everything by itself; that is why it is AutoML.)
Please tell me what I am doing wrong. Maybe I am using the library incorrectly? Or is this a real result, and the library is really that bad?