EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Install and Run TPOT in Google Colab WITH Dask Enabled???? #1095

Closed (windowshopr closed this 4 years ago)

windowshopr commented 4 years ago

I'm trying to run a TPOT session in a Google Colab notebook, but I'm running into issues with Dask, and I think they come from the way I'm pip-installing the packages.

Context of the issue

Using a Python 3.6 environment in Google Colab, I'd like to run a TPOT classification session with Dask enabled, but I keep hitting this error:

ImportError: 'use_dask' requires the optional dask and dask-ml depedencies. cannot import name 'future_set_exc_info'

...even though I have confirmed that both dask and dask-ml are installed and present in the dist-packages folder. I don't think this is a TPOT issue, but I'm opening it here in the hope that it leaves some documentation others can use to install the required dependencies successfully (in the right order, and with the right version of each).
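The second half of the error message ("cannot import name 'future_set_exc_info'") appears to be the underlying import failure appended to TPOT's own message. A minimal sketch for finding which module actually fails to import is below; the helper name try_imports and the list of module names are assumptions chosen for illustration, not part of TPOT.

```python
import importlib


def try_imports(module_names):
    """Try importing each module; map name -> None on success, else an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # report any failure instead of stopping at the first
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


# Modules assumed to be involved when use_dask=True; adjust the list as needed.
for name, err in try_imports(["tornado", "dask", "distributed", "dask_ml"]).items():
    print("{:<12} {}".format(name, "OK" if err is None else err))
```

Running this in a fresh cell before constructing TPOTClassifier shows the raw import error for each package, which is easier to search for than the wrapped message.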

Basically, after a few hours of trying different combinations of pip installs, I've landed on:

!pip install fsspec xgboost
%pip install -U distributed scikit-learn dask-ml dask-glm sklearn
%pip install "tornado>=5"
%pip install "dask[complete]"
!pip install TPOT

The Tornado upgrade and dask[complete] came from similar issues I found in the Dask repositories on GitHub, but I am throwing in the towel. Can someone confirm a working install of TPOT classification with the use_dask=True parameter, and which versions of each dependency are in the environment?

A simple reproducible example to run in a new Google Colab 3.6 environment would look like the following. If asked, I can paste the full traceback, but whoever runs this should be able to see it for themselves. Thanks!

!pip install sklearn fsspec xgboost
%pip install -U distributed scikit-learn dask-ml dask-glm
%pip install "tornado>=5" 
%pip install "dask[complete]"
!pip install TPOT

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
    iris.target.astype(np.float64), train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42, use_dask=True)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

windowshopr commented 4 years ago

UPDATE:

Looks like I hit a winning combination with the below:

!pip install TPOT
!pip install dask==2.20.0 dask-glm==0.2.0 dask-ml==1.0.0
!pip install tornado==5.0
!pip install distributed==2.2.0
!pip install xgboost==0.90

This got my example up and running. I should have cross-referenced my local pip freeze output with the pip freeze output on Colab earlier to check the dependency versions, but this got it going. Closing!
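The version cross-check mentioned above can also be done from inside the notebook. The sketch below compares the installed versions against the pins from the install commands in this comment; the helper names (installed_version, compare_to_pins) and the PINNED dict are hypothetical, and the comparison is plain string equality rather than a full version parser.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:  # Python 3.6/3.7 fallback, e.g. the Colab runtime in this thread
    from pkg_resources import get_distribution
    from pkg_resources import DistributionNotFound as PackageNotFoundError

    def version(name):
        return get_distribution(name).version

# Pins copied from the install commands in this comment.
PINNED = {
    "dask": "2.20.0",
    "dask-glm": "0.2.0",
    "dask-ml": "1.0.0",
    "tornado": "5.0",
    "distributed": "2.2.0",
    "xgboost": "0.90",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_to_pins(pins):
    """Map each package to (wanted, found, matches)."""
    report = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in compare_to_pins(PINNED).items():
    print("{:<12} wanted={:<8} found={!s:<10} {}".format(
        name, wanted, found, "ok" if ok else "MISMATCH"))
```

Any MISMATCH line points at a package whose installed version differs from the combination reported to work here.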

windowshopr commented 4 years ago

I should reiterate: starting from a brand-new, factory-reset session, I run the pip installs shown above. The first time it reaches the training step, it errors out, but when you hit "Restart and Run All", it works. This has to do with changing the version of Tornado: the newly pip-installed version is only picked up after you restart the runtime. Don't factory reset the session a second time, or you're just starting from scratch again. It seems finicky in Colab, but that's what worked for me.
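One way to tell whether a restart is still needed is to compare the version of the module already loaded in the running kernel with the version pip has installed on disk. This is only a sketch under assumptions: the helper names are hypothetical, and it relies on the module exposing its version as a string attribute (__version__ or version).

```python
import sys

try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:  # Python 3.6/3.7 fallback
    from pkg_resources import get_distribution
    from pkg_resources import DistributionNotFound as PackageNotFoundError

    def version(name):
        return get_distribution(name).version


def loaded_version(module_name):
    """Version of the module already imported in this process, or None."""
    module = sys.modules.get(module_name)
    if module is None:
        return None
    for attr in ("__version__", "version"):
        value = getattr(module, attr, None)
        if isinstance(value, str):  # skip cases where `version` is a submodule
            return value
    return None


def installed_version(dist_name):
    """Version recorded on disk by pip, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def needs_restart(loaded, installed):
    """True when an older/newer copy is in memory than the one on disk."""
    return loaded is not None and installed is not None and loaded != installed


loaded = loaded_version("tornado")
installed = installed_version("tornado")
if needs_restart(loaded, installed):
    print("tornado {} is loaded but {} is installed: restart the runtime".format(
        loaded, installed))
else:
    print("no tornado version mismatch detected in this process")
```

If the mismatch message appears, a normal runtime restart (not a factory reset) should be enough for the new version to load.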

mkuehn10 commented 4 years ago

I had to add fsspec to the end, but your last comment seems to have worked for me:

!pip install TPOT
!pip install dask==2.20.0 dask-glm==0.2.0 dask-ml==1.0.0
!pip install tornado==5.0
!pip install distributed==2.2.0
!pip install xgboost==0.90
!pip install fsspec

weixuanfu commented 4 years ago

Yes, we have updated the installation guide to include fsspec.