dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

seeking guidance on setting up bleeding edge versions of packages to support dask-ml, etc. #444

Closed: ebo closed this issue 5 years ago

ebo commented 5 years ago

I have probably wasted a week or more tracking down various package version issues.

By the time I got the following code to work in my env:

from dask.distributed import Client
from sklearn.externals.joblib import parallel_backend

client = Client()  # Connect to a Dask cluster

with parallel_backend('dask'):
    # Your normal scikit-learn code here
    ...

Geoviews or one of its dependencies was somehow broken and I lost all my visualization capabilities. So I thought I would ask if the following should be expected to work:

starting from a clean Anaconda 3.7 install:

conda update --all

# create a new env
conda create --name dev-env
source activate dev-env

# remove any of the packages that are installed by default:
conda remove dask dask-glm dask-ml datashader distributed geoviews intake intake-xarray rasterio scikit-image scikit-learn xarray

# git clone all the packages for:
# dask dask-glm dask-ml datashader distributed geoviews intake intake-xarray rasterio scikit-image scikit-learn xarray

# for each package in the above list:
cd <package>
python setup.py develop

=============

This gives me the bleeding-edge versions of the packages.
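The per-package step above can be sketched as a small dry-run helper. This is only a sketch under stated assumptions, not something from the thread: it assumes each package has already been cloned into a hypothetical `src/<package>` directory, and it swaps `python setup.py develop` for pip's editable install (`pip install -e`) with `--no-deps`, so that pip does not replace packages that conda installed. The function name, the shortened package list, and the directory layout are illustrative.

```python
import sys
from pathlib import Path

# A few package names taken from the list in the post above.
PACKAGES = ["dask", "distributed", "dask-ml", "xarray"]


def editable_install_commands(checkout_root, packages=PACKAGES):
    """Return one `pip install -e` command per package checkout.

    `checkout_root` is a hypothetical directory holding one git clone per
    package, e.g. <checkout_root>/dask and <checkout_root>/distributed.
    """
    root = Path(checkout_root)
    commands = []
    for name in packages:
        # --no-deps keeps pip from replacing packages that conda installed.
        commands.append(
            [sys.executable, "-m", "pip", "install", "--no-deps", "-e", str(root / name)]
        )
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in editable_install_commands("src"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review the install order before running anything, for example by passing each list to `subprocess.run(cmd, check=True)`.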

Is there a better way to set this up? I have tried using pip, but early on I had issues with an update overwriting the installed version. The setup above seemed to mostly work, but I was not always able to run the bleeding-edge git versions.
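One way to tell whether the git version or an overwritten copy is the one being picked up is to ask Python where a module would be imported from. The helper below is a minimal sketch using only the standard library (`importlib.util.find_spec` and `importlib.metadata`, Python 3.8+). The function name `describe_install` is hypothetical and nothing here is specific to dask.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name, dist_name=None):
    """Report where `module_name` would be imported from and its installed version.

    `dist_name` is the distribution name when it differs from the module
    name (for example module `sklearn`, distribution `scikit-learn`).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        # The module is not importable in this environment.
        return {"module": module_name, "location": None, "version": None}
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under that name.
        version = None
    return {"module": module_name, "location": spec.origin, "version": version}


if __name__ == "__main__":
    for name in ("dask", "distributed", "xarray"):
        print(describe_install(name))
```

If the reported location points into `site-packages` rather than into the git checkout, the development install has probably been replaced by a released package.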

As a note, I think that the Managing Environments https://conda.io/docs/user-guide/tasks/manage-environments.html documentation would likely be a good place for this information. What I am looking for is how to install and run development packages of dask, distributed, xarray, and similar packages from their git source. I will go back over the weekend and re-read "Why you need Python environments and how to manage them with Conda" https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c and similar docs and see if I can come up with a way to get this all working well enough to consistently move forward.

TomAugspurger commented 5 years ago

Dask-ML requires dask>=0.18.2 and is tested against dask master and scikit-learn master. Do you think there's an issue with dask-ml's setup?
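For reference, a check against the `dask>=0.18.2` minimum could look roughly like the sketch below. It is a simplification, not dask-ml's own logic: it compares only the leading numeric release components and ignores pre-release and local-version suffixes, so a git build string such as `1.0.0+5.gabc123` is treated as `1.0.0`. The helper names are hypothetical.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release parts, e.g. '1.2.0+4.gabc' -> (1, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (release parts only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad the shorter tuple with zeros so that '1.0' compares like '1.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

In a real environment, the installed string could come from `importlib.metadata.version("dask")`.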

ebo commented 5 years ago

The only potential problem I see with the setup is that the env requires "python=3.6.6" and might need to be something like "python>=3.6.6". I had installed 3.6.7 in my default setup.

I think the problem might be one of ordering and using the same env to work on EarthML, EarthSIM, dask-ml, and dask-image. The weird behavior seems like it was order dependent.

I will look into this some more and see if I can build a consistent env.

EBo --


TomAugspurger commented 5 years ago

The file ci/environment-36.yml is for dask-ml's continuous integration.

The package itself doesn't put any restrictions on the version of python. I believe that conda-forge builds packages for 2.7, 3.6, and 3.7.


ebo commented 5 years ago

OK, good to know. I just created an environment.yml file that is a mashup of the EarthML, dask-ml, and dask-image environments, and I will see if that behaves.
Working through these, I see that a couple of packages appear to have incompatible requirements. I do not know if these are real issues or artifacts of what was tested during development.
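A rough way to spot such requirements when merging several environment files is to group the dependency specs by package and flag any package that is constrained differently in different files. The sketch below assumes the dependency lists have already been read into plain Python lists of conda-style strings. The project names and specs in the example are made up for illustration and are not the contents of any real environment file. Differing constraints are not necessarily incompatible; the function only flags them for review.

```python
import re
from collections import defaultdict

# Package name, then whatever constraint text follows it.
_SPEC = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*)$")


def split_spec(spec):
    """Split 'python=3.6.6' into ('python', '=3.6.6'); a bare name gets ''."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"cannot parse dependency spec: {spec!r}")
    name, constraint = match.groups()
    return name.lower(), constraint.replace(" ", "")


def differing_constraints(environments):
    """Map package -> {constraint: [env names]} for packages constrained differently."""
    seen = defaultdict(lambda: defaultdict(list))
    for env_name, specs in environments.items():
        for spec in specs:
            name, constraint = split_spec(spec)
            if constraint:  # unconstrained entries cannot conflict
                seen[name][constraint].append(env_name)
    return {name: dict(c) for name, c in seen.items() if len(c) > 1}


# Hypothetical dependency lists, for illustration only.
example = {
    "project-a": ["python=3.6.6", "dask>=0.18.2", "numpy"],
    "project-b": ["python=3.7", "dask>=0.18.2"],
}
flagged = differing_constraints(example)
```

Here `flagged` contains only `python`, because both hypothetical projects agree on the dask constraint and `numpy` is unconstrained.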


TomAugspurger commented 5 years ago

I don't think there's a dask-ml issue here. Closing.