Kaggle / docker-python

Kaggle Python docker image
Apache License 2.0
Add support for rapidsai - cudf and cuml packages GPU Data Science #594

Closed (jperez999 closed 3 years ago)

jperez999 commented 5 years ago

I have a PR ready for review with passing test cases for both cuml and cudf.

rosbo commented 4 years ago

@jperez999

There are a few conflicts that prevent us from adding cuDF and cuML:

cudf=0.10 -> pandas[version='>=0.24.2,<0.25']

We are using pandas 0.25.3 and we have several packages requiring >= 0.25. Is there a particular reason for this restriction?

cudf=0.10 -> pyarrow=0.14.1

We are using pyarrow 0.15.1 and we also have several packages requiring >= 0.15. Is there a particular reason for not allowing pyarrow 0.15.x?
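
To make the two conflicts above concrete, here is a minimal Python sketch that checks an installed version against a pin. It is an illustration only, not how conda resolves dependencies: the helper names (parse_version, satisfies) are hypothetical, only plain numeric versions are handled, and conda's "=0.14.1" pin is approximated as an exact "==" match.

```python
import operator

# Two-character operators are listed before their one-character prefixes
# so that ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '0.25.3' into (0, 25, 3). Only plain numeric versions are supported."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(left, right):
    """Pad the shorter tuple with zeros so that 0.25 compares equal to 0.25.0."""
    width = max(len(left), len(right))
    return (left + (0,) * (width - len(left)),
            right + (0,) * (width - len(right)))


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    have = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                left, right = _pad(have, parse_version(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# pandas 0.25.3 against the cudf 0.10 pin quoted above.
print(satisfies("0.25.3", ">=0.24.2,<0.25"))  # False
# pyarrow 0.15.1 against the pyarrow pin (approximated as an exact match).
print(satisfies("0.15.1", "==0.14.1"))  # False
```

Both checks fail, which is why the solver cannot keep the versions currently in the image.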

jperez999 commented 4 years ago

The reason we are pinned to pandas < 0.25 is explained in https://github.com/rapidsai/cudf/pull/3486, and the pyarrow pin is due to a conda-forge conflict: https://github.com/rapidsai/cudf/pull/3318

rosbo commented 4 years ago

Hi @jperez999 and @kkraus14

Are you planning on relaxing the constraint on pyarrow added in https://github.com/rapidsai/cudf/pull/3318? It is causing a cascading slew of downgrades of the arrow-cpp lib, boost-cpp, and so on, which then cause conflicts with other libraries in the Kaggle image.

Our image is currently using pyarrow 0.16 and libboost 1.72.0.

Thanks

kkraus14 commented 4 years ago

We are planning on upgrading to Arrow 0.17.1 in cudf 0.15 shortly. Unfortunately, a bug introduced in Arrow 0.15.1 made it incompatible with nvcc, which prevented us from upgrading until now.

Note that cudf 0.15 won't be released for a few months, though, as we're currently in the middle of the 0.14 release.

kkraus14 commented 4 years ago

Hi @rosbo, we've upgraded to 0.17.1 in our nightlies, but Arrow 1.0.0 has since been released and we're considering upgrading to it. Would that be a blocker for the Kaggle container?

rosbo commented 4 years ago

Hi @kkraus14,

It shouldn't be a blocker. Arrow 0.17.1 and 1.0.0 both work with our version of the boost library (the cause of the cascading changes earlier).

Thanks for checking.

kkraus14 commented 4 years ago

@rosbo Any chance we could give this another shot? The 0.15 release uses Arrow 0.17.1 and boost 1.72 and has been released for a bit now.

rosbo commented 4 years ago

Kicked off a new build with 0.15 and will see if we get any conflicts: https://github.com/Kaggle/docker-python/tree/add-rapids-ai-0.15

rosbo commented 4 years ago

We are in the process of migrating our GPU image to be based on gcr.io/deeplearning-platform-release/tf2-gpu.2-3.

Hitting conflicts again and I am waiting on conda to give me the list of conflicts... Will report hopefully soon.

docker run -it --rm gcr.io/deeplearning-platform-release/tf2-gpu.2-3:latest /bin/bash
$ conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.15 python=3.7 cudatoolkit=10.1

rosbo commented 4 years ago

The conflict resolver did eventually converge after more than 72h and printed a long list of conflicts (longer than my bash terminal history limit setting). I will take a closer look at this once we have migrated to the new base image.
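
One way to avoid losing a conflict list that outgrows the terminal history is to send the solver's output to a file. The sketch below is only a suggestion under stated assumptions: run_and_log and the log file name are hypothetical, and the --yes flag is an addition to the command from this thread so that a successful solve does not wait on a confirmation prompt hidden in the log.

```python
import subprocess


def run_and_log(command, log_path):
    """Run command, write its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # keep conflict messages in the same file
            check=False,               # a failed solve is expected, do not raise
        )
    return completed.returncode


# The install command from this thread, split into arguments.
# "--yes" is an addition (see the note above), not part of the original command.
CONDA_INSTALL = [
    "conda", "install", "--yes",
    "-c", "rapidsai", "-c", "nvidia", "-c", "conda-forge", "-c", "defaults",
    "rapids=0.15", "python=3.7", "cudatoolkit=10.1",
]

# Example, to be run inside the container (the file name is a placeholder):
# exit_code = run_and_log(CONDA_INSTALL, "conda-conflicts.log")
```

The full conflict report can then be searched or attached to the issue regardless of how long the solve takes.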

kkraus14 commented 4 years ago

Thanks @rosbo. We're planning on releasing 0.16 in ~2 weeks and it will upgrade Arrow to 1.0.1 and keep boost at 1.72.0 as the main version pinnings. Let me know if there's anything I can help with.

kkraus14 commented 3 years ago

@rosbo quick ping here. 0.16 has been out for a while, any chance we could give this another shot?

rosbo commented 3 years ago

Hi @kkraus14, I will try to give it another shot this week. Thanks for the ping about the new release.

rosbo commented 3 years ago

@kkraus14 Great news!

I was able to successfully install cudf 0.16 on our images (no conflicts). I tried with only cudf first to reduce the number of conflicts. I will try with cuml next.

The original request was about adding cudf & cuml. I see now that Rapids includes several other packages. Should I try installing all of them with conda install rapids or just install cudf and cuml?

Thank you

kkraus14 commented 3 years ago

I think we should just do cudf and cuml to start and then we can expand from there. conda install rapids will pull in things like cuspatial which will pull in GDAL which could get us quickly back into dependency hell.
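
Once cudf and cuml are installed, a small import smoke test can confirm that both load in the image. This is a minimal sketch, not the test cases from the PR mentioned at the top of the thread: check_packages is a hypothetical helper, and it assumes each package exposes a __version__ attribute (otherwise it reports "unknown").

```python
import importlib


def check_packages(names):
    """Map each package name to its version string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # GPU libraries can fail at import time for reasons other than a
            # missing package (for example, no usable CUDA driver), so any
            # exception is recorded as a failure.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(["cudf", "cuml"]).items():
        print(f"{name}: {version if version is not None else 'NOT IMPORTABLE'}")
```

The same helper can be extended with further package names if more of Rapids is added later.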