databricks / sbt-spark-package

Sbt plugin for Spark packages
Apache License 2.0

Building packages that support Python 3 #26

Closed: nchammas closed this issue 6 years ago

nchammas commented 8 years ago

I'm interested in helping resolve https://github.com/graphframes/graphframes/issues/85, which would make GraphFrames compatible with Python 3. Resolving that issue appears to require some changes here so that this plugin can build packages that support Python 3.

Is anybody working on that already?

It looks like the compiled Python artifacts are generated here using python -m compileall. If we want to support building Spark packages that work on both Python 2 and 3, we should probably build and ship wheels instead of compiled Python bytecode.
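For context on why shipping bytecode is a problem: every .pyc file written by compileall (or py_compile) begins with a 4-byte magic number that identifies the interpreter version that produced it, so an archive of .pyc files is effectively tied to one Python version. The sketch below, using only the standard library, compiles a throwaway module and compares its header with the running interpreter's magic number. The helper pyc_magic and the module name example_module are hypothetical, chosen for illustration only.

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def pyc_magic(source_text: str) -> bytes:
    """Compile source to a .pyc file and return its 4-byte magic-number header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "example_module.py"
        src.write_text(source_text)
        # py_compile.compile returns the path of the .pyc file it wrote
        pyc_path = py_compile.compile(str(src), doraise=True)
        return pathlib.Path(pyc_path).read_bytes()[:4]


magic = pyc_magic("def add(a, b):\n    return a + b\n")
# The header matches the interpreter that wrote it; other versions use different values
print(magic == importlib.util.MAGIC_NUMBER)
```

A wheel of pure-Python sources avoids this because each interpreter compiles the sources itself at import or install time.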

If I have understood correctly that we currently can't build Spark packages that support Python 3, I would consider this a critical deficiency, since 1) Spark itself supports Python 3, and 2) Python 3 adoption is reaching a tipping point (at least in my circles), with most new Python projects being written in Python 3.

I am happy to take this on with some guidance from a maintainer, or help said maintainer do the work themselves. This issue is important to me and I am ready to make time to work on it.

nchammas commented 8 years ago

@brkyvz - Looking at the contribution graph, it appears you are the key person on this project.

Can you chime in on the most practical way to deliver Python 3 support in Spark packages? Like I said, I can put in the time to help here, but I don't want to dive in without some indicator of support (even provisional support) from a committer.

JoshRosen commented 8 years ago

It looks like #29, by @mariusvniekerk, will partially address this issue by preventing .pyc files from being pulled into the distribution.
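The general idea of keeping bytecode out of a distribution can be sketched in Python as follows. This is not the code from #29 (the plugin itself is an sbt plugin); it is a hypothetical illustration that zips a source tree while skipping .pyc/.pyo files and __pycache__ directories. The function name zip_python_sources and the toy package mypkg are assumptions for the example.

```python
import pathlib
import tempfile
import zipfile
from typing import List


def zip_python_sources(src_dir: pathlib.Path, zip_path: pathlib.Path) -> List[str]:
    """Zip a Python source tree, skipping compiled bytecode. Returns archived names."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(src_dir)
            # Skip interpreter-specific bytecode: .pyc/.pyo files and anything in __pycache__
            if path.suffix in {".pyc", ".pyo"} or "__pycache__" in rel.parts:
                continue
            name = rel.as_posix()
            zf.write(path, arcname=name)
            archived.append(name)
    return archived


# Usage: a toy tree containing both source files and stale bytecode
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "mypkg"
    (root / "__pycache__").mkdir(parents=True)
    (root / "__init__.py").write_text("")
    (root / "core.py").write_text("VALUE = 1\n")
    (root / "core.pyc").write_bytes(b"stale")
    (root / "__pycache__" / "core.cpython-311.pyc").write_bytes(b"stale")
    names = zip_python_sources(root, pathlib.Path(tmp) / "mypkg.zip")

print(names)  # ['__init__.py', 'core.py']
```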

mariusvniekerk commented 8 years ago

So basically, a lot of the older Spark packages have .pyc files in them. Newer ones are safe (provided, of course, that their Python parts are Python 2/3 compatible).
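To check whether an already-published package archive is one of the older ones that bundles bytecode, one could list its bytecode entries. A minimal sketch, assuming the package is an ordinary zip archive; the helper find_bytecode_entries and the package name oldpkg are hypothetical, and the archive is built in memory only for demonstration.

```python
import io
import zipfile
from typing import List


def find_bytecode_entries(archive: zipfile.ZipFile) -> List[str]:
    """Return archive entries that look like compiled Python bytecode."""
    return [
        name
        for name in archive.namelist()
        if name.endswith((".pyc", ".pyo")) or "__pycache__/" in name
    ]


# Usage: an in-memory archive resembling an older package that ships bytecode
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("oldpkg/__init__.py", "")
    zf.writestr("oldpkg/__init__.pyc", b"\x00")  # placeholder bytes, not real bytecode

with zipfile.ZipFile(buf) as zf:
    flagged = find_bytecode_entries(zf)

print(flagged)  # ['oldpkg/__init__.pyc']
```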

nchammas commented 8 years ago

@brkyvz - Is this still an issue, given your comments on #29? Where exactly do we stand today with regards to Python 3 support?

nchammas commented 8 years ago

Pinging @brkyvz again for an authoritative word on Python 3 support in Spark packages. It looks like they now work (as of GraphFrames 0.3, at least), but I'm not sure if there was an official announcement to that effect.

cc @mengxr @thunterdb

mariusvniekerk commented 8 years ago

So Spark packages have supported Python 3 for quite a while. It's just that the Python parts of many of those packages were not Python 2/3 compatible.
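A quick first check for the kind of incompatibility described here is whether a package's Python sources even parse under Python 3. The sketch below uses the standard ast module; the helper py3_syntax_error and the sample file names are hypothetical. Note that parsing is necessary but not sufficient: code that calls dict.iteritems() or relies on Python 2 str/unicode semantics parses fine but still fails at runtime.

```python
import ast
from typing import Dict, Optional


def py3_syntax_error(source: str) -> Optional[str]:
    """Return a description of the first Python 3 syntax error, or None if it parses."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


samples: Dict[str, str] = {
    # Python 2 print statement: a syntax error under Python 3
    "py2_only.py": 'print "hello"\n',
    # Written to run on both Python 2 and 3
    "py2_and_3.py": 'from __future__ import print_function\nprint("hello")\n',
}

for name, code in samples.items():
    print(name, py3_syntax_error(code) or "parses under Python 3")
```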

nchammas commented 6 years ago

Closing this as I believe nothing needs to be done here. GraphFrames, the project that motivated me to file this issue, runs fine on Python 3.