conda-forge / pyspark-feedstock

A conda-smithy repository for pyspark.
BSD 3-Clause "New" or "Revised" License

pyspark 3.2 #31

Closed h-vetinari closed 3 years ago

h-vetinari commented 3 years ago

Upstream is blocked on PyPI (https://github.com/apache/spark-website/pull/361), which is why the update bot didn't find the tag yet either. Aside from regularly just trying to build this for conda-forge, it would be nice to have a "python deployment story" for spark in time for their release announcement, so 🤞 this works out quickly.

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

h-vetinari commented 3 years ago

Ah, I see, we were just repackaging the upstream jars (which don't exist yet due to the issue noted in the OP). Should we build this from source @conda-forge/pyspark?

CC @conda-forge/core

h-vetinari commented 3 years ago

OK, it looks like this is running into StackOverflows no matter what. 😑

dbast commented 3 years ago

@h-vetinari Instead of pypi we can also switch back to the apache-mirror tarballs, see e.g. this old commit https://github.com/conda-forge/pyspark-feedstock/commit/8cc5d21df1450e2369904c86d06be4f6d622c2b2#diff-f3725a55bf339595bf865fec73bda8ac99f283b0810c205442021f29c06eea9aL11

h-vetinari commented 3 years ago

Thanks for the tip @dbast! I ended up using the official tarball from apache (also mentioned here). Still not enthused about the binary repackaging, but it's at least as good as what we had previously.

I'm going to let upstream know that this is more or less ready (it might be reflected in the release announcements, since the PyPI release is not available yet). IMO it would be a good chance for conda-forge to present itself to a community that's not necessarily familiar with it. 🙃

PS. pip check is a bit overly strict about the py4j dependency. I'll try to get this fixed in ~https://github.com/conda-forge/py4j-feedstock/pull/20~ https://github.com/conda-forge/py4j-feedstock/pull/21, but IMO this check could also just be skipped.

gengliangwang commented 3 years ago

@h-vetinari Thanks for the work! FYI the file size issue is resolved and PySpark 3.2.0 is available on PyPI now: https://pypi.org/project/pyspark/3.2.0/ Do we still need this one?

h-vetinari commented 3 years ago

> FYI the file size issue is resolved and PySpark 3.2.0 is available on PyPI now: https://pypi.org/project/pyspark/3.2.0/

Good to hear!

> Do we still need this one?

conda(-forge) and PyPI use different binaries and distribution channels, so yes, for conda it is still needed.

h-vetinari commented 3 years ago

https://github.com/conda-forge/py4j-feedstock/pull/21 has been merged now, so once that package becomes available in about 1h, this PR should be ready. I'm not in a position to merge it myself, but I've already pinged the people who can.

h-vetinari commented 3 years ago

@conda-forge/pyspark, this should be ready 🙃