Quansight-Labs / czi-scientific-python-mgmt

🐍 Top level project management for Scientific-Python CZI grant at Labs
https://github.com/orgs/Quansight-Labs/projects/11?query=is%3Aopen+sort%3Aupdated-desc
BSD 3-Clause "New" or "Revised" License

CI improvements - Pyodide/WASM #18

Open trallard opened 1 year ago

trallard commented 1 year ago

📝 Summary

Expand the CI support for cross-compiling to Pyodide/WebAssembly to at least five projects.

🚀 Tasks / Deliverables

TBD

📅 Estimated completion

24 months milestone

📋 Additional information

Status

[!TIP] This table has been brought over from https://github.com/pyodide/pyodide/issues/3049#issuecomment-2142352968

| Package name | Out-of-tree WASM builds | Anaconda.org scheduled uploads |
| --- | --- | --- |
| NumPy | https://github.com/numpy/numpy/pull/25894, https://github.com/numpy/numpy/pull/26564, https://github.com/numpy/numpy/pull/26570 | https://github.com/numpy/numpy/pull/26134, https://github.com/numpy/numpy/pull/27353 |
| PyWavelets | https://github.com/PyWavelets/pywt/pull/701, https://github.com/PyWavelets/pywt/pull/744 | https://github.com/PyWavelets/pywt/pull/710 |
| pandas | https://github.com/pandas-dev/pandas/pull/57896 | https://github.com/pandas-dev/pandas/pull/58647 |
| awkward and awkward-cpp | https://github.com/scikit-hep/awkward/pull/2062 (not by me) | Planned |
| scikit-learn | ✅ (improvement via https://github.com/scikit-learn/scikit-learn/pull/29791 in progress) | Planned |
| scikit-image | ✅ (setup: https://github.com/scikit-image/scikit-image/pull/7350, improvement: https://github.com/scikit-image/scikit-image/pull/7525) | In progress at https://github.com/scikit-image/scikit-image/pull/7440 |
| statsmodels | ✅ (setup: https://github.com/statsmodels/statsmodels/pull/9270, improvement: https://github.com/statsmodels/statsmodels/pull/9343) | https://github.com/MacPython/statsmodels-wheels/pull/161 |
| Zarr | https://github.com/zarr-developers/zarr-python/pull/1903, needs https://github.com/pyodide/pyodide/pull/4817 to be released | Planned |
| numcodecs | https://github.com/zarr-developers/numcodecs/pull/529, ready for review | Planned |
| SciPy | Planned | Planned |
| SymPy | https://github.com/sympy/sympy/pull/27183 | https://github.com/sympy/sympy/pull/27186 (implemented by a maintainer), python-flint (dependency of SymPy) WASM builds left – discussion underway in https://github.com/flintlib/python-flint/issues/234 |
| Matplotlib | https://github.com/matplotlib/matplotlib/issues/27870, being tracked in https://github.com/matplotlib/matplotlib/pull/29093 (not implemented by me) | Planned in https://github.com/matplotlib/matplotlib/pull/29093 |
| h5py and libhdf5 | https://github.com/h5py/h5py/issues/2397 | Planned |
| PyTables | Planned | Planned |

rgommers commented 6 months ago

Aiming to meet this deliverable within the next 2-4 weeks. Several projects have support (NumPy, PyWavelets, pandas, scikit-learn); others are in the pipeline (scikit-image, Zarr, Awkward, hopefully also Matplotlib at least). A few others were started but are on hold due to higher-priority items.

Meeting the deliverable won't be the end of it, but we should switch to deploying working interactive docs for a few more projects first, to accelerate the feedback cycle.

rgommers commented 3 months ago

We're getting there! Thanks for adding the detailed issue tracker @agriyakhetarpal

agriyakhetarpal commented 3 months ago

Pyodide's alpha releases for 0.27 are now up, @rgommers – should we now look at https://github.com/zarr-developers/zarr-python/pull/1903 again or wait a bit until we have the stable release a short while after?

rgommers commented 3 months ago

> should we now look at zarr-developers/zarr-python#1903 again or wait a bit until we have the stable release a short while after?

The action there is to make async tests for Zarr v3 work, which doesn't depend directly on that PR but (if I understand correctly) is infra work within Pyodide. If there's nothing higher on your priority list, trying to understand that in more detail and moving it forward would be useful, I think.

agriyakhetarpal commented 1 month ago

This was slightly difficult back when I started with the Pyodide ecosystem, but we've now got statsmodels' support backported via https://github.com/statsmodels/statsmodels/pull/9365 so that it could be fast-tracked into a new v0.14.4 release today, with no other changes :) Last month involved a bit of travel and this month will involve conferences, so we should be able to close the "official" target of five projects down in early November (including Zarr, from the above discussion).

Here is a bit of extra context for any other potential readers besides Ralf and me:

Two questions on the above:

  1. Is it worth spending time occasionally backporting SciPy's upstream rewrites in Pyodide downstream and un-skipping WASM tests as a result? I feel the answer should be "yes", since it tells us reasonably well how well SciPy works and reduces turnaround time for in-tree updates (which otherwise arrive only with SciPy's PyPI releases, twice a year). I don't have a set target in principle, but "occasionally" could mean "anything more than twice a year". Such backports would be similar to how id_dist was Cythonised (patched in Pyodide now) and how LBFGSB was rewritten (not yet patched). One way to evaluate which rewrites to backport would be to see which and how many tests a particular rewrite allows us to un-skip, since the rewrites would be included in the next SciPy release anyway.
  2. If the emscripten-forge ecosystem is able to build updated versions of libhdf5 and h5py sometime down the line (they have something in progress right now), we can look into including the Emscripten-compiled libhdf5.a in the cross-build environment to make it available for out-of-tree linkage, similar to how NumPy includes libnpymath.a and the relevant header files in xbuildenv/site-packages-extras/? And when we unvendor packages' (and libraries') recipes, their updates will become faster because they will be decoupled from Pyodide.
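
For readers unfamiliar with what "un-skipping WASM tests" refers to in the first question, here is a minimal sketch of the usual pattern. It relies on `sys.platform` being `"emscripten"` under Pyodide; the test class and test names are hypothetical, and the standard-library `unittest` is used only to keep the sketch self-contained (projects may well use a different test runner).

```python
import sys
import unittest

# Pyodide runs CPython compiled with Emscripten, where sys.platform
# reports "emscripten".
IS_WASM = sys.platform == "emscripten"


class TestSolver(unittest.TestCase):
    # Hypothetical test standing in for one that depends on code which
    # does not work under WebAssembly yet. Once a rewrite lands (or is
    # backported), the decorator can be removed: that is the "un-skip".
    @unittest.skipIf(IS_WASM, "not yet supported on Pyodide/WASM")
    def test_needs_native_backend(self):
        self.assertEqual(sum([1, 2, 3]), 6)

    # Tests without the marker run everywhere, including in Pyodide.
    def test_pure_python(self):
        self.assertEqual(max([1, 2, 3]), 3)
```

Outside Pyodide both tests run; inside Pyodide the first one is reported as skipped until the marker is dropped.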

Decoupling recipes in the medium term would also mean we have to bother a bit less with the first question. The rewrites get included in subsequent SciPy releases, which are not in sync with Pyodide releases, since the timelines have always been and will continue to be different. So a PR that is going to benefit, say, SciPy v1.16 users would be nice to backport to SciPy v1.15 in Pyodide if Pyodide has a release coming up before SciPy v1.16 is released. That said, I believe there are other reasons, besides the difference in release timelines, why porting these rewrites is useful; those are covered in the first question.
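
The criterion from the first question (which rewrites un-skip the most tests) could be made concrete roughly as follows. This is only a sketch: the `Rewrite` class and `rank_backport_candidates` helper are hypothetical, and the test counts are placeholders for illustration, not numbers measured from SciPy. Only the patched/unpatched status of id_dist and LBFGSB comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewrite:
    name: str
    tests_unskipped: int   # tests that could be re-enabled on WASM
    already_patched: bool  # True if Pyodide already carries the patch


def rank_backport_candidates(rewrites):
    """Return the not-yet-patched rewrites, most un-skipped tests first."""
    pending = [r for r in rewrites if not r.already_patched]
    return sorted(pending, key=lambda r: r.tests_unskipped, reverse=True)


# Placeholder counts for illustration only; not measured from SciPy.
candidates = [
    Rewrite("id_dist (Cythonised)", tests_unskipped=40, already_patched=True),
    Rewrite("LBFGSB rewrite", tests_unskipped=25, already_patched=False),
    Rewrite("hypothetical-rewrite-x", tests_unskipped=3, already_patched=False),
]

for r in rank_backport_candidates(candidates):
    print(f"{r.name}: {r.tests_unskipped} tests")
```

A cut-off on `tests_unskipped` would be one way to decide when a backport is not worth the effort.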

agriyakhetarpal commented 1 month ago

SymPy added as a potential target as discussed on 11/10/2024.

rgommers commented 1 month ago

> we should be able to close the "official" target of five projects down in early November (including Zarr, from the above discussion).

Great to see that!

> Is it worth spending time occasionally backporting SciPy's upstream rewrites in Pyodide downstream and un-skipping WASM tests as a result? I feel the answer should be "yes",

I'd say probably not, since this is mostly extra work (and not just an hour or less) that is anyway going to land in Pyodide. I'd prefer to see time spent on more structural improvements.

> 2. we can look into including the Emscripten-compiled libhdf5.a in the cross-build environment to make it available for out-of-tree linkage, similar to how NumPy includes libnpymath.a and the relevant header files in xbuildenv/site-packages-extras/?

I really want to get rid of libnpymath.a - shared libraries that cross package boundaries are a really bad idea in Python wheels, and we've had a lot of trouble with it over the years. So my first inclination here is to say that this probably isn't a step in the right direction.

rgommers commented 2 weeks ago

It seems like we've met the deliverables here. There are a few more PRs that look close (e.g. the scikit-image wheels PR looks like the code is written; it's just waiting on Pyodide 0.27) and more improvements are always nice, but for the record let's declare victory here:) Issue can stay open for tracking purposes.

agriyakhetarpal commented 2 weeks ago

> more improvements are always nice, but for the record let's declare victory here:)

Yay! Here's to victory! Yes, I'd keep the issue open, too, since there are a few niceties that I'd like to clean up, such as pyodide/pyodide-actions#12, which is a nice-to-have but not urgent at all.