gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
238 stars 49 forks

Possibly switch back to Poetry? #177

Open gjoseph92 opened 1 year ago

gjoseph92 commented 1 year ago

I just switched to PDM, because Poetry was taking multiple hours to lock dependencies (due to Coiled being a dependency with way too many transitive deps, and https://github.com/python-poetry/poetry/issues/5121).

Testing with Poetry 1.2.0 though, the situation doesn't seem as bad.

Poetry is still roughly 4x slower than PDM by CPU time (170 seconds instead of 42), or about 3x by wall-clock time (2:02 instead of 0:42). But that is way, way better than 3h. It's within the range of tolerability. And since Poetry is more widely used and mature than PDM, and a bit more straightforward to work with in virtualenvs, I'm tempted to stick with it for now, even with the longer lock times. It still lacks the feature requested in https://github.com/python-poetry/poetry/issues/697, which could end up being critical at some point, but so far hasn't been too much of an issue.

Probably won't take any action on this for now, just noting this in case other compelling reasons come up one way or another.

gabe dev/stackstac ‹fc326d0› » time pdm add -dG docs sphinx-paramlinks
Adding packages to docs dev-dependencies: sphinx-paramlinks
Virtualenv /Users/gabe/dev/stackstac/.venv is reused.
🔒 Lock successful
Changes are written to pdm.lock.
Changes are written to pyproject.toml.
All packages are synced to date, nothing to do.

🎉 All complete!

pdm add -dG docs sphinx-paramlinks  42.10s user 1.82s system 105% cpu 41.601 total
gabe dev/stackstac ‹fc326d0*› » 
gabe dev/stackstac ‹c67944b› » time poetry add sphinx-paramlinks
Using version ^0.5.4 for sphinx-paramlinks

Updating dependencies
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/2f/be/7d6e073a3eb740ebeba43a69f5de2b141fea50b801e24e0ae024ac94d4ac/matplotlib-3.5.2.tar.gz  28% (2
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/2f/be/7d6e073a3eb740ebeba43a69f5de2b141fea50b801e24e0ae024ac94d4ac/matplotlib-3.5.2.tar.gz  58% (2
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/2f/be/7d6e073a3eb740ebeba43a69f5de2b141fea50b801e24e0ae024ac94d4ac/matplotlib-3.5.2.tar.gz  88% (2
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/03/c6/14a17e10813b8db20d1e800ff9a3a898e65d25f2b0e9d6a94616f1e3362c/numpy-1.23.0.tar.gz  28% (42.5s
Resolving dependencies... (76.1s)

Writing lock file

Package operations: 1 install, 49 updates, 0 removals

  • Updating attrs (22.1.0 -> 21.4.0)
  • Updating fastjsonschema (2.16.1 -> 2.15.3)
  • Updating jsonschema (4.15.0 -> 4.6.1)
  • Updating jupyter-core (4.11.1 -> 4.10.0)
  • Updating pyzmq (23.2.1 -> 23.2.0)
  • Updating tornado (6.1 -> 6.2)
  • Updating mistune (2.0.4 -> 0.8.4)
  • Updating nbclient (0.6.7 -> 0.6.6)
  • Updating pygments (2.13.0 -> 2.12.0)
  • Updating sniffio (1.3.0 -> 1.2.0)
  • Updating matplotlib-inline (0.1.6 -> 0.1.3)
  • Updating nbconvert (7.0.0 -> 6.5.0)
  • Updating prompt-toolkit (3.0.31 -> 3.0.30)
  • Updating websocket-client (1.4.1 -> 1.3.3)
  • Updating charset-normalizer (2.1.1 -> 2.1.0)
  • Updating debugpy (1.6.3 -> 1.6.0)
  • Updating frozenlist (1.3.1 -> 1.3.0)
  • Updating psutil (5.9.2 -> 5.9.1)
  • Updating pytz (2022.2.1 -> 2022.1)
  • Updating urllib3 (1.26.12 -> 1.26.9)
  • Updating zipp (3.8.1 -> 3.8.0)
  • Updating ipykernel (6.15.2 -> 6.15.0)
  • Updating json5 (0.9.10 -> 0.9.8)
  • Updating toolz (0.12.0 -> 0.11.2)
  • Updating yarl (1.8.1 -> 1.7.2)
  • Updating cloudpickle (2.2.0 -> 2.1.0)
  • Updating docutils (0.19 -> 0.18.1)
  • Updating fsspec (2022.8.2 -> 2022.5.0)
  • Updating jupyterlab-server (2.15.1 -> 2.15.0)
  • Updating nbclassic (0.4.3 -> 0.4.2)
  • Updating numpy (1.23.2 -> 1.23.0)
  • Updating partd (1.3.0 -> 1.2.0)
  • Updating dask (2022.9.0 -> 2022.6.1)
  • Updating jupyterlab (3.4.6 -> 3.4.3)
  • Updating pystac (1.6.1 -> 1.4.0)
  • Updating sphinx (5.1.1 -> 5.0.2)
  • Updating distributed (2022.9.0 -> 2022.6.1)
  • Updating exceptiongroup (1.0.0rc9 -> 1.0.0rc8)
  • Updating keyring (23.9.1 -> 23.6.0)
  • Updating pathspec (0.10.1 -> 0.9.0)
  • Updating py-spy (0.3.13 -> 0.3.12)
  • Updating readme-renderer (37.1 -> 35.0)
  • Installing tqdm (4.64.0)
  • Updating viztracer (0.15.4 -> 0.12.3)
  • Updating xarray (2022.6.0 -> 2022.3.0)
  • Updating black (22.8.0 -> 22.6.0)
  • Updating graphviz (0.20.1 -> 0.16)
  • Updating hypothesis (6.54.5 -> 6.49.1)
  • Updating rasterio (1.3.2 -> 1.3.0)
  • Updating twine (4.0.1 -> 3.8.0)
poetry add sphinx-paramlinks  170.55s user 25.80s system 160% cpu 2:02.03 total
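
How much slower Poetry looks depends on which column of the time output is compared. The following is a minimal sketch for checking the ratios, not part of stackstac: the helper names to_seconds and parse_time_line are made up for illustration, and the two input strings are the zsh time summary lines copied from the transcript above.

```python
import re

# Summary lines printed by zsh's `time`, copied from the transcript above.
PDM_LINE = "pdm add -dG docs sphinx-paramlinks  42.10s user 1.82s system 105% cpu 41.601 total"
POETRY_LINE = "poetry add sphinx-paramlinks  170.55s user 25.80s system 160% cpu 2:02.03 total"

# Matches "<user>s user <system>s system <pct>% cpu <elapsed> total".
_TIME_PATTERN = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system \d+% cpu (?P<total>[\d:.]+) total"
)


def to_seconds(clock: str) -> float:
    """Convert 'SS.ss', 'M:SS.ss' or 'H:MM:SS.ss' into seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line: str) -> dict:
    """Extract user, system and wall-clock seconds from one summary line."""
    match = _TIME_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a zsh time summary line: {line!r}")
    return {
        "user": float(match["user"]),
        "system": float(match["system"]),
        "wall": to_seconds(match["total"]),
    }


pdm = parse_time_line(PDM_LINE)
poetry = parse_time_line(POETRY_LINE)

print(f"user-time ratio:  {poetry['user'] / pdm['user']:.1f}x")   # 4.1x
print(f"wall-clock ratio: {poetry['wall'] / pdm['wall']:.1f}x")   # 2.9x
```

Part of the gap between the two ratios comes from Poetry using more than one core (160% CPU), so its wall-clock time is shorter than its user time.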

rbavery commented 1 year ago

Should hatch be considered? It doesn't yet have lock file support, but it is under the pypa org and seems to have momentum as PyPA's preferred packaging tool.

gjoseph92 commented 1 year ago

I've been really curious about hatch, but

  1. No lockfiles
  2. No CLI for adding/removing dependencies: you can easily pip install something in an environment, and forget to add it in tool.hatch.envs.<name>. AFAIU, hatch doesn't have something like poetry's --remove-untracked, so you might not realize you'd messed up until you got to CI.

Lockfiles to me are the core feature though. I never, ever want to end up with different versions of things installed in different environments (my machine vs another dev's machine vs GitHub Actions vs readthedocs vs Binder vs ...).
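
To make the "forgot to declare it" failure mode from point 2 concrete, here is a rough sketch of a check that flags installed packages that are neither declared nor required by another installed package. This is not a Hatch feature: every function name is hypothetical, the example data is invented, and the name parsing is deliberately simplistic (it ignores environment markers and extras).

```python
from __future__ import annotations

import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Pull the project name out of a requirement string like 'dask[array]>=2022.1'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def find_untracked(installed: dict[str, list[str]], declared: list[str]) -> set[str]:
    """Return installed packages that nothing declares and nothing else requires.

    `installed` maps each installed package to its own requirement strings;
    `declared` is the dependency list written in the project config.
    """
    declared_names = {requirement_name(r) for r in declared}
    required_by_others = {
        requirement_name(r) for reqs in installed.values() for r in reqs
    }
    installed_names = {normalize(name) for name in installed}
    return installed_names - declared_names - required_by_others


def installed_from_environment() -> dict[str, list[str]]:
    """Build the `installed` mapping from the current Python environment."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            result[name] = list(dist.requires or [])
    return result


# Hypothetical environment: sphinx_paramlinks was pip-installed by hand
# and never written into the declared dependency list.
example_installed = {
    "dask": ["toolz>=0.10"],
    "toolz": [],
    "rasterio": ["numpy"],
    "numpy": [],
    "sphinx_paramlinks": ["Sphinx>=4"],
    "Sphinx": [],
}
example_declared = ["dask[array]", "rasterio>=1.3"]

print(find_untracked(example_installed, example_declared))  # {'sphinx-paramlinks'}
```

In a real environment, find_untracked(installed_from_environment(), declared) would also report the project itself and tools like pip, so the result is a list of candidates to review rather than a definitive answer.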

ofek commented 1 year ago

Hey, creator of Hatch here! Out of curiosity, why does this library need lock files?

gjoseph92 commented 1 year ago

@ofek good question.

Currently, this project uses lockfiles to reproduce the environment:

  1. on readthedocs
  2. in binder
  3. in a Coiled software environment
  4. (whenever I get around to it) on GitHub Actions
  5. on anyone's machine who contributes

Technically, we actually just reproduce subsets of the environment in different places via dependency groups (docs deps aren't installed on binder, etc.)


Broadly, I think there are two related but separate problems:

  1. Specifying the range of dependencies of a library for when other people install it
  2. Pinning the environment of a project for when it needs to be reproduced in other places (CI, contributors' machines)

I wouldn't mind having two separate tools for these separate problems, if the ergonomics were as easy as Poetry/PDM/any modern CLI-based locking package manager. To me, hatch just focuses on problem 1; I don't know of a tool that focuses solely on problem 2 (besides pip freeze I guess, which doesn't meet the ergonomics requirement). Because these are pretty tightly related (if I change the rasterio dependency in stackstac, I also want to change the rasterio pinned in my environment), most tools (poetry/pdm/npm/cargo/etc.) seem to solve them together.

But I care a lot about problem 2, quite a bit more than I care about problem 1, actually. I have been bitten so many times by un-pinned dependencies changing, and by a new release on PyPI of some transitive dep breaking builds. To me, having a lightweight, declarative process to ensure a consistent Python environment everywhere is key to a modern development experience, where you need to re-create that environment all the time, in many places, in an automated way.
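
As an illustration of what "consistent everywhere" means in practice, here is a small sketch that compares exact pins against what an environment actually has installed. It is not how Poetry or PDM verify a lock file: the function names are hypothetical, and the pin dictionary is a hand-written stand-in for a parsed lock file, using a few versions that appear in the Poetry transcript earlier in this thread.

```python
from __future__ import annotations

from importlib import metadata


def find_drift(
    pinned: dict[str, str], installed: dict[str, str]
) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned_version, installed_version_or_None)} for every mismatch."""
    drift = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found != wanted:
            drift[name] = (wanted, found)
    return drift


def installed_versions(names: list[str]) -> dict[str, str]:
    """Look up the installed version of each named package in the current environment."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # missing packages are reported as None by find_drift
    return versions


# Stand-in for a parsed lock file (assumption: exact versions keyed by name).
example_pins = {"numpy": "1.23.2", "dask": "2022.9.0", "tqdm": "4.64.0"}

# Stand-in for another machine where numpy drifted and tqdm is missing.
example_env = {"numpy": "1.23.0", "dask": "2022.9.0"}

for name, (wanted, found) in find_drift(example_pins, example_env).items():
    print(f"{name}: pinned {wanted}, installed {found}")
```

To check a real environment, the second argument would be installed_versions(list(example_pins)). A CI job could fail whenever find_drift returns a non-empty result.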