astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv

Workspaces and monorepo support (add sync --all-packages) #6935

Open carderne opened 2 weeks ago

carderne commented 2 weeks ago

I've put a decent amount of effort trying to figure out a workable "monorepo" solution with pip-tools/Rye/etc and now uv. What I mean by a monorepo:

  1. 2+ packages with interdependencies.
  2. The ability to lock dependencies across packages (where not needed, split into multiple workspaces). More sophisticated multi-version handling would be great but out of scope.
  3. Multiple entrypoints. Packages are peers and there is no "root" package.
  4. Probably want to distribute the packages in a Dockerfile or similar.

I'm packaging a few thoughts into this issue as I think they're all related, but happy to split things out if any portions of this are more likely to be worked on than others.

Should uv support this?

I think yes. Pants/Bazel/etc are a big step up in complexity and lose a lot of nice UX. uv is shaping up as the de facto Python tool, and I think this is a common pattern for medium-sized teams that are trying to move past multirepo but don't want more sophisticated tooling. If you (uv maintainers) are unconvinced (but convince-able), I'm happy to spend more time doing so!

Issues

1. Multiple packages with single lockfile

Unfortunately, uv v0.4.0 seems to be a step back for this. It's no longer possible to uv sync the whole workspace (related #6874), and the root project being "virtual" is not really supported. The docs make it clear that uv workspaces aren't (currently) meant for this, but I think that's a mistake. Having separate uv packages isn't a great solution, as you lose the global version locks (which make housekeeping 10x easier), and you end up with multiple venvs, multiple pyright/pytest installs/configs, etc.

For clarity, I'm talking about the structure below. I think adding a tool.uv.virtual: bool flag (like Rye has) would be a great step. In that case the root is not a package and can't be built.

.
├── pyproject.toml                 # virtual
├── uv.lock
└── packages
    ├── myserver
    │   ├── pyproject.toml         # depends on mylib
    │   └── myserver
    │       └── __init__.py
    └── mylib
        ├── pyproject.toml
        └── mylib
            └── __init__.py

2. Distributing in Dockerfiles etc

This is, I think, orthogonal to the issue above. (And much less important, as it's possible to work around it with plugins.) Currently, there's no good way to get an efficient (cacheable) Docker build in a uv workspace. You'd like to do something like the Dockerfile below, but you can't (related #6867).

FROM python:3.12.5-slim-bookworm
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app
COPY uv.lock pyproject.toml /app/

# NB: doesn't work as the server package isn't there!
RUN uv sync --locked --no-install-project --package=server

COPY packages /app/packages
RUN uv sync --locked --package=server
ENV PATH="/app/.venv/bin:$PATH"

If that gets resolved, there's another issue, but this is very likely to be outside the scope of uv. Just sharing it for context.

My own solution has been to build wheels that include any dependencies so you can just do this:

# uv is nice enough to resolve transitive dependencies of server
uv export --format=requirements-txt --package=server > reqs.txt

Then in Dockerfile:

COPY reqs.txt reqs.txt
RUN uv pip install -r reqs.txt
# add --no-index to prevent internet access to ensure only the
# hash-locked versions in reqs.txt are downloaded
RUN uv pip install server.whl --no-deps --no-index

I've written a tiny Hatch plugin here that injects all the required workspace code into the wheel. This won't work for many use cases (local dev hot reload), but it is one way around the problem of COPYing the entire workspace into the Dockerfile. I don't think there's any solution that solves both together, and at least this way permits efficient Docker builds and simple Dockerfiles. (Note: since uv v0.4.0 the plugin seems to break uv's editable builds; I haven't yet looked into why.)
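
For context, a rough sketch of the wheel-building side of that workaround (paths assumed from the tree above; this uses the generic Python build frontend, i.e. pip install build, and assumes the plugin is configured as a Hatch build hook so the workspace-internal code ends up in the wheel):

# build the wheel outside Docker; the plugin injects the workspace members into it
python -m build --wheel packages/myserver --outdir dist/
# the Dockerfile then COPYs dist/*.whl and installs it with --no-deps --no-index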

Afoucaul commented 2 weeks ago

To expand on the Docker image, this is what I would want to do:

FROM python:3.12.5-slim-bookworm AS python-builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Create a venv at a well-known location so it can be COPY'd later
RUN uv venv /opt/python
# Tell uv to use that venv
ENV UV_PYTHON=/opt/python

WORKDIR /app
COPY uv.lock pyproject.toml /app/
# No need to COPY pyproject.toml of libs - they're all well-specified in uv.lock anyway

# Install the app without all workspace members - ie all 3rd party dependencies 
RUN uv sync --locked --no-install-workspace --package=server

COPY packages /app/packages
# Install 1st party dependencies, but only those that are needed
# Also pass the fictional `--no-editable` flag to actually bundle them into the venv
RUN uv sync --locked --no-editable --package=server

FROM python:3.12.5-slim-bookworm AS runtime

# Copy the venv that has all 3rd party and 1st party dependencies, ready for use
COPY --from=python-builder /opt/python /opt/python
ENV PATH="/opt/python/bin:$PATH"

I can't do that because:

  1. uv sync --locked --no-install-workspace --package=server complains because server isn't there (nor are its dependencies anyway)
    • it seems that uv.lock already has all the information needed to resolve this: it contains workspace members, so uv can know of server, and of its dependencies, without all pyproject.toml files needing to be there
  2. There's no such flag as --no-editable - uv will install workspace members as editable packages, so COPYing the venv in the final stage won't work because the packages pointed at won't be there
    • this would allow building a complete venv that can be shipped, with all and only the dependencies it needs
  3. uv sync doesn't support targeting a venv (although that's under discussion from what I've gathered)

charliermarsh commented 2 weeks ago

(1) is easy to resolve, would that help?

carderne commented 2 weeks ago

(1) Yes, that would be great! (I'll start working on a patch but I suspect I'll still be noodling by the time you merge yours.)

For (2), I suspect the only generally useful solution would be to encode the package-specific dependency tree in uv.lock (like pnpm-lock.yaml) rather than calculating it on the fly. That might make it harder to dovetail with PEP 751, but from what I understand you're planning to support pylock as an output format that uv won't use internally, so maybe not important.

charliermarsh commented 2 weeks ago

For (2), we're thinking of perhaps a dedicated command like uv bundle that would handle a lot of the defaults that you want for this kind of workflow. But otherwise a --no-editable or similar seems reasonable to me.

charliermarsh commented 2 weeks ago

Let's track (2) in https://github.com/astral-sh/uv/issues/5792.

charliermarsh commented 2 weeks ago

I think adding a tool.uv.virtual: bool flag (like Rye has) would be a great step. In that case the root is not a package and can't be built.

How is this different than tool.uv.package = false?

charliermarsh commented 2 weeks ago

I think that does what you're describing?

charliermarsh commented 2 weeks ago

#6943 adds support for --frozen --package.

carderne commented 2 weeks ago

Sorry you're moving too quickly for me!

About (1)

You're right that package = false does what is needed. It allows a very minimal root pyproject.toml that looks like the one below. The only downside is that in order for uv sync to sync the entire workspace, you need to add each package to project.dependencies, to tool.uv.sources, and to tool.uv.workspace.members. I should have been more explicit in my first message that what I think is needed here is uv sync --the-entire-workspace. (This is the default behaviour in Rye and was the default in uv<0.4.0.)

Alternatively, a more explicit flag in the config like tool.uv.workspace.this-project-is-virtual-so-sync-all-members-by-default: bool.

[project]
name = "monorepo-root"
version = "0"
requires-python = "==3.12"
dependencies = ["mylib", "myserver"]

[tool.uv]
dev-dependencies = []
package = false

[tool.uv.sources]
mylib = { workspace = true }
myserver = { workspace = true }

[tool.uv.workspace]
members = ["packages/mylib", "packages/myserver"]

On (2) the Docker stuff

I don't really understand how #6943 helps, but it seems sensible anyway. I see three obvious ways (not uv-specific) of getting stuff into a Docker image:

  1. Export a package-specific requirements.txt, install those, then COPY in all needed packages.
  2. Same for requirements.txt. Then create a site-packages and COPY that in. I assume this is what the --no-editable in #5792 is about.
  3. Same for requirements.txt. Then create sdists/wheels from the packages (the plugin I mentioned).

All of these require a little pre-Docker script to generate the requirements.txt, which isn't ideal but fine. Assuming I've understood (2) above correctly, I'll move any further comments I have to that issue.
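
As a concrete illustration of that pre-Docker step for option 1 (package name taken from the earlier example; uv export resolves the package-specific, hash-locked set from uv.lock):

# run outside the image, then COPY reqs.txt in and uv pip install -r reqs.txt
uv export --format=requirements-txt --package=server > reqs.txt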

charliermarsh commented 2 weeks ago

For (2), I thought you wanted to do this:

FROM python:3.12.5-slim-bookworm
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app
COPY uv.lock pyproject.toml /app/

# NB: doesn't work as the server package isn't there!
RUN uv sync --locked --no-install-project --package=server

COPY packages /app/packages
RUN uv sync --locked --package=server
ENV PATH="/app/.venv/bin:$PATH"

This now works as expected if you use --frozen rather than --locked.
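
Concretely, only the two sync invocations in that Dockerfile change (everything else stays the same):

# before the workspace sources are COPY'd in: third-party deps only
RUN uv sync --frozen --no-install-project --package=server

# after COPY packages /app/packages: install the workspace members as well
RUN uv sync --frozen --package=server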

b-phi commented 2 weeks ago

This is also causing some issues for me with 0.4.0+. Locally sync works fine

> uv sync
Resolved 341 packages in 76ms
Audited 307 packages in 3ms

But when adding --frozen, which we use in CI, uv ignores the workspace members

> uv sync --frozen
Uninstalled 97 packages in 7.57s
...
Audited 210 packages in 0.25ms

The different dependency resolution behavior depending on whether I pass --frozen is unexpected.

charliermarsh commented 2 weeks ago

Does your root pyproject.toml have a [project] section?

b-phi commented 2 weeks ago

No, just a "virtual" workspace, effectively this.

[tool.uv]
dev-dependencies = [
    "...",
]

[tool.uv.workspace]
members = ['libs/*', 'sandbox']

charliermarsh commented 2 weeks ago

I can look into why you're seeing differences (it sounds like a bug!). I'd suggest migrating to a virtual project though, i.e., adding a [project] table (but not a build-system) to your root pyproject.toml. We redesigned those in v0.4.0 and the version above is now considered legacy.

b-phi commented 2 weeks ago

Adding the [project] section as suggested now shows consistent behavior with or without --frozen. I was able to get back to the desired sync behavior by adding the workspace members to the project dependencies and a [tool.uv.sources] section enumerating the workspace members. More verbose, but more consistent. Thanks for the help!

charliermarsh commented 2 weeks ago

Great! Still gonna see if I can track down and fix that bug :)

carderne commented 2 weeks ago

What @b-phi is talking about is exactly what I mentioned in (1) of my comment up above. Basically you have to add each workspace member in three places. Would be great if that could be made unnecessary (in one of the ways I suggested or some other way).

On (2) the Dockerfiles, the command you added helps, but it still doesn't work if there are dependencies between packages and you haven't yet copied in the files. There's an MRE here. It fails when trying to run the --no-install-project sync because packages/server wants packages/greeter but it's not there. Currently the only way around this (afaict) is to pre-export a requirements.txt and use that.

charliermarsh commented 2 weeks ago

I'm confused on (2). We have --no-install-workspace that does exactly this, right?

carderne commented 2 weeks ago

Oh of course, sorry. So (2) I think is resolved. The remaining stuff about getting the right files into the Dockerfile is not really uv's problem. (Although it could be helped by things like --no-editable.)

The main point of this issue is (1) but I'm very happy to wait for you to figure out an approach that you're happy with. But I think it would be great to resolve.

charliermarsh commented 2 weeks ago

👍 Part of what I'm hearing here too is that we need more + better documentation for this stuff.

carderne commented 2 weeks ago

Yeah I don’t blame you, it’s moving really fast.

EDIT: adding this here to make it clear to any future travellers why this issue is still open. The question is whether the sync command could gain an --all-packages flag (or some similar name).

Afoucaul commented 2 weeks ago

👍 Part of what I'm hearing here too is that we need more + better documentation for this stuff.

I'm probably biased, but it seems to me that a monorepo with possibly interdependent libs and independently buildable apps (most of the time built into Docker images) is a common pattern - at least it's what workspaces promote. With that in mind, it would indeed be great to have documentation on how Astral intends us to use uv to manage such a repo and such builds. So far it feels like I'm hacking my way to a satisfying set-up, even though the uv maintainers obviously have a "right way" in mind.

That said, I must say I'm having an amazing experience with uv (and ruff, and Astral in general), and that I'll advocate to use it in all the projects I maintain!

carderne commented 2 weeks ago

@Afoucaul Is there anything else you think is missing apart from a sync --all-packages (if you agree that is needed) and improved monorepo/workspace docs?

PhilipVinc commented 2 weeks ago

Is it possible for a package, virtual project or workspace to depend on another workspace, or on a package in a workspace?

I'm thinking of the case common in data science where we have a set of packages developed in a workspace (let's say numpy and scipy are the packages developed in WRKSPC) and we don't really publish them to a repository or anywhere.

At some point I want to start a data science project, so I will create a virtual package with some scripts that require scipy, which in turn depends on the workspace version of numpy. How can I express this dependency?

b-phi commented 2 weeks ago

@Afoucaul Is there anything else you think is missing apart from a sync --all-packages (if you agree that is needed) and improved monorepo/workspace docs?

Jumping in here, managing multiple environments would be very helpful. In our repo, some sub-packages have heavy ML dependencies, others have linux-only dependencies. Ideally I would be able to manage multiple environments for different use cases, e.g. lightweight venv on OSX host, a linux venv that I use via docker, a heavier ML env etc.

Afoucaul commented 2 weeks ago

@Afoucaul Is there anything else you think is missing apart from a sync --all-packages (if you agree that is needed) and improved monorepo/workspace docs?

Jumping in here, managing multiple environments would be very helpful. In our repo, some sub-packages have heavy ML dependencies, others have linux-only dependencies. Ideally I would be able to manage multiple environments for different use cases, e.g. lightweight venv on OSX host, a linux venv that I use via docker, a heavier ML env etc.

I've managed to do that by defining apps as packages (that you target with --package) and using extras. For instance, I've created an ai package that needs tensorflow, which I added as an ml extra with uv add --package ai --optional ml tensorflow. That way, a package that needs ai but never actually reaches the part where tensorflow is imported can depend on it via uv add --package consumer ai, whereas a package that actually needs it would declare it via uv add --package consumer ai[ml] (note ai vs ai[ml]). That's very useful for installing a venv on an ARM MacBook for a project that needs tensorflow somewhere: you run uv sync without --extra ml, so you get everything except tensorflow - good enough for developing. Then in your actual runtime, you do uv sync --all-extras (assuming all extras are prod and all dev deps are declared as such) to get everything you need.

If you need very specific environments that are orthogonal to apps, you could create one with uv init environments/my-env, add deps via uv add --package my-env ai, and then uv sync --package my-env.
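
Summarised as commands, the extras approach above looks roughly like this (package names taken from the comment; tensorflow assumed as the dependency behind the ml extra):

# declare tensorflow under the optional "ml" extra of the ai package
uv add --package ai --optional ml tensorflow

# a consumer that never reaches the tensorflow code path depends on plain ai
uv add --package consumer ai
# a consumer that does need it depends on the extra
uv add --package consumer 'ai[ml]'

# local dev (e.g. on an ARM MacBook): sync without the heavy extra
uv sync
# runtime: pull in every extra
uv sync --all-extras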

Afoucaul commented 2 weeks ago

@Afoucaul Is there anything else you think is missing apart from a sync --all-packages (if you agree that is needed)

I've resolved that point by adding all local packages to the root package (uv add foo where foo is a workspace member), but I do agree it's error prone and requires an extra command each time you create a new package.

Afoucaul commented 2 weeks ago

Is it possible for a package, virtual project or workspace to depend on another workspace, or on a package in a workspace?

I'm thinking of the case common in data science where we have a set of packages developed in a workspace (let's say numpy and scipy are the packages developed in WRKSPC) and we don't really publish them to a repository or anywhere.

At some point I want to start a data science project, so I will create a virtual package with some scripts that require scipy, which in turn depends on the workspace version of numpy. How can I express this dependency?

There's only one lockfile, so if at the root of your monorepo you run uv init projects/testing-around-some-stuff and then uv add --package testing-around-some-stuff scipy, you'll end up with the workspace's scipy. There are some caveats though: if you try to use, in testing-around-some-stuff, a different version of some package that's already specified in uv.lock, either you'd be unable to do so because of the set of constraints, or you could and that would update that package's version for the whole workspace - not ideal either. I'm not sure how one would create a project in a workspace and specify that it should always respect the workspace's requirements and never change them.
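
Concretely, the flow described above is roughly (names taken from the comment):

# create a new member inside the existing workspace; it shares the root uv.lock
uv init projects/testing-around-some-stuff
# add scipy to that member only; the version resolved is the workspace's scipy
uv add --package testing-around-some-stuff scipy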

rokos-angus commented 6 days ago

One thing preventing us from switching our monorepo over to uv is that it's really hard to tell in CI which projects in a workspace actually changed when uv.lock changes.

We have many apps deployed from a single monorepo and don't want to have to build Docker images for all of them every time uv.lock changes (e.g. someone adding a new project or library to the workspace).

carderne commented 6 days ago

@rokos-angus one way around that would be to have a git hook/CI step/something that runs uv export ... for each package; you then diff those files to see what needs to be built.
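
A rough sketch of such a CI step (the layout, package names, and exports/ directory are assumptions; the idea is that only packages whose exported requirements changed need a rebuild):

mkdir -p exports
# export a hash-locked requirements file per workspace member,
# assuming each directory name matches its package name
for pkg in packages/*/; do
  name=$(basename "$pkg")
  uv export --frozen --format=requirements-txt --package="$name" > "exports/$name.txt"
done

# diff against the previously committed exports to decide which images to rebuild
git diff --name-only -- exports/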

vlad-ivanov-name commented 6 days ago

Where I work, we use http://github.com/josh-project/josh to figure out what changed (disclaimer: I'm a contributor to that project).

With that said, I think CI is a separate problem. When it comes to Python, here's what I've been able to narrow the requirements down to:

  1. It should be possible to make packages depend on packages with relative paths.
  2. At the same time, it should be possible to build packages for a scenario where they are eventually uploaded to a package registry, so relative paths won't work there.
  3. Every package has multiple sets of dependencies ("environments"). This can be either due to feature switches or, for example, because CI needs more deps to run tests.

How we solved this for us is a custom script connected via https://github.com/recogni/setuptools-monorepo that resolves those dependencies in the desired way depending on context (for example, either to a file:// path or to a name in the registry). This way we can have a monorepo but can also publish wheels from it and ship them. But I would really like to see a more "native" solution.

I agree with the point about having a shared lockfile; this is often a pain point.

carderne commented 6 days ago

@vlad-ivanov-name I'm slowly working on something similar at https://github.com/carderne/una, albeit uv-specific and based on Hatch rather than setuptools. It figures out where to find files using uv's { workspace = true } config rather than the URL.

I haven't really thought about your point (2), nor much about (3), but my assumption is that for testing you'd use uv sync and for deployment you'd use the plugin.

JasperHG90 commented 4 days ago

I've compiled an example that works for my purposes that might help some folks looking for a monorepo setup using uv.

carderne commented 3 days ago

@JasperHG90 that link is 404 for me.

gwdekker commented 3 days ago

https://github.com/DavidVujic/python-polylith-example-uv is another example which I think supports this or similar use cases. @DavidVujic

DavidVujic commented 3 days ago

https://github.com/DavidVujic/python-polylith-example-uv is another example which I think supports this or similar use cases.

Thanks for the mention!

Yes, if I have understood the things discussed in this issue correctly, I think that Polylith in combination with uv might be helpful. It's an architecture for monorepos originating from the Clojure community. There's tooling support, and I'm the maintainer of the Python tooling. It works well with uv, and here are the docs if you want to know more.

JuanoD commented 11 hours ago

I made https://github.com/JuanoD/uv-mono as an example repo. Feel free to correct me if something is wrong

JasperHG90 commented 9 hours ago

@JasperHG90 that link is 404 for me.

Sorry, I was ill these past days 🦠. It's fixed now! Thanks for the heads up.