Discussion: uv workspaces in a monorepo - thoughts on change-only testing

BrendanJM commented 2 months ago

Hi team, the new release of uv looks great, really looking forward to diving into all the features and new docs.

One area I have some questions about is using uv for monorepos of multiple Python packages, services, etc. I see that workspaces seem to offer a general-purpose framework for structuring monorepos and their package relationships.

This all looks like good stuff, but I have a few open questions. While workspaces provide some structure for a monorepo, there are a few features that would be desirable for monorepo tooling. One that is top of mind is tooling to identify what code has changed and run only the tests affected by that change. At scale, the duration of extensive test pipelines in CI can become a very real problem for developer productivity, and unfortunately that concern is part and parcel of packaging solutions for large projects.

Identifying code changes and running only the affected tests is a significant feature of build systems such as Bazel and Pants, as well as tooling like pytest-testmon. I think the structure uv brings to managing related projects is great, so my question is: would a feature along these lines be something we see uv managing in the future? And if not, are there any early thoughts on how it could integrate with other tooling or build systems to enable this?

It'd be great if we can establish some opinionated guidance here that would work in the general case.
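
To make the change-only idea concrete, here is a minimal sketch in Python. It is not a uv feature and not how Bazel or Pants work internally. The package directories and the internal dependency map are hypothetical and hard-coded here; in practice they would have to be derived from each member's pyproject.toml, and the changed file list could come from something like `git diff --name-only`.

```python
from collections import defaultdict, deque
from pathlib import PurePosixPath

# Hypothetical layout: package name -> directory relative to the repo root.
PACKAGE_DIRS = {
    "package_b": "libs/package_b",
    "package_c": "libs/package_c",
    "service_a": "services/service_a",
}

# Hypothetical internal dependencies: package -> workspace packages it uses.
DEPENDS_ON = {
    "service_a": {"package_b"},
    "package_b": {"package_c"},
    "package_c": set(),
}


def owning_package(path, package_dirs):
    """Return the package whose directory contains `path`, or None."""
    file_path = PurePosixPath(path)
    for name, directory in package_dirs.items():
        if PurePosixPath(directory) in file_path.parents:
            return name
    return None


def affected_packages(changed_files, package_dirs, depends_on):
    """Packages that changed directly, plus everything that depends on them."""
    # Invert the graph: package -> packages that depend on it.
    dependents = defaultdict(set)
    for package, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(package)

    # Seed with the packages that own at least one changed file.
    seeds = {owning_package(f, package_dirs) for f in changed_files}
    seeds.discard(None)

    # Breadth-first walk over reverse dependencies.
    affected = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

With this map, a change under libs/package_b marks package_b and service_a as needing tests, while a change to a file outside every package directory marks nothing. A real setup would also need a rule for repo-wide files such as the root lock file.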

zanieb commented 2 months ago

This sounds cool, but I think we have a ways to go before we could prioritize looking into it.

goutamvenkatCatalan commented 2 months ago

+1 on monorepo tooling, especially with regard to having dependency caching per file and being able to create a binary for a given target (essentially a file graph). Curious what needs to happen to help accomplish this goal?

BrendanJM commented 2 months ago

Just brainstorming a bit, one additional question on the topic - are uv workspaces arbitrarily nestable/composable? E.g. I am thinking of a single monorepo structure where we have one or more groups of several packages that should be in a workspace together (shared deps), maybe several packages that are not in any workspace. So something like this:

root (workspace_0?)
  --workspace_1
       -- service_a
       -- package_b
       -- package_c
  --workspace_2
       -- package_d
       -- package_e
  --packages
       -- package_f
       -- package_g

The idea behind this structure being it provides an incremental path to managing packages and services together in logical groups, while also allowing packages to be developed in isolation if needed. So workspace_1 would share dependencies, but be independent from workspace_2, and anything under packages would be able to be referenced locally by path without needing to conform to requirements in either well defined workspace.

BrendanJM commented 2 weeks ago

Another observation: when using uv workspaces in a monorepo, with the current behavior there is a singular lock file at the top of the repo. The idea here makes sense, since it means one set of resolved dependencies. But this presents a few issues in practice.

In the event we have one or more application services, generally we would want these services to be deployed with only the dependencies they need to function. Since a monorepo can accrue a substantial superset of these dependencies across all libraries/services, it becomes difficult to build and deploy services individually with only the dependencies they need.

A related issue is that the lock file remains one or more levels above the service's pyproject.toml (and the rest of its project code). Because docker can't bring in dependencies from parent directories, any docker build commands run at the level of the service won't successfully locate the uv.lock file. This can be worked around by running the command from a level up or using context along with e.g. docker compose, which is mostly fine, but it does break the idea of monorepo components being essentially fully isolatable to their own subdirectories.

One solution to both of these would be to include subset uv.lock files at the level of workspace components in addition to the top level uv.lock.

NixBiks commented 2 weeks ago

> In the event we have one or more application services, generally we would want these services to be deployed with only the dependencies they need to function. Since a monorepo can accrue a substantial superset of these dependencies across all libraries/services, it becomes difficult to build and deploy services individually with only the dependencies they need.

That sounds like a design issue with your monorepo. I tend to design my libraries so that each has very few dependencies. Then you can have higher-level services that pick and choose whichever libraries they need. This way your service only installs dependencies that are actually needed (otherwise the design is suboptimal). Instead of many small libraries, you can also have libraries with optional dependencies for certain features. Again, just a design choice.

BrendanJM commented 1 week ago

@NixBiks Maybe I could have clarified better, but I believe we're talking about the same thing here. The goal is what you are talking about, and the context is how well-suited uv's workspaces are to that end. This deploy-only-the-dependencies-each-service-needs pattern isn't well supported in the current workspace implementation, since uv only resolves a single superset lock file at the virtual root rather than individual lock files at the service level, even if the services specify only their minimum dependencies in their own pyproject.toml files.

That said, it's not difficult to work around this, as you can just exclude from each workspace the services (and libraries) for which you want distinct lock files (and do a little scripting to manage locking and syncing these for many projects). But then one would wonder why use the workspace feature at all. I do think there's promise in workspaces FWIW, and I'm looking forward to seeing them evolve to better support monorepo workflows.

NixBiks commented 1 week ago

@BrendanJM isn't it just uv sync --package my-service you are looking for?

BrendanJM commented 1 week ago

It gets you part of the way there. If I am interpreting this correctly, you could definitely use it to install just the service's dependencies. But it looks like it would still be missing a separate lock file for the service itself, which would be very helpful for e.g. cache invalidation when building a docker layer to handle this sync step.

uv lock does not presently accept a --package argument, which I think might be the missing piece for this.

zanieb commented 1 week ago

I don't quite follow why you need a separate lockfile for just a single member? The point of using a workspace is that it ensures that you're using consistent versions across all of the members.

BrendanJM commented 1 week ago

While the primary purpose of lock files is to resolve a consistent set of versions across the members, it's pretty standard practice to use lock/requirement files for cache management in docker.

Let's say I have a larger monorepo with 100+ packages and services. We're using uv's workspaces functionality so that we have consistent versions across all members, all resolved to a single lockfile - which is great! But not all members require all dependencies. So while uv sync --package my-service could be useful to just populate the service dependencies, my docker build pipeline needs to know when the dependencies have changed to build this layer and subsequent ones.

In a sufficiently large monorepo, there could be a fairly constant stream of updates to the top-level lockfile (unrelated to my-service or its dependencies), which would make it difficult to meaningfully use this in the Dockerfile to force cache invalidation in the service's docker build, since it would basically be firing false positives with every unrelated update.

This is only a mild inconvenience at smaller scales, but at a larger scale, constant false positives for invalidation in the docker pipeline can become significant. While using a cache mount can definitely help docker benefit from a more persistent uv cache, downstream layers would still be affected and can be expensive to rebuild.

Edit to add: If having subset uv.lock files in each workspace member feels confusing or prone to misuse (and I mean subset as in a strict subset downstream of the root lock file, not an independent lock), an alternate option could be a mechanism for generating e.g. a lock.hash file that was simply a hash of only the dependencies for that specific member.
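
A rough sketch of what such a per-member hash could look like, written in Python. This is not an existing uv feature. It assumes the lock data has already been parsed into a dict (for example with `tomllib.load` on uv.lock) and that each `package` entry carries `name`, `version`, and a `dependencies` list of `{name = ...}` tables. That layout is an assumption about the lock format, and the sketch ignores optional and dev dependencies, markers, and artifact hashes.

```python
import hashlib

# Hypothetical parsed lock data; the layout is an assumption, see above.
LOCK = {
    "package": [
        {"name": "my-service", "version": "0.1.0", "dependencies": [{"name": "httpx"}]},
        {"name": "httpx", "version": "0.27.0", "dependencies": [{"name": "idna"}]},
        {"name": "idna", "version": "3.7"},
        {"name": "numpy", "version": "2.0.0"},  # not needed by my-service
    ]
}


def member_dependency_hash(lock, member):
    """Hash the name/version pairs reachable from `member` in parsed lock data."""
    packages = {pkg["name"]: pkg for pkg in lock.get("package", [])}
    if member not in packages:
        raise KeyError(f"{member!r} not found in lock data")

    # Depth-first walk over the member's transitive dependencies.
    seen = set()
    stack = [member]
    while stack:
        name = stack.pop()
        if name in seen or name not in packages:
            continue
        seen.add(name)
        for dep in packages[name].get("dependencies", []):
            stack.append(dep["name"])

    # Sort so the digest does not depend on traversal order.
    lines = sorted(f"{name}=={packages[name].get('version', '')}" for name in seen)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```

In this example, bumping numpy leaves the hash for my-service unchanged, while bumping idna changes it, which is the behavior wanted for docker cache invalidation.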

aberres commented 1 week ago

@BrendanJM We have a similar problem and are (for now) using something hacky like this:

# Create a stable minimal requirements.txt
uv export --frozen --directory $SUBDIR/ -o requirements.txt

# Drop references to local packages
sed -i '/-e \.\/subdir\//d' requirements.txt

# In the Dockerfile, this creates a stable cache layer
RUN uv pip sync requirements.txt --no-cache --compile-bytecode

# And then install the packages
RUN uv pip install --no-deps package_a/ package_b/

rokos-angus commented 1 week ago

@BrendanJM This is a fantastic writeup that cuts to the heart of why using uv in monorepo CI is currently painful.

One point I would add: it's particularly painful that adding new items to the workspace changes uv.lock, even if they don't require any new dependencies. We have quite a large monorepo where most of the code uses a small number of standard third-party dependencies.

Adding new items to the workspace is common, but adding third-party dependencies is actually fairly rare at this point. If the former didn't update uv.lock, it would make our CI much simpler, and users could easily add new services and libraries to the workspace, leveraging a subset of the existing workspace dependencies.

zanieb commented 1 week ago

I sort of think a uv export --package child --frozen --no-emit-workspace workflow may be most appropriate for your Docker use-case. You could also pipe that into a hash utility.

Thanks for all the additional details. We'll think on this.
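
The export-and-hash suggestion above could be scripted roughly as follows. This is a sketch in Python, not an official workflow. The `uv export` flags are the ones quoted in the comment above, while the function names and the choice to ignore comment and blank lines before hashing are assumptions made for illustration.

```python
import hashlib
import subprocess


def export_member_requirements(package):
    """Run the `uv export` command quoted above and return its stdout."""
    result = subprocess.run(
        ["uv", "export", "--package", package, "--frozen", "--no-emit-workspace"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def requirements_digest(requirements_text):
    """SHA-256 of the exported requirements, ignoring comment and blank lines."""
    kept = [
        line.rstrip()
        for line in requirements_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()
```

A CI step could call `requirements_digest(export_member_requirements("my-service"))` and compare the result with a previously stored digest to decide whether the service's dependency layer needs rebuilding.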

astral-sh / uv

Discussion: uv workspaces in a monorepo - thoughts on change-only testing #6356