Open BrendanJM opened 2 months ago
This sounds cool, but I think we have a ways to go before we could prioritize looking into it.
+1 on monorepo tooling. Especially with regards to having dependency caching per file and being able to create a binary for a given target (essentially a file graph). Curious what needs to happen to help accomplish this goal?
Just brainstorming a bit, one additional question on the topic - are uv workspaces arbitrarily nestable/composable? E.g. I am thinking of a single monorepo structure where we have one or more groups of several packages that should be in a workspace together (shared deps), maybe several packages that are not in any workspace. So something like this:
```
root (workspace_0?)
├── workspace_1
│   ├── service_a
│   ├── package_b
│   └── package_c
├── workspace_2
│   ├── package_d
│   └── package_e
└── packages
    ├── package_f
    └── package_g
```
The idea behind this structure is that it provides an incremental path to managing packages and services together in logical groups, while also allowing packages to be developed in isolation if needed. So workspace_1 would share dependencies but be independent from workspace_2, and anything under packages could be referenced locally by path without needing to conform to the requirements of either well-defined workspace.
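For reference, uv's workspace table lives in a single root `pyproject.toml`, with members selected by globs and an `exclude` list for projects managed independently. A rough sketch of what the root could look like for a structure like the one above (the globs and paths are illustrative, not a claim that nested workspaces work today):

```toml
# Root pyproject.toml (illustrative)
[tool.uv.workspace]
members = ["workspace_1/*", "workspace_2/*"]
# Packages meant to be developed in isolation can be excluded
exclude = ["packages/*"]
```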
Another observation: when using uv workspaces in a monorepo, with the current behavior there is a singular lock file at the top of the repo. The idea here makes sense, since it means one set of resolved dependencies. But this presents a few issues in practice.
In the event we have one or more application services, generally we would want these services to be deployed with only the dependencies they need to function. Since a monorepo can accrue a substantial superset of these dependencies across all libraries/services, it becomes difficult to build and deploy services individually with only the dependencies they need.
A related issue is that the lock file remains one or more levels above the service's pyproject.toml (and the rest of its project code). Because docker can't bring in dependencies from parent directories, any docker build commands run at the level of the service won't successfully locate the uv.lock file. This can be worked around by running the command from a level up or using context along with e.g. docker compose, which is mostly fine, but it does break the idea of monorepo components being essentially fully isolatable to their own subdirectories.
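One concrete shape for that workaround, sketched here with hypothetical paths and image tags: keep the Dockerfile inside the service directory, but run `docker build` from the repo root so `uv.lock` falls inside the build context.

```dockerfile
# services/my-service/Dockerfile
# Build from the repo root: docker build -f services/my-service/Dockerfile .
FROM python:3.12-slim
# The uv binary image tag here is an assumption; adjust as needed
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
# Reachable only because the build context is the repo root
COPY uv.lock pyproject.toml ./
COPY services/my-service/ services/my-service/
RUN uv sync --frozen --package my-service
```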
One solution to both of these would be to include subset uv.lock files at the level of workspace components in addition to the top level uv.lock.
> In the event we have one or more application services, generally we would want these services to be deployed with only the dependencies they need to function. Since a monorepo can accrue a substantial superset of these dependencies across all libraries/services, it becomes difficult to build and deploy services individually with only the dependencies they need.
That sounds like a design issue in your monorepo. I tend to design my libraries so they each depend on very few dependencies. Then you can have higher-level services that pick and choose whatever libraries they need. This way your services only install dependencies that are actually needed (otherwise the design is suboptimal). Instead of many small libraries you can also have libraries with optional dependencies for certain features. Again, just a design choice.
@NixBiks Maybe I could have clarified better, but I believe we're talking about the same thing here. The goal is what you are talking about, and the context is how well-suited uv's workspaces are to that end. This deploy-only-the-dependencies-each-service-needs pattern isn't well supported in the current workspace implementation, since uv only resolves a single superset lock file at the virtual root rather than individual lock files at the service level, even if the services specify only their minimum dependencies in their own pyproject.toml files.
That said, it's not difficult to work around this: you can exclude the services (and libraries) for which you want distinct lock files from the workspace (and do a little scripting to manage locking and syncing these across many projects). But then one wonders why use the workspace feature at all. I do think there's promise in workspaces FWIW, and I'm looking forward to seeing them evolve to better support monorepo workflows.
@BrendanJM isn't it just `uv sync --package my-service` you are looking for?
It gets you part of the way there - if I am interpreting this correctly, it seems like you could definitely use it to install just the service dependencies. But it still looks like it would be missing a separate lock file for the service itself, which would be very helpful for e.g. cache invalidation when building a docker layer to handle this sync step.
`uv lock` does not presently accept a `--package` argument, which I think might be the missing piece for this.
I don't quite follow why you need a separate lockfile for just a single member? The point of using a workspace is that it ensures that you're using consistent versions across all of the members.
While the primary purpose of lock files is to resolve a consistent set of versions across the members, it's pretty standard practice to use lock/requirements files for cache management in docker.
Let's say I have a larger monorepo with 100+ packages and services. We're using uv's workspaces functionality so that we have consistent versions across all members, all resolved to a single lockfile - which is great! But not all members require all dependencies. So while `uv sync --package my-service` could be useful to populate just the service dependencies, my docker build pipeline needs to know when the dependencies have changed in order to rebuild this layer and subsequent ones.

In a sufficiently large monorepo, there can be a fairly constant stream of updates to the top-level lockfile (unrelated to `my-service` or its dependencies), which would make it difficult to meaningfully use that file to force cache invalidation in the service's docker build, since it would fire false positives with every unrelated update.

This is only a mild inconvenience at smaller scales, but at larger scale, constant false-positive invalidations in the docker pipeline can become significant. While using a cache mount can definitely help docker benefit from a more persistent uv cache, downstream layers would still be affected and can be expensive to rebuild.
Edit to add: If having subset `uv.lock` files in each workspace member feels confusing or prone to misuse (and I mean subset as in a strict subset downstream of the root lock file, not an independent lock), an alternate option could be a mechanism for generating e.g. a `lock.hash` file that is simply a hash of only the dependencies for that specific member.
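Absent built-in support, the `lock.hash` idea can be approximated by hashing a member-scoped export. A minimal sketch - the `uv export` step is shown as a comment since flag support is discussed elsewhere in this thread, and a stand-in requirements listing is used for illustration:

```shell
# Hypothetical member-scoped export, e.g.:
#   uv export --frozen --package my-service -o member-reqs.txt
# Stand-in listing for illustration:
printf 'anyio==4.4.0\nhttpx==0.27.0\n' > member-reqs.txt

# Hash only the member's resolved dependencies; this file changes
# only when the member's own dependency set changes.
sha256sum member-reqs.txt | cut -d' ' -f1 > lock.hash
cat lock.hash
```

In a docker build, `lock.hash` (rather than the root `uv.lock`) would then be the file copied into the early layer, so unrelated workspace updates don't invalidate the cache.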
@BrendanJM We have a similar problem and are (for now) using something hacky like this:
```shell
# Create a stable minimal requirements.txt
uv export --frozen --directory $SUBDIR/ -o requirements.txt
# Drop references to local packages
sed -i '/-e \.\/subdir\//d' requirements.txt

# In the Dockerfile, this creates a stable cache layer
RUN uv pip sync requirements.txt --no-cache --compile-bytecode
# And then install the packages
RUN uv pip install --no-deps package_a/ package_b/
```
@BrendanJM This is a fantastic writeup that cuts to the heart of why using UV in monorepo CI is currently painful.
One point I would add: it's particularly painful that adding new items to the workspace changes uv.lock, even if they don't require any new dependencies. We have quite a large monorepo where most of the code uses a small number of standard 3rd-party dependencies.

Adding new items to the workspace is common, but adding third-party dependencies is actually fairly rare at this point. If the former didn't update uv.lock, it would make our CI much simpler, and users could easily add new services and libraries to the workspace leveraging a subset of the existing workspace dependencies.
I sort of think a `uv export --package child --frozen --no-emit-workspace` workflow may be most appropriate for your Docker use-case. You could also pipe that into a hash utility.
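That workflow could slot into a Docker build roughly like this - a sketch only, with the base image, uv binary image tag, and package name as assumptions:

```dockerfile
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app

# Generated before the build, e.g.:
#   uv export --package child --frozen --no-emit-workspace -o requirements.txt
# This layer's cache is invalidated only when the member's own
# third-party dependencies change.
COPY requirements.txt .
RUN uv pip sync --system requirements.txt

# Workspace-local code changes land in later, cheaper layers
COPY . .
RUN uv pip install --system --no-deps .
```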
Thanks for all the additional details. We'll think on this.
Hi team, the new release of `uv` looks great, really looking forward to diving into all the features and new docs.

One area that I have some questions around is using `uv` for monorepos of multiple python packages, services, etc. I see that workspaces seem to offer a general-purpose framework for structuring monorepos and their package relationships.

This all looks like good stuff, but there are a few open questions I have. While it seems like this provides some structure for a monorepo, there are a few features that might be desirable for monorepo tooling. One example that is top of mind is tooling to help identify what code has changed and to run only the tests affected by that code. At scale, the duration of extensive test pipelines in CI can become a very real problem for developer productivity, and unfortunately this is a concern that is part and parcel of packaging solutions for large projects.

Identifying code changes and running only affected tests is a significant feature of build systems such as `bazel` and `pants`, as well as tooling like `pytest-testmon`. I think the way that `uv` presents structure for managing related projects is great, so my question is: would some feature along these lines be something we see `uv` managing in the future? And if not, are there any early thoughts on how it could integrate with other tooling or build systems to enable this?

It'd be great if we can establish some opinionated guidance here that would work in the general case.