Decide on semantics for workspace environment syncing (with `--package`, `--extra`, and `--dev`)

astral-sh / uv

An extremely fast Python package and project manager, written in Rust.

https://docs.astral.sh/uv

Apache License 2.0

22.55k stars 654 forks source link

Decide on semantics for workspace environment syncing (with `--package`, `--extra`, and `--dev`) #4730

Open charliermarsh opened 3 months ago

charliermarsh commented 3 months ago

We had an extensive discussion about this in Discord. I'll try to summarize some of it here, but we still need to make some important decisions about how virtualenv syncing should behave for workspaces. Most of these changes aren't hard to implement, but they do have a lot of follow-on impact.

Today, we use a single virtualenv at the workspace root. When you do uv run -p package, we bring the environment in-sync with the necessary dependencies for package. That is, we don't install the entire workspace -- only the dependencies that are required for package. This has the benefit that you can't accidentally import some other dependency from the workspace within the package code (it enforces some amount of isolation).

However, this syncing is purely additive. So if you run uv run -p foo then uv run -p bar, we'll install any packages necessary for bar, but we won't uninstall the packages that we needed for foo (but not bar).

Extras behave similarly. So if you run uv run -p bar --extra foo, we'll install the extras activated by foo, and then leave them activated if you run uv run -p bar.

This strategy has some desirable properties (we enforce isolation on initial uv run -p, we only sync the necessary packages rather than building the entire workspace) but it's sort of a compromise (we don't enforce strict isolation on subsequent uv run -p).

One alternative we discussed was: creating a virtualenv per package in the workspace, on-demand. So if you did uv run -p package, we'd create a virtualenv in package with that package's dependencies. We'd also have a virtualenv in the workspace root.

This gives us the same properties that we have today, with the added benefit that it does enforce strict isolation. But there are two downsides: (1) you now have multiple virtualenvs in your project, which could confuse editors (I think this is totally fine); (2) you don't have a single virtualenv with the entire workspace installed, which means your editor could lack completions and Go-To definition for packages in the workspace (this problem also exists today); and (3) it doesn't solve the extras problem.

(3) seems to be the critical one. Extras are kind of the same problem (stateful environments) as -p, but don't have a natural analog to "store the virtualenv in the package". In short, we leave extras enabled after a uv run, and this could be problematic for some packages.

One solution to the "extras" problem would be to do a strict sync on every uv run. So if you did uv run -p foo --extra bar and then uv run -p foo, we'd remove the bar extra on that second invocation. The main downside here seems to be that you uv sync -p foo --extra bar and then uv run -p foo would remove the bar extra, which some have argued is unintuitive. (N.B. uv sync -p foo does not exist today.)

charliermarsh commented 3 months ago

One proposal:

uv run --package creates a venv in --package
uv run operates with pip install semantics (doesn't remove extraneous packages)
uv sync operates with pip install semantics
uv sync --locked operates with pip sync semantics
uv run --locked operates with pip sync semantics

Unclear if uv run --package --extra would modify the base environment or ephemerally layer on the extra.

I think this proposal is reasonable as "I need extras removed when they're not activated" is sort of a niche requirement, and so making it a non-default isn't... that bad?

charliermarsh commented 3 months ago

Another proposal from @konstin:

Alright here's the pitch:

uv sync installs everything in .venv. That gives your IDE (and type checkers) all packages for analysis. uv sync retains --package and --extra, but they are when you need that level of control, most users should use uv run. If you use uv sync with parameters, you effectively opt in to managing venv state yourself.
uv run runs in a separate environment by default. We place that venv under .uv/venvs, so you can clean it up with rm -rf .uv and inspect it if needs be. This means there are only two folder in your project with uv state (.venv and .uv; that's one too many for me already, but we don't get out of .venv)
If you need a per-project env, you can cd packages/bert && uv venv && uv sync -p bert. It's a regular venv still so IDEs will run regularly just on this package.
When you do uv run -p bert --extra torch, we create something like .uv/venvs/bert-torch with exactly that set of dependencies. This means that you get both a strict invocation and a cached environment on the next run. (ofc if your lockfile changed, we update the env)

In a way this gives us three levels in the uv CLI:

The pip mode, everything is manual, there are the known quirks. Interesting for users coming from pip and pip-tools
Managing your venv manually with uv sync --package ... --extra .... Similar to poetry or pdm, that's probably what i'd use a lot cause that's where my level of expertise is
The managed uv run mode. You shouldn't know about venvs at all beyond "my IDE see .venv" to use this. This is the easiest, most user friendly way, but also the one with the least control when you're used to pip installing to your hand-curated venvs.

The huge drawback here: Modulo default-{packages,extras,command}, you have to specify all arguments in every uv run every time.

konstin commented 3 months ago

I want to add another perspective to this: It says the problem we're looking at is venv state. There are several things that influence the state of the venv:

Additional packages, both workspace packages (-p) and extras (--extra). To me, a workspace is one big virtual package where the members are optional dependencies like extras, so i've put them here with the extras.
Removing packages, such as --no-dev
Changing the python version. Changing the python version has the effect that we have to change many packages and that the meaning of extras can change (through markers with both python version and extra condition).

Different use cases want different venvs:

IDEs, type checker and other linters want to have a .venv with all dependencies installed (for slightly different definitions of all) so they can inspect your dependencies.
When running pytest on a specific of package on a specific python version with specific extras i'd like to test only that configuration. In this case, i'm ready to specify what exactly i need and generally accept that the venv is temporary,
When i run e.g. the backend entrypoint, it would be nice if only the backend dependencies were available to check they are correct (nice to have).
When deploying (mainly docker containers), i want to sync only a specific set of packages, extras and no dev deps.

Contrary to the pitch in the comment above, i've realized that for my use cases, isolation generally isn't that important, i'm fine with a single "everything venv", as long as i can run with isolation when i need it. Caching isolated temporary venvs created e.g. running pytest with isolation (ideally in a single gitignored folder in the project root where i can clear them) would be nice.

potiuk commented 3 months ago

I'd agree with @konstin - and that's something that might be useful for Airflow.

For development, even for regular running of the tests, having single .venv where you can install all packages is ideal, while you have a way to run tests or scripts in separate, isolated venvs. You should be able to choose whether you want to install all packages or just one or few packages there.

There are cases where you have "main" package and "extra" packages and some of the extra packages have (sometimes optional) dependencies on each other - and in this case isolated environments will not work - because even for tests you need the "other" packages as well - at least for some of the tests.

So natural behaviour would be:

a) run your development, autocomplete, tests in your "potentially-all-combined" environment. b) cd to your package and run your tests (or subset of those that do not require other packages) there in an "isolated environment). c) (optional) allow those "isolated" environments to install other packages if you really want (to be able to test those cross-package tests)

I think choosing which packages to install and syncing the "main" or "isolated" (if c) is supported) venvs should be purely additive - UNLESS you deliberately choose to cleanup the venv before. That should simplify all the "remove package from spec" cases - you allow the venv to have more things than needed, and if. you really want a clean version, run the installation with --clean switch.

charliermarsh commented 2 months ago

Relatedly: we need to consider how this will interact with the red-knot work...

charliermarsh commented 2 months ago

A few competing concerns:

We want the user to have a great editor experience -- one that "just works". This likely requires (1) a fully-populated virtualenv, with all packages and extras that (2) exists at the repo root, rather than being hidden away in some cache directory.
We'd like to avoid syncing any more packages than are necessary when the user runs a command. For example, uv run -p foo shouldn't need to install and build bar in the workspace, if it's not in the foo dependency tree.
We'd like to avoid allowing users to import packages that aren't declared as dependencies for a given workspace member (i.e., enforce some isolation). The same applies to extras: we'd like to avoid allowing users to import an extra dependency that hasn't been requested.

charliermarsh commented 2 months ago

Here's my latest proposal:

We keep our current behavior for now. This includes: a single virtualenv at the root that's updated additively (i.e., with pip install semantics). So running uv run -p foo from an empty state would add foo and its dependencies. Running uv run would then add any missing dependencies for the root, and so on.
We repurpose --isolated (or introduce a new flag) to mean "don't use the project .venv, run the command in an isolated environment". So you can run commands in ephemeral environments per above.

Given the criteria above, this proposal is weak on the following points:

We don't enforce import or extra boundaries for packages, unless the user runs with --isolated.
There isn't really a way to sync the extra workspace to the root .venv. uv sync --all-extras is close, but if there are packages in the workspace that aren't depended on by the root package, those won't be included.

Other than that, though, I think it does a good job on the three points above (we have a single root virtualenv that is intended to cover the package, but we build it up incrementally on-demand). It's also easy to implement (it's close to what we do today) and IMO easy to change and extend in the future.

One additional behavior that I'd be open to:

uv run -p foo creates a new virtualenv in .venv, and we run in there.

This would at least enforce import boundaries for packages (but not extras). I don't see any downsides to this, other than that any manual modifications to the root .venv wouldn't be applied to those inner virtualenvs.

zanieb commented 2 months ago

uv run -p foo creates a new virtualenv in .venv, and we run in there.

I would avoid specifying the location, i.e., I think we'd just cache it as normal (like we do for tools) — we can determine if it should be user-facing separately. The important bit here is whether or not uv run -p foo is isolated by default or with opt in, right?

We probably need a separate flag from --isolated. Frankly I think --isolated makes the most sense when applied to environments, not settings but this is a tough spot since Ruff uses --isolated for config discovery. Riffing on some other options: --no-workspace, --no-project, --strict. I find it nice to focus on the idea that we're excluding various other dependencies rather than something like --fresh-venv which emphasizes how we go about excluding the other dependencies.

I'm feeling something like:

uv sync is required to create the user-facing persistent environment (.venv) for an editor experience or manual modifications of the environment. By default, this environment should contain most of the workspace (and their extras?).
uv run always uses a layered non-persistent environment (it can be cached). It could be empty if the persistent environment meets the requirements but prevents something like uv run -- pip install anyio having inconsistent behavior i.e. sometimes mutating the persistent environment.
An --isolated (or similar) flag is available to avoid layering on top of the persistent environment, enforce that manually managed packages aren't used, and that a package's dependencies are correctly declared.

Regarding:

If there are packages in the workspace that aren't depended on by the root package, those won't be included.

I think this is reasonable default behavior, though I'm not sure how we solve for it in the editor. We should have a flag, e.g., --all-workspace-members, to install everything regardless of the dependencies of the root package? but I imagine if a workspace member is intentionally excluded from the tree then it is intended to be incompatible or used separately? Do we allow package version incompatibilities in this case?

charliermarsh commented 2 months ago

By default, this environment should contain most of the workspace (and their extras?).

I think we should leave this as-is today, honestly. The rest seems fine to me for now.

charliermarsh commented 2 months ago

As of this proposal, we'd be using --isolated for ~three things:

Ignore any discovered settings.
Avoid operating in the context of a project for uv run.
Avoid using the persistent environment for uv run (this is in conflict with (2)).

For (2), we also have the no-managed = true flag in the TOML which achieves the same effect, I think.

I think it's probably right for --isolated to do the first two things, because it's meant to mean "ignore any files on-disk". It would be strange for it to do just the first but not the second, because then we'd use the dependencies for the discovered project but not the settings, that live in the same file? Though we could say that's fine and make them separate...

What are the ideal names here? Something like...

--no-config?
--no-project?
--isolated?

charliermarsh commented 2 months ago

Candidly, I think the easiest thing for now (which also leaves the door open for future improvements) is just to keep what we have now, but add an --isolated or --standalone flag to run in an ephemeral environment, without the base environment.

Everything else can kind of be layered on later if we want (e.g., always running in a (maybe empty) ephemeral environment --that seems purely additive).

I personally don't want uv sync to be required, because it makes the environments feel too "hidden" for my liking.

zanieb commented 2 months ago

Arguably we have further things to discuss here, maybe we should just drop it from the project? Or just remember to revisit later?

charliermarsh commented 2 months ago

Yeah that's reasonable.

charliermarsh commented 2 months ago

(Removed from the project.)