astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0
22.55k stars 654 forks source link

Decide on semantics for workspace environment syncing (with `--package`, `--extra`, and `--dev`) #4730

Open charliermarsh opened 3 months ago

charliermarsh commented 3 months ago

We had an extensive discussion about this in Discord. I'll try to summarize some of it here, but we still need to make some important decisions about how virtualenv syncing should behave for workspaces. Most of these changes aren't hard to implement, but they do have a lot of follow-on impact.

Today, we use a single virtualenv at the workspace root. When you do uv run -p package, we bring the environment in-sync with the necessary dependencies for package. That is, we don't install the entire workspace -- only the dependencies that are required for package. This has the benefit that you can't accidentally import some other dependency from the workspace within the package code (it enforces some amount of isolation).

However, this syncing is purely additive. So if you run uv run -p foo then uv run -p bar, we'll install any packages necessary for bar, but we won't uninstall the packages that we needed for foo (but not bar).

Extras behave similarly. So if you run uv run -p bar --extra foo, we'll install the extras activated by foo, and then leave them activated if you run uv run -p bar.

This strategy has some desirable properties (we enforce isolation on initial uv run -p, we only sync the necessary packages rather than building the entire workspace) but it's sort of a compromise (we don't enforce strict isolation on subsequent uv run -p).

One alternative we discussed was: creating a virtualenv per package in the workspace, on-demand. So if you did uv run -p package, we'd create a virtualenv in package with that package's dependencies. We'd also have a virtualenv in the workspace root.

This gives us the same properties that we have today, with the added benefit that it does enforce strict isolation. But there are two downsides: (1) you now have multiple virtualenvs in your project, which could confuse editors (I think this is totally fine); (2) you don't have a single virtualenv with the entire workspace installed, which means your editor could lack completions and Go-To definition for packages in the workspace (this problem also exists today); and (3) it doesn't solve the extras problem.

(3) seems to be the critical one. Extras are kind of the same problem (stateful environments) as -p, but don't have a natural analog to "store the virtualenv in the package". In short, we leave extras enabled after a uv run, and this could be problematic for some packages.

One solution to the "extras" problem would be to do a strict sync on every uv run. So if you did uv run -p foo --extra bar and then uv run -p foo, we'd remove the bar extra on that second invocation. The main downside here seems to be that you uv sync -p foo --extra bar and then uv run -p foo would remove the bar extra, which some have argued is unintuitive. (N.B. uv sync -p foo does not exist today.)

charliermarsh commented 3 months ago

One proposal:

Unclear if uv run --package --extra would modify the base environment or ephemerally layer on the extra.

I think this proposal is reasonable as "I need extras removed when they're not activated" is sort of a niche requirement, and so making it a non-default isn't... that bad?

charliermarsh commented 3 months ago

Another proposal from @konstin:


Alright here's the pitch:

In a way this gives us three levels in the uv CLI:

The huge drawback here: Modulo default-{packages,extras,command}, you have to specify all arguments in every uv run every time.

konstin commented 3 months ago

I want to add another perspective to this: It says the problem we're looking at is venv state. There are several things that influence the state of the venv:

Different use cases want different venvs:


Contrary to the pitch in the comment above, i've realized that for my use cases, isolation generally isn't that important, i'm fine with a single "everything venv", as long as i can run with isolation when i need it. Caching isolated temporary venvs created e.g. running pytest with isolation (ideally in a single gitignored folder in the project root where i can clear them) would be nice.

potiuk commented 3 months ago

I'd agree with @konstin - and that's something that might be useful for Airflow.

For development, even for regular running of the tests, having single .venv where you can install all packages is ideal, while you have a way to run tests or scripts in separate, isolated venvs. You should be able to choose whether you want to install all packages or just one or few packages there.

There are cases where you have "main" package and "extra" packages and some of the extra packages have (sometimes optional) dependencies on each other - and in this case isolated environments will not work - because even for tests you need the "other" packages as well - at least for some of the tests.

So natural behaviour would be:

a) run your development, autocomplete, tests in your "potentially-all-combined" environment. b) cd to your package and run your tests (or subset of those that do not require other packages) there in an "isolated environment). c) (optional) allow those "isolated" environments to install other packages if you really want (to be able to test those cross-package tests)

I think choosing which packages to install and syncing the "main" or "isolated" (if c) is supported) venvs should be purely additive - UNLESS you deliberately choose to cleanup the venv before. That should simplify all the "remove package from spec" cases - you allow the venv to have more things than needed, and if. you really want a clean version, run the installation with --clean switch.

charliermarsh commented 2 months ago

Relatedly: we need to consider how this will interact with the red-knot work...

charliermarsh commented 2 months ago

A few competing concerns:

charliermarsh commented 2 months ago

Here's my latest proposal:

Given the criteria above, this proposal is weak on the following points:

Other than that, though, I think it does a good job on the three points above (we have a single root virtualenv that is intended to cover the package, but we build it up incrementally on-demand). It's also easy to implement (it's close to what we do today) and IMO easy to change and extend in the future.

One additional behavior that I'd be open to:

This would at least enforce import boundaries for packages (but not extras). I don't see any downsides to this, other than that any manual modifications to the root .venv wouldn't be applied to those inner virtualenvs.

zanieb commented 2 months ago

uv run -p foo creates a new virtualenv in .venv, and we run in there.

I would avoid specifying the location, i.e., I think we'd just cache it as normal (like we do for tools) — we can determine if it should be user-facing separately. The important bit here is whether or not uv run -p foo is isolated by default or with opt in, right?

We probably need a separate flag from --isolated. Frankly I think --isolated makes the most sense when applied to environments, not settings but this is a tough spot since Ruff uses --isolated for config discovery. Riffing on some other options: --no-workspace, --no-project, --strict. I find it nice to focus on the idea that we're excluding various other dependencies rather than something like --fresh-venv which emphasizes how we go about excluding the other dependencies.

I'm feeling something like:

Regarding:

If there are packages in the workspace that aren't depended on by the root package, those won't be included.

I think this is reasonable default behavior, though I'm not sure how we solve for it in the editor. We should have a flag, e.g., --all-workspace-members, to install everything regardless of the dependencies of the root package? but I imagine if a workspace member is intentionally excluded from the tree then it is intended to be incompatible or used separately? Do we allow package version incompatibilities in this case?

charliermarsh commented 2 months ago

By default, this environment should contain most of the workspace (and their extras?).

I think we should leave this as-is today, honestly. The rest seems fine to me for now.

charliermarsh commented 2 months ago

As of this proposal, we'd be using --isolated for ~three things:

  1. Ignore any discovered settings.
  2. Avoid operating in the context of a project for uv run.
  3. Avoid using the persistent environment for uv run (this is in conflict with (2)).

For (2), we also have the no-managed = true flag in the TOML which achieves the same effect, I think.

I think it's probably right for --isolated to do the first two things, because it's meant to mean "ignore any files on-disk". It would be strange for it to do just the first but not the second, because then we'd use the dependencies for the discovered project but not the settings, that live in the same file? Though we could say that's fine and make them separate...

What are the ideal names here? Something like...

  1. --no-config?
  2. --no-project?
  3. --isolated?
charliermarsh commented 2 months ago

Candidly, I think the easiest thing for now (which also leaves the door open for future improvements) is just to keep what we have now, but add an --isolated or --standalone flag to run in an ephemeral environment, without the base environment.

Everything else can kind of be layered on later if we want (e.g., always running in a (maybe empty) ephemeral environment --that seems purely additive).

I personally don't want uv sync to be required, because it makes the environments feel too "hidden" for my liking.

zanieb commented 2 months ago

Arguably we have further things to discuss here, maybe we should just drop it from the project? Or just remember to revisit later?

charliermarsh commented 2 months ago

Yeah that's reasonable.

charliermarsh commented 2 months ago

(Removed from the project.)