NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.08k stars 14.13k forks source link

Python: Remove test & runtime dependencies from build time closure tracking issue #272178

Open adisbladis opened 11 months ago

adisbladis commented 11 months ago

This is a meta tracking issue to aggregate & track issues & PR related to Python build time & runtime separation.

Related issues

Motivations

Stop propagation of Python dependencies from applications

When depending on applications that are written in Python in a nix-shell Python dependencies are propagated into $PYTHONPATH. This is very problematic when developing Python applications, and a number of packages (notably Poetry & PDM) have hacks that remove $out/nix-support/propagated-build-inputs. Those hacks could be entirely removed.

Reducing the number of rebuilds

By not using build time propagation we don't even need to add runtime dependencies to the build. This will drastically cut down on the amount of rebuilds we have to do.

Reducing evaluation overhead

Both through smaller build time closures, but also through possibly getting rid of the packageOverrides idiom (at least internally in nixpkgs). This idiom is problematic because it means we rebootstrap the Python package set every time it's used.

By delaying composition we can even allow for things which today are not possible because they would lead to file collisions:

python3Packages.buildPythonApplication {
  dependencies = [
    (python3Packages.requests.overridePythonAttrs(old: {
      # This can actually work when we don't enforce propagation at build time!
      # `requiredPythonModules` just needs to take the first dependency for a given name from the list.
      # By putting something earlier in the `dependencies` list they get higher precedence.
    }))
  ];
}

As per the table in my comment a nixpkgs evaluation allocates an attrset with 26755 members 2491 times. I hunted this down and this is variants of pythonPackages being instantiated. That's only the tip of the iceberg.

Circular dependencies

In Nix, build time propagation requires the build graph to be a DAG. This is not true for Python dependencies which sometimes has circular dependencies.

By only depending on the minimal build time dependencies we can reduce the risk of build-time cycles happening to almost zero. Ensuring no infinite recursion happens would probably be the job of requiredPythonModules.

Expected breakage

I've tried to keep breakage to a minimum, but something has to give for such a drastic semantic change. The good thing is that most of these issues can be fixed ahead of merging the breaking PR.

python3Packages.foo

Depending on a Python package like this will break:

stdenv.mkDerivation {
  propagatedBuildInputs = [ python3Packages.requests ];
}

You will instead need to:

stdenv.mkDerivation {
  propagatedBuildInputs = python3Packages.requiredPythonModules [ python3Packages.requests ];
}

This is because dependencies are no longer build-time propagated, so the full dependency graph calculation needs to happen in the Nix evaluator.

Note that you don't have to call requiredPythonModules manually when using buildPythonPackage/buildPythonApplication. This quirk only applies when depending on pythonPackages in non-python contexts. There are a handful of those cases in nixpkgs but I suspect the most common case is in external development shells:

pkgs.mkShell {
  packages = [ 
    python3
    python3Packages.requests
  ];
}

In those cases you'll need to either call requiredPythonModules or use withPackages/buildEnv.

python3Packages.buildPythonPackage vs python3Packages.buildPythonApplication

In #272179 the distinction between these two functions becomes much more firm:

All Python applications which has runnable binaries now needs to either be:

Pull requests

I'm trying my best to allow us to work towards this goal without breaking things unnecessarily. The first two pull requests in this list doesn't break any current behaviour. Those allow us to work towards the goal of splitting dependencies incrementally.

The third PR introduces breaking behaviour, but manages to stay mostly compatible, with the breaking changes documented above being the notable exceptions.


Add a :+1: reaction to issues you find important.

cc @FRidh @mweinelt cc @DavHau @phaer

roberth commented 11 months ago

cc @FRidh

FRidh commented 11 months ago

I've been rather inactive these last years, but I'm happy to see changes like these and I'll try to provide some feedback, though I am pretty sure these items are nothing new. While I prefer we had some Nixpkgs-wide RFC on how we handle package sets, that hasn't really happened, and so I blocking larger changes any longer does not make sense anymore either.

Separate tests

Running tests in a separate derivation we should definitely do, but we do need to ensure Hydra for Nixpkgs will pick this up. Users outside Nixpkgs can stay with separateChecks = false for the time being.

Avoid propagation

At the time I explored various ways to get rid of the current propagation. One way is as done here, using evaluation. The other, was to just write a different file to the store with the runtime dependencies, essentially implementing the propagation ourselves. At the time Eelco was of the opinion we should not do this via the evaluator but instead always use such a file during build-time for performance-reasons. I don't know have any strong preference anymore on whether we use a function or a hook, though I imagine the hook would be easier for polyglot projects.

Circular dependencies

The circular dependencies issue could be resolved by building wheels in a separate derivation. Installation then becomes a very lightweight step.

Interface change

I've been for a while of the opinion that we should aim to get rid of builders such as buildPython* and express everything with just hooks. At the same time, when we split testing out, our users would want some "support" for that, which buildPython* would provide. In my opinion the Nixpkgs Python builder is relatively "low-level" and should stay "close" to semantics used in Nixpkgs, even if that may make it more confusing for users. Convenient interfaces could be provided by third-parties, e.g. using the module system.

adisbladis commented 11 months ago

At the time I explored various ways to get rid of the current propagation. One way is as done here, using evaluation. The other, was to just write a different file to the store with the runtime dependencies, essentially implementing the propagation ourselves. At the time Eelco was of the opinion we should not do this via the evaluator but instead always use such a file during build-time for performance-reasons.

Actually performance was the main reason I got curious to look into this, and I think it's easy to have the wrong intuition about what's causing performance issues in nixpkgs.

I think writing a file to the store with runtime dependencies still wouldn't allow us to get rid of the packageOverrides idiom. rg -tnix 'python.?\\.override' | wc -l indicates that we override the Python interpreter 60 times in Nixpkgs, meaning that we're computing the Python set from scratch 60 times. That requirement is a massive performance hog.

FRidh commented 11 months ago

Where would you do the patching/wrapping of executables in this case? Still in the installed derivation minus the Python run-time dependencies?

adisbladis commented 8 months ago

Where would you do the patching/wrapping of executables in this case? Still in the installed derivation minus the Python run-time dependencies?

Exactly. $PATH would still be patched in the buildPythonPackage build, and the wrapping of the Python environment would happen in the derivation that aggregates the Python environment (buildPythonApplication/python3.withPackages/python3Packages.requiredPythonModules).

adisbladis commented 3 months ago

Open PRs have been rebased. Please take a look, especially at the one separating tests as I think we can merge that before tackling removing dependency propagation.

nixos-discourse commented 2 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/several-comments-about-priorities-and-new-policies-in-the-python-ecosystem/51790/1