Open adisbladis opened 11 months ago
cc @FRidh
I've been rather inactive these last years, but I'm happy to see changes like these and I'll try to provide some feedback, though I am pretty sure these items are nothing new. While I prefer we had some Nixpkgs-wide RFC on how we handle package sets, that hasn't really happened, and so I blocking larger changes any longer does not make sense anymore either.
Running tests in a separate derivation we should definitely do, but we do need to ensure Hydra for Nixpkgs will pick this up. Users outside Nixpkgs can stay with separateChecks = false
for the time being.
At the time I explored various ways to get rid of the current propagation. One way is as done here, using evaluation. The other, was to just write a different file to the store with the runtime dependencies, essentially implementing the propagation ourselves. At the time Eelco was of the opinion we should not do this via the evaluator but instead always use such a file during build-time for performance-reasons. I don't know have any strong preference anymore on whether we use a function or a hook, though I imagine the hook would be easier for polyglot projects.
The circular dependencies issue could be resolved by building wheels in a separate derivation. Installation then becomes a very lightweight step.
I've been for a while of the opinion that we should aim to get rid of builders such as buildPython*
and express everything with just hooks. At the same time, when we split testing out, our users would want some "support" for that, which buildPython*
would provide. In my opinion the Nixpkgs Python builder is relatively "low-level" and should stay "close" to semantics used in Nixpkgs, even if that may make it more confusing for users. Convenient interfaces could be provided by third-parties, e.g. using the module system.
At the time I explored various ways to get rid of the current propagation. One way is as done here, using evaluation. The other, was to just write a different file to the store with the runtime dependencies, essentially implementing the propagation ourselves. At the time Eelco was of the opinion we should not do this via the evaluator but instead always use such a file during build-time for performance-reasons.
Actually performance was the main reason I got curious to look into this, and I think it's easy to have the wrong intuition about what's causing performance issues in nixpkgs.
I think writing a file to the store with runtime dependencies still wouldn't allow us to get rid of the packageOverrides
idiom.
rg -tnix 'python.?\\.override' | wc -l
indicates that we override the Python interpreter 60 times in Nixpkgs, meaning that we're computing the Python set from scratch 60 times.
That requirement is a massive performance hog.
Where would you do the patching/wrapping of executables in this case? Still in the installed derivation minus the Python run-time dependencies?
Where would you do the patching/wrapping of executables in this case? Still in the installed derivation minus the Python run-time dependencies?
Exactly. $PATH
would still be patched in the buildPythonPackage
build, and the wrapping of the Python environment would happen in the derivation that aggregates the Python environment (buildPythonApplication
/python3.withPackages
/python3Packages.requiredPythonModules
).
Open PRs have been rebased. Please take a look, especially at the one separating tests as I think we can merge that before tackling removing dependency propagation.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
This is a meta tracking issue to aggregate & track issues & PR related to Python build time & runtime separation.
Related issues
Motivations
Stop propagation of Python dependencies from applications
When depending on applications that are written in Python in a nix-shell Python dependencies are propagated into
$PYTHONPATH
. This is very problematic when developing Python applications, and a number of packages (notably Poetry & PDM) have hacks that remove$out/nix-support/propagated-build-inputs
. Those hacks could be entirely removed.Reducing the number of rebuilds
By not using build time propagation we don't even need to add runtime dependencies to the build. This will drastically cut down on the amount of rebuilds we have to do.
Reducing evaluation overhead
Both through smaller build time closures, but also through possibly getting rid of the
packageOverrides
idiom (at least internally in nixpkgs). This idiom is problematic because it means we rebootstrap the Python package set every time it's used.By delaying composition we can even allow for things which today are not possible because they would lead to file collisions:
As per the table in my comment a nixpkgs evaluation allocates an attrset with 26755 members 2491 times. I hunted this down and this is variants of pythonPackages being instantiated. That's only the tip of the iceberg.
Circular dependencies
In Nix, build time propagation requires the build graph to be a DAG. This is not true for Python dependencies which sometimes has circular dependencies.
By only depending on the minimal build time dependencies we can reduce the risk of build-time cycles happening to almost zero. Ensuring no infinite recursion happens would probably be the job of
requiredPythonModules
.Expected breakage
I've tried to keep breakage to a minimum, but something has to give for such a drastic semantic change. The good thing is that most of these issues can be fixed ahead of merging the breaking PR.
python3Packages.foo
Depending on a Python package like this will break:
You will instead need to:
This is because dependencies are no longer build-time propagated, so the full dependency graph calculation needs to happen in the Nix evaluator.
Note that you don't have to call
requiredPythonModules
manually when usingbuildPythonPackage
/buildPythonApplication
. This quirk only applies when depending onpythonPackages
in non-python contexts. There are a handful of those cases in nixpkgs but I suspect the most common case is in external development shells:In those cases you'll need to either call
requiredPythonModules
or usewithPackages
/buildEnv
.python3Packages.buildPythonPackage
vspython3Packages.buildPythonApplication
In #272179 the distinction between these two functions becomes much more firm:
buildPythonPackage
is for building a Python package without dependency propagationbuildPythonApplication
is for building application bundles. This internally callsbuildPythonPackage
, and also creates a wrapper derivation that takes care of callingrequiredPythonModules
.All Python applications which has runnable binaries now needs to either be:
buildPythonApplication
toPythonApplication
Pull requests
I'm trying my best to allow us to work towards this goal without breaking things unnecessarily. The first two pull requests in this list doesn't break any current behaviour. Those allow us to work towards the goal of splitting dependencies incrementally.
The third PR introduces breaking behaviour, but manages to stay mostly compatible, with the breaking changes documented above being the notable exceptions.
[x] buildPythonPackage: Separate runtime & build time dependencies Introduces argument separation between build, runtime & optional dependencies to
buildPythonPackage
. This is a requisite change to start[ ] buildPythonPackage: Add support for running tests in a separate derivation Introduces a new argument to
buildPythonPackage
calledseparateChecks
. Apassthru.tests.python
attribute is added where the tests are run separately from building the package.[ ] WIP: Python no runtime deps at build time Removes propagation and introduces the previously mentioned distinction between
buildPythonPackage
&buildPythonApplication
.Add a :+1: reaction to issues you find important.
cc @FRidh @mweinelt cc @DavHau @phaer