Open infinisil opened 4 months ago
This might be a good tool for this https://github.com/getgrit/gritql well it doesn't support nix yet. I wonder how hard it is to add given it's based on tree-sitter grammars.
Adding Nix support to that tool could create a lot of value for Nix.
I tried to add Nix support to sg
and didn't get too far, likely because I didn't really understand why simple pattern matching wasn't working. Agree that a tree-sitter / codemod approach is the right way to go. tree-sitter-nix
exists and I think could be resuscitated for this usecase.
In case anyone ends up working on this, I'd be very receptive to adding Nix support in GritQL and happy to debug any patterns that don't match.
Problem: Which files are part of a package?
Was pointed out by @emilazy that we can avoid this problem by just not migrating packages that are ambiguous in such a way, which simplifies this a bunch, at the cost of not migrating everything that could be migrated
I think another simple but more general approach would be to migrate the entire directory, check that the result follows the by-name
rules, and that its derivation hash remains unchanged?
at the cost of not migrating everything that could be migrated
My unscientific survey of files in pkgs shows ~1000 packages with ../
in them
$ rg '\.\./' -c | rg '\.nix:' | wc -l
971
~100 of which are already in by-name. 900ish packages is a fair amount to migrate manually, so automation probably is needed. But since we're migrating piecemeal I don't think it's entirely out of the question to do a few smaller PRs - starting with one to catch the low-hanging fruit.
Yeah totally, I'm all in favor for dropping the more complicated automated migration to start out with. Honestly at this point it might be best to just go for an ugly solution, even if it's not perfect.
Tbh, https://github.com/nixpkgs-architecture/nix-spp worked decently well too (it's what was used to create https://github.com/NixOS/nixpkgs/pull/211832). Iirc it had some bugs for the more complicated cases, so perhaps it could be simplified to just not do those.
Most packages in Nixpkgs aren't defined by
pkgs/by-name
yet, even if they could. While one could migrate them manually, that would take a long time. Instead we should automate this. Figuring out how to do large-scale automated migrations will also be of great benefit to many other future changes in Nixpkgs.In this issue I'll try to explain the pre-existing work and my ideas on how to go about this in the current codebase, such that others could implement this.
The task
The
pkgs/by-name
directory has certain validity checks that need to be fulfilled. For the migration in particular, these are most important:pkgs/by-name/he/hello
) must not refer to files outside itself using symlinks or Nix path expressions.system
set tox86_64-linux
and check that: a. For each package directory, thepkgs.${name}
attribute must be defined ascallPackage pkgs/by-name/${shard}/${name}/package.nix args
for someargs
. b. For each package directory,pkgs.lib.isDerivation pkgs.${name}
must betrue
.A lot of packages under
pkgs.*
do fulfill these requirements, these are the ones we can migrate automatically. The ones that don't fulfil these requirements typically need some manual refactoring.After verifying these requirements, migration can happen in two separate stages:
Move all files needed for the package into the appropriate
pkgs/by-name
directory, with the entry-point atpackage.nix
.Rewrite the packages definition in
all-packages.nix
appropriately to point to../by-name/...
all-packages.nix
has{ }
as the secondcallPackage
argument, remove the definition completely.Examples
pkgs.kody
(defined here) can't be migrated automatically, because itsdefault.nix
contains../../../top-level/kodi-packages.nix
, therefore not fulfilling requirement 1.pkgs.termdown
(defined here) can't be migrated automatically, because it's not defined usingpkgs.callPackage
pkgs.qiv
(defined here) fulfills all requirements and therefore can be migrated by:pkgs/applications/graphics/qiv/default.nix
topkgs/by-name/qi/qiv/package.nix
and adjusting the definition inall-packages.nix
tocallPackage
argument is not{ }
pkgs.scli
(defined here) fulfills all requirements and therefore can be migrated by:pkgs/applications/misc/scli/default.nix
topkgs/by-name/sc/scli/package.nix
and adjusting the definition inall-packages.nix
tocallPackage
argument is{ }
we can remove the entire definition.Ratchet checks
Currently this tool implements ratchet checks, which prevent the introduction of new instances of deprecated patterns. This happens here: https://github.com/NixOS/nixpkgs-check-by-name/blob/86202962eed024d778f2bece8a68e81361898405/src/eval.rs#L515-L517
This says that the ratchet for a specific package is "loose" (aka legacy), meaning we can migrate away from it (therefore tightening the ratchet).
Currently these ratchet states are only used to compare the base branch and the head of a PR, such that we can detect invalid transitions between ratchet states, see
ratchet.rs
. This allows detect when a PR introduces a new instance of a legacy pattern.This notably also means that even if somebody merges a PR that introduces new instances of deprecated patterns (happens if a pattern is deprecated between the PRs CI running and it being merged), it won't break master, because these ratchets aren't invalid on their own: Only the transition from tight to loose is invalid.
Using ratchets for migrations
We can also use this ratchet abstraction to determine migrations, because the loose state is exactly what needs to be migrated!
The tool can have two modes:
Thus in addition to defining how ratchets can be compared, we just need a way to define how ratchets can be migrated.
I've started a very rough WIP branch of this before here. Note how:
migrate
method in addition to acompare
one.Problem: Which files are part of a package?
It's really tricky to figure out which files are part of a package and should therefore be moved. Because outside of
pkgs/by-name
, each package's entry-point file (usuallydefault.nix
) can refer to either more of its own files which should be moved (e.g../Cargo.lock
), or refer to arbitrary files from other packages in which case it can't be moved without breaking check 1 (like../../some/file.sh
).I did manage to get that to work almost perfectly in an older version of this tooling. It works by creating an index of all (static) references between Nix files to figure out whether a file was only references by a single package or many packages, and then revert that relation to figure out which packages only referred to files used by that package itself.
This tool was used to create this PR: https://github.com/NixOS/nixpkgs/pull/211832
Notably, the current
nixpkgs-check-by-name
tool doesn't have such functionality, which also means that when somebody creates a PR that introduces a new package outside ofpkgs/by-name
, it tells them to migrate it topkgs/by-name
, even if it can't be directly migrated due to it referring to shared files! This is the reason the workaround for packages with multiple versions is necessary.Problem: It's non-trivial to efficiently rewrite many sections of files
This is what I've been struggling with while experimenting. I wrote some notes on this. Basically it's really hard to use rowan (the parsing library used underneath rnix) to edit files in multiple places.
In this case I'd like to have a basic edit operation implemented that can modify or remove attributes (for
all-packages.nix
). I did get that working in the older tooling, but it's a bit too hacky (still, it shows that it can be done!).Todo
This is my current idea on how to progress with this:
nixpkgs-check-by-name
, therefore removing the need for the multi-versioned package workaround. This means that we only have a loose ratchet exactly when we can figure out how to migrate automatically.rnix
.Set up CI to run the migration for every push to Nixpkgs master, and force push the resulting changes into a separate branch, which can then be merged at any point.
Bonus: Also make it open a tracking issue for all the packages that can't be migrated manually and update it as they are fixed.
Every future addition of a new ratchet check (for any pattern we want to deprecate) will have to come with an implementation of how to migrate from it, which will then automatically run and be pushed to the branch, which can then be merged again as necessary.