Automatic `finalAttrs` pattern migration

fgaz commented 8 months ago

I recently wrote a tool that automatically performs a migration to the finalAttrs pattern (#119942), and I believe it is now robust enough to be let loose on nixpkgs. However this is the first large treewide change I'm performing, so I'm not 100% sure how to proceed. Some questions:

Is this desirable at all?
Should I perform the migration in steps or all at once?
One big commit or one commit per package?
Do we review all changes or only a sample?
@infinisil you are working on the by-name migration (#258650). Do you think this could interfere? Should I wait until your migration is complete?

cc @drupol who was interested in this

Example run of the tool on pkgs/applications and pkgs/by-name: https://github.com/fgaz/nixpkgs/commits/finalattrs-test-8/

infinisil commented 8 months ago

Discussed this in the Architecture Team meeting:

@roberth: Not sure if it's worth doing the tree-wide change, there's likely going to be a better replacement
- The finalAttrs pattern does fix problems, but not always very useful, might be better on a case-by-case
- There is some push to use the pattern throughout.
- Ideally it should also work for language-specific builders, see https://github.com/NixOS/nixpkgs/pull/234651
- Easy to make a mistake, needs good testing
@infinisil: Tending towards doing the mass replacement
@roberth: Also fine with doing it if it can be pulled off without problems
@roberth: Potential problem with rev being set to finalAttrs.version: Overriding it reuses the previous source, no error
- @fgaz: There was a discussion about this on Matrix
- @infinisil: This seems worse than the problems finalAttrs fixed
- @roberth: Original motivation was the passthru tests
- @infinisil: How about improving the docs to better mention that src needs to be overridden
- @roberth: Could have a more convenient function for overriding the source, something like
```
pkgs.hello.overrideAttrs (pkgs.changeSrc {
# ...
tag = "v1.2"; # potentially automatically strip v prefix
hash = "...";
})
```
pkgs.haskell.lib has something like that
- Or maybe the other way around: Deriving the package version from the fetcher.
Fetchers could have a standardised attribute to pass thru the rev/tag version, so the package can reuse.

Change stdenv.mkDerivation to use src.version as a default.
- @fgaz: Could look into that

Should I perform the migration in steps or all at once?

To avoid merge conflicts, doing it in batches only helps when the batches are for packages that don't have open PRs
Probably doesn't cause too many merge conflict anyways
Suggestion: Do a smaller batch first (e.g. only ones in pkgs/by-name), and then the whole tree if there's no problems

One big commit or one commit per package?

@infinisil: One big commit
Trivial to revert manually

Do we review all changes or only a sample?

@infinisil: Review the tooling

@infinisil you are working on the by-name migration (RFC 140 migration plan #258650). Do you think this could interfere? Should I wait until your migration is complete?

@infinisil: With automated tooling, it can just be re-run to "resolve" the merge conflicts, so there's no problem doing multiple tree-wide changes

eclairevoyant commented 8 months ago

@roberth: Potential problem with rev being set to finalAttrs.version: Overriding it reuses the previous source, no error

This isn't caused nor exacerbated by finalAttrs at all. If you override version and not src, even without finalAttrs, you'll also silently use the existing sources. That's just how FODs work.

The finalAttrs pattern in fact makes it easier to override src, as you may only need to set outputHash:

packageName.overrideAttrs (oldAttrs: {
  version = "0.1.1";

  src = oldAttrs.src.overrideAttrs {
    outputHash = "...";
  };
})

Without finalAttrs, you'd still need to set src, and more verbosely at that:

packageName.overrideAttrs (oldAttrs: {
  version = "0.1.1";

  src = fetchFromGitHub {
    repo = "...";
    owner = "...";
    rev = "...";
    hash = "...";
  };
})

roberth commented 8 months ago

This isn't caused nor exacerbated by finalAttrs at all.

The problem is that it looks more correct. Something like "oh, I see the author is aware of the rec problem. Great, now I can override the version easily." It's a flawed thought when you think about the hash, but if someone's in a hurry or they're distracted, and the build still works, they'll be confused: "The build worked, so it can't be the hash."

Yes, it is scope creep, but it's quite a serious issue that is exacerbated by this, even if slightly.

Deriving the package version from the fetcher.

I think this is the most promising pattern; something like:

  version = finalAttrs.src.version;
  src = fetchFromGitHub {
    rev = "v2.1";
    # ...
  };

eclairevoyant commented 8 months ago

if someone's in a hurry or they're distracted, and the build still works, they'll be confused

I doubt most users are aware of the implementation details of everything they override. If they were, surely after the first couple of "FOD moments" they'd know better?

Either way this seems more like an opportunity to document/educate. FODs are already discussed in the nix manual, so the nixpkgs manual's section on overriding might direct them there with some ancillary notes.

Deriving the package version from the fetcher.

I think this is the most promising pattern; something like:

This forces users to not use their intuition to override version; this behaviour would be extremely surprising to encounter.

Moreover, extracting version from src.rev would be more tortured atm, as you can't use string interpolation; are you suggesting to change the fetcher API as well?

roberth commented 8 months ago

Yes, the idea was that each fetcher sets version. It could be something like

A short rev if rev is a git sha
stripPrefix "v" rev if it's v[0-9].*
The rev itself if it's [0-9].*
No version attribute otherwise

I don't think fetchers currently set a version attribute, so this should be fine.

This forces users to not use their intuition to override version;

At least the package won't have any code that could make them believe that overriding just version could work. I don't think this is an intuition, but rather an idea that occurs because they read the code wrong.

Gerg-L commented 8 months ago

IMO: this should be done after https://github.com/NixOS/nixpkgs/pull/234651

lolbinarycat commented 8 months ago

seems like it has potential to silently break packages that don't have enough unit tests due to subtle differences in semantics.

SuperSandro2000 commented 8 months ago

How would he ensured that there won't be an unmanageable amount of silent breakages especially in people's overlays?

Also how would packages be filtered which would be broken by this change because they previously relied on the rec behavior?

Also it would probably be a good idea to first write a tool to verify the correctness of FOD hashes to easily spot breakages in any urls which might change because of wrong usages of pname as part of URLs/paths/binary names in combination with overlays which append suffixes like -headless or -minimal.

A short rev if rev is a git sha

This would conflict with our current contributing guide and the -unstable-XXXX-XX-XX suffix. Also short refs are not really recommended for long term use.

stripPrefix "v" rev if it's v[0-9].*

Which would probably also need to consider refs/tags prefix and we probably collect more of those over time and could easily make it complicated.

lolbinarycat commented 8 months ago

i think packages should be updated to use finalAttrs as they're touched honestly, even if you made the commits automatically, you would still need to review them manually, and that would probably be about the same amount of work, but because github only allows you to approve a PR and not individual files of a PR, all that review work would probably have to be done by one person (actually several, a change this big should require multiple approvals)

there's lots of patterns that are inoccuous with rec that are a disaster waiting to happen with finalAttrs

eclairevoyant commented 8 months ago

there's lots of patterns that are inoccuous with rec that are a disaster waiting to happen with finalAttrs

I'd agree, if this tool didn't account for the main point of rec misuse: name/pname references. I was wary myself before testing the tool - and I think fgaz undersold the considerations put in to design the tool - but I currently think it is robust enough to at least initiate a PR.

Give it a try on some individual packages :)

SuperSandro2000 commented 8 months ago

I'd agree, if this tool didn't account for the main point of rec misuse

but we also need to look at the less often used misuses, otherwise we could run into a situation where a tiny fraction of the changes, still takes a significant amount of the time to fix things afterwards. I don't have good idea, how we could tell apart packages which are very safe to change and which will likely cause issues. Maybe we would need to analyze the AST and can then categorize the packages based on that and start with the very safe packages and do the complicated ones by hand.

I skimmed through the source code and I think it currently only checks pname which is in my opinion not enough and the to do list should get an entry to develop the tool and write good tests for it.

Also the following comment from the readme makes me think that this is not rock solid and deeply tested:

Warning: Even though the library checks that derivations did not change, skimming the output for errors is a good idea. The program could have incorrectly changed something in meta, passthru, or an expression that is only evaulated under certain conditions.

Without finalAttrs, you'd still need to set src, and more verbosely at that:
packageName.overrideAttrs (oldAttrs: {
  version = "0.1.1";

  src = fetchFromGitHub {
    repo = "...";
    owner = "...";
    rev = "...";
    hash = "...";
  };
})

This can already be shortened with override https://github.com/NixOS/nixpkgs/blob/01048ff3150ac438b9ffac42fb1ab8054fcae9d8/pkgs/applications/finance/odoo/default.nix#L14-L20

fgaz commented 8 months ago

Indeed I forgot to mention the measures I took to ensure a robust migration:

Only actual undefined references are substituted. The tool gets the exact reference position from eval errors.
The tool checks that packages evaluate after the change (including meta and passthru).
Derivation hashes are compared to ensure they do not change.
If pname misuse is detected, the package is skipped.
- If a blacklist is deemed too risky we can have a whitelist (version, buildInputs, ...) instead

Also the following comment from the readme makes me think that this is not rock solid and deeply tested:

Warning: Even though the library checks that derivations did not change, skimming the output for errors is a good idea. The program could have incorrectly changed something in meta, passthru, or an expression that is only evaulated under certain conditions.

That text is outdated, I'll remove it. meta and passthru are now checked. Regarding conditionals, as far as I can see they don't really hide undefined references from the same file (I will check).

write good tests for it

I should definitely add some. I've been using nixpkgs itself to test the tool, and I can extract some golden tests from it.

fgaz commented 8 months ago

but we also need to look at the less often used misuses

Maybe I can have the tool output a list of all the used recursive variable names?

fgaz commented 8 months ago

I think this is the most promising pattern; something like:
  version = finalAttrs.src.version;
  src = fetchFromGitHub {
    rev = "v2.1";
    # ...
  };

I thought about this, and I agree with @SuperSandro2000 that it isn't the right solution.

Ontologically, src depends on version, not the opposite
In the real world, src very often isn't as simple as fetchgit { rev = "v${version}", ... }, and extracting version from src can be a lot more complex than the opposite.
- Even a simple fetchurl { url = "https://example.org/foo-1.2.3-1.tar.gz", ... } requires a regex
- What if there are multiple srcs?
- What if the version depends on something else?
- It might not even be possible: try running git grep 'replaceStrings \[ *"." *\] \[ *"" *\]' in nixpkgs
We can do this only on packages where rev is "v${version}", but I think this is too much of a POLA violation

ShamrockLee commented 8 months ago

The problem of the irrelevant FOD hashes can be solved by embedding version or rev into the name of the FOD. The store path will change every time the version changes, forcing a re-fetch.

We could make fetchers support pname and version, and provide something like tagPrefix to enable auto-construction of rev if not otherwise specified. The down side is that the file extension would be lost, which might break some tryUnpack hooks. To fix this, nameSuffix could optionally be specified.

ShamrockLee commented 8 months ago

The problem of the irrelevant FOD hashes can be solved by embedding version or rev into the name of the FOD. The store path will change every time the version changes, forcing a re-fetch.

We could make fetchers support pname and version, and provide something like tagPrefix to enable auto-construction of rev if not otherwise specified. The down side is that the file extension would be lost, which might break some tryUnpack hooks. To fix this, nameSuffix could optionally be specified.

I made a PR #294068 to demonstrate this solution.

lolbinarycat commented 8 months ago

The problem of the irrelevant FOD hashes can be solved by embedding version or rev into the name of the FOD. The store path will change every time the version changes, forcing a re-fetch.

just want to let you know this has been tried several times before.

ShamrockLee commented 8 months ago

The problem of the irrelevant FOD hashes can be solved by embedding version or rev into the name of the FOD. The store path will change every time the version changes, forcing a re-fetch.

just want to let you know this has been tried several times before.

Are there some PR about these attempts? What obstacles did they encounter?

lolbinarycat commented 8 months ago

here's a few attempts at solving the fixed-output derivation problem

NixOS/nix#7999 NixOS/nix#969

ShamrockLee commented 8 months ago

Thank you!

here's a few attempts at solving the fixed-output derivation problem

NixOS/nix#7999 NixOS/nix#969

Those proposal targets Nix itself, while mine works on the fetchers inside Nixpkgs and doesn't affect the behavior of Nix.

fgaz commented 7 months ago

see also NixOS/rfcs#171: Default name of fetchFromGithub FOD to include revision

lolbinarycat commented 7 months ago

@ShamrockLee also note that nix flakes seem to have implemented some kind of workaround for this, as i haven't seen this problem occur there.

ShamrockLee commented 7 months ago

@ShamrockLee also note that nix flakes seem to have implemented some kind of workaround for this, as i haven't seen this problem occur there.

@lolbinarycat I haven't dug deep into the C++ implementation of flake inside Nix, but since flake.lock includes the input specifications (including revision specification) as nodes.<input>.original, it's quite easy for compare such specifications between flake.nix and flake.lock, and re-fetch upon mismatch.

At Nix level, they can see the original store derivation (/nix/store/*.drv). and have space to decide if they would like to re-realize the outPaths or just reuse them, after the ovewriting the store derivation. The latter is the current implementation. The former serves as a universal solution to the FOD re-fetching issues, but making the "original store path" a factor to determine the realization approach inevitably complicates the realization behavior and make Nix realization more "stateful".

At Nixpkgs level, all we have is the current specification, nothing else. All we could use to invalidate previous store paths are new store paths. As the drvPath of a FOD is determined by both name and outputHash and that the outputHash is specified by the user, our best bet is to code the revision into the name attribute. That's what #294068 and NixOS/rfcs#171 is about. (The review process of #294068 is paused in favour of NixOS/rfcs#171.)

oxalica commented 4 months ago

The finalAttrs pattern in fact makes it easier to override src, as you may only need to set outputHash:
packageName.overrideAttrs (oldAttrs: {
  version = "0.1.1";

  src = oldAttrs.src.overrideAttrs {
    outputHash = "...";
  };
})

:thinking: So here oldAttrs is actually finalAttrs which picks the new overridden version? It seems weird to me, if it's not oldAttrs, why does src = finalAttrs.src.overrideAttrs ... not causing an immediate infinite recursion? If it's actually not oldAttrs, we also need to update all relevant documentations.

Gerg-L commented 4 months ago

@oxalica oldAttrs here is prevAttrs not finalAttrs https://nixos.org/manual/nixpkgs/stable/#sec-pkg-overrideAttrs

RuRo commented 4 months ago

@oxalica no, oldAttrs in that example is prevAttrs, not finalAttrs. Here's a full example that should hopefully demonstrate, why this works:

# hello_2_10/package.nix
{ stdenv, fetchurl }:

stdenv.mkDerivation (finalAttrs: {
  pname = "hello";
  version = "2.10";

  src = fetchurl {
    url = "mirror://gnu/hello/hello-${finalAttrs.version}.tar.gz";
    hash = "sha256-MeBmE3qWJnbon2nRtlOC3pWn732RS4y5VvQepy4PUWs=";
  };
})

# some_overlay.nix
hello_2_12 = hello_2_10.overrideAttrs (
  finalAttrs: prevAttrs: {
    version = "2.12";
    src = prevAttrs.src.overrideAttrs {
      outputHash = "sha256-zwSvhtwIUmjF9EcPuuSbGK+8Iht4CWqrhC2TSna60Ks=";
    };
  }
);

The reason, why you don't have to manually propagate version to src is because the original derivation used finalAttrs.version when specifying the url in fetchurl (or something like rev = "v${finalAttrs.version}" in the case of fetchFromGithub).

Edit: sniped, lol

Gerg-L commented 4 months ago

yeah the arguments are opposite in overrideAttrs and mkDerivation mkDerivation final is first overrideAttrs prev is first

RuRo commented 4 months ago

yeah the arguments are opposite in overrideAttrs and mkDerivation mkDerivation final is first overrideAttrs prev is first

I wouldn't say "opposite". Here are the valid "signatures":

{
  # Old-style signatures
  # You can still use them, but they don't cooperate well with multiple overrideAttrs
  # rec { } was commonly used with these signatures, but now it's considered an antipattern
  _ = mkDerivation { };
  _ = overrideAttrs { };
  _ = overrideAttrs (prevAttrs: { });

  # New-style signatures
  # (added in PR 119942)
  _ = mkDerivation (finalAttrs: { });
  _ = overrideAttrs (finatlAttrs: prevAttrs: { });
}

mkDerivation takes zero or one argument (it never takes prevAttrs, because there is no such thing as "previous" attributes, when creating the derivation from scratch) and overrideAttrs takes zero, one or two arguments.

NixOS / nixpkgs

Automatic `finalAttrs` pattern migration #293452