NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Switch store derivations from using ATerm to using JSON or some other mainstream format #5481

Open catern opened 2 years ago

catern commented 2 years ago

Is your feature request related to a problem? Please describe.

Store derivations are currently in the ATerm format, which is at this point only used by Nix. Since ATerm is specific to Nix, Nix has its own pretty printer and other tools to support ATerm.

Describe the solution you'd like

Nix should represent store derivations as JSON or s-expressions (or some other common format) instead.

Backwards-compatibility may be tricky, but some special casing to detect ATerm vs JSON seems like it should work.
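The special casing could be as small as looking at the first few characters of the file. A minimal sketch, assuming ATerm derivations always start with `Derive(` and a JSON derivation would start with `{` or `[`; the function name `detect_drv_format` is made up for illustration and is not part of Nix:

```python
import json


def detect_drv_format(text: str) -> str:
    """Guess how a store derivation is serialized from its leading characters."""
    stripped = text.lstrip()
    if stripped.startswith("Derive("):
        return "aterm"
    if stripped.startswith("{") or stripped.startswith("["):
        # Raises ValueError if the content only looks like JSON.
        json.loads(stripped)
        return "json"
    raise ValueError("unrecognised store derivation format")
```

Because no valid JSON document can begin with the bare word `Derive`, the two formats cannot be confused with each other.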

Describe alternatives you've considered

Some more radical rework of store derivations could also get rid of ATerm, but it's much easier to just switch to JSON.

We could embrace ATerm further; probably the first step would be to publish an actual publicly-accessible spec for the ATerm ASCII format that we use. It's unlikely anyone else will use ATerm, though.

catern commented 2 years ago

@puckipedia on Libera #nixos has mentioned that changing the format of store derivations will cause all Nix derivations to hash differently, and thus all derivations (even old, already-realised ones) will need to be rebuilt (except for fixed-output derivations).

One idea for working around that is to add a derivation attribute, say "__store_drv_format", which can be set to "json" to opt-in to the new format. Then some version of Nixpkgs could just start setting that attribute by default.

We might also talk to Guix people, because I think they've planned in the past to replace ATerm with s-exps, and maybe they've thought up a clever migration approach.

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

lambdadog commented 2 years ago

I'd very much like to see this. ATerm is an overly obscure format that, notably, doesn't even have any {de,}serialization libraries packaged in nixpkgs itself, at least as far as I've been able to find!

It makes .drv files overly opaque for a text format, and frankly it's hard to find information on the format even with Google.

flokli commented 2 years ago

Note there is the nix show-derivation command, which produces a JSON output.
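For tools that only need the JSON view, a thin wrapper around that command may be enough. A rough sketch, assuming `nix` is on `PATH` and that the command prints a single JSON object keyed by derivation path; the helper names are illustrative only:

```python
import json
import subprocess


def parse_show_output(text: str) -> dict:
    """Parse the JSON document printed by `nix show-derivation`."""
    return json.loads(text)


def show_derivation(drv_path: str) -> dict:
    """Run `nix show-derivation` on one .drv path and return the parsed JSON."""
    result = subprocess.run(
        ["nix", "show-derivation", drv_path],
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,
        text=True,
    )
    return parse_show_output(result.stdout)
```

Newer Nix versions spell the command `nix derivation show`, so the argument list may need adjusting.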

I doubt ATerm will go away any time soon, especially considering it's deeply baked into how all the hashing methods work.

You can find some documentation about the ATerm format here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2195

On the other hand, go-nix recently added a parser for Derivations. I hope you find it useful, and as always, contributions welcome :-)

toraritte commented 1 year ago

What are the problems with ATerms? Does it have any technical flaws, such as it hinders the addition of certain features, makes processing slow/more expensive, etc.? I'm honestly curious.

One issue I can name right off the bat is that its use by Nix is completely undocumented (except for Dolstra's PhD thesis).

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/why-was-aterms-chosen-for-the-format-of-store-derivations-instead-of-asn-1/27762/1

lambdadog commented 1 year ago

> What are the problems with ATerms? Does it have any technical flaws, such as it hinders the addition of certain features, makes processing slow/more expensive, etc.? I'm honestly curious.

As far as actual problems with Nix as an opaque tool, I can't say that there are any I'm aware of. That said, it makes it necessary to create libraries such as the Haskell nix-derivation, or to rely on ATerm libraries (often poorly maintained, if they even exist), rather than simply using a more mainstream format's parsing library when creating tools that interoperate with Nix.

I suspect the author of nix-derivation may not even be aware the ATerm format is what Nix is using. The benefits would be being able to write tools such as nix-diff without having to jump through hoops because Nix uses an obscure serialization format that doesn't even come up when you search its name on Google, assuming you can even find its name in the first place since you'll have to read Dolstra's PhD thesis to find it.

I'm all for innovation in serialization formats, but Nix's usage of ATerm is at best novel, not innovative, given that it stifles creation of tools that interact with derivations and provides no benefit to Nix itself.

That said, it may be more painful than it is valuable to simply switch at this point. If nothing else it would create a lengthy "upgrade" process and would invalidate all hashes unless (likely slow) workarounds were created. Perhaps for a couple of versions Nix could use both ATerm hashes and (e.g.) JSON hashes, preferring JSON hashes for new derivations but checking for ATerm ones in caches first. Then, once a notable portion of derivations in Nixpkgs were hashed using JSON, ATerm support could be dropped and the remaining ATerm hashes could be invalidated.

It would be a slowdown (as conversion to ATerm would be required for every derivation hashing), but it may be worth it to use a format that's not unnecessarily obscure.
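To make the transition idea concrete, here is a toy sketch of the lookup order. It is not how Nix computes store hashes: the plain SHA-256 over the serialized text, the `cache` dictionary, and the `to_aterm` callback are all stand-ins invented for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def text_key(serialized: str) -> str:
    """Toy cache key: SHA-256 of the serialized derivation text."""
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def lookup_with_fallback(
    drv: dict,
    to_aterm: Callable[[dict], str],
    cache: Dict[str, str],
) -> Optional[str]:
    """Check the cache under the legacy ATerm key first, then the JSON key."""
    legacy_key = text_key(to_aterm(drv))  # the extra conversion on every lookup
    if legacy_key in cache:
        return cache[legacy_key]
    json_key = text_key(json.dumps(drv, sort_keys=True))
    return cache.get(json_key)
```

The `to_aterm` call on every lookup is the slowdown in question: during the transition each derivation gets serialized and hashed twice.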

lambdadog commented 1 year ago

As @flokli mentioned, nix show-derivation can be used. However, a tool like nix-diff iterates through derivation inputs recursively, so it would have to shell out for every single derivation encountered. I suspect that is why the nix-derivation library was created in the first place: to avoid such a heavy performance penalty.
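A sketch of that recursive walk shows where the cost comes from. The traversal assumes each loaded derivation is a dict with an `inputDrvs` mapping keyed by .drv path, as in the JSON output; `walk_inputs` and the `load` callback are hypothetical names:

```python
from typing import Callable, Set


def walk_inputs(root: str, load: Callable[[str], dict]) -> Set[str]:
    """Visit a derivation and all of its transitive input derivations."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        # One `load` call per distinct derivation. If `load` shells out,
        # that is one process spawn per derivation in the closure.
        drv = load(path)
        stack.extend(drv.get("inputDrvs", {}))
    return seen
```

With an in-process parser, `load` is just a file read plus a parse, which is the approach a library like nix-derivation takes.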

l0b0 commented 8 months ago

Take as an example the smallest .drv file on my system, /nix/store/y1k8vmb26nwhlir3c5zzwl5mdzbr1nwy-nixos.drv. If I manually pretty-print this file, it looks like this:

Derive(
  [
    (
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos",
      "",
      ""
    )
  ],
  [],
  [
    "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
  ],
  "builtin",
  "builtin:unpack-channel",
  [],
  [
    (
      "builder",
      "builtin:unpack-channel"
    ),
    (
      "channelName",
      "nixos"
    ),
    (
      "name",
      "nixos"
    ),
    (
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos"
    ),
    (
      "preferLocalBuild",
      "1"
    ),
    (
      "src",
      "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
    ),
    (
      "system",
      "builtin"
    )
  ]
)

It looks like the only thing necessary to make this a JSON file is to remove the Derive keyword and change the "tuples" into lists:

[
  [
    [
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos",
      "",
      ""
    ]
  ],
  [],
  [
    "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
  ],
  "builtin",
  "builtin:unpack-channel",
  [],
  [
    [
      "builder",
      "builtin:unpack-channel"
    ],
    [
      "channelName",
      "nixos"
    ],
    [
      "name",
      "nixos"
    ],
    [
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos"
    ],
    [
      "preferLocalBuild",
      "1"
    ],
    [
      "src",
      "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
    ],
    [
      "system",
      "builtin"
    ]
  ]
]
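The same conversion can be done mechanically. Below is a minimal sketch of a parser for just the subset visible above (one constructor application, lists, tuples, and double-quoted strings with backslash escapes). It is not a full ATerm implementation, the escape set is an assumption, and malformed input is not handled gracefully:

```python
import json


def parse_aterm(text: str):
    """Parse the ATerm subset seen in .drv files into nested Python lists."""
    pos = 0
    escapes = {"n": "\n", "r": "\r", "t": "\t"}  # assumed escape set

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_string():
        nonlocal pos
        pos += 1  # skip the opening quote
        out = []
        while text[pos] != '"':
            if text[pos] == "\\":
                pos += 1
                out.append(escapes.get(text[pos], text[pos]))
            else:
                out.append(text[pos])
            pos += 1
        pos += 1  # skip the closing quote
        return "".join(out)

    def parse_sequence(close):
        nonlocal pos
        pos += 1  # skip the opening bracket or parenthesis
        items = []
        skip_ws()
        while text[pos] != close:
            items.append(parse_value())
            skip_ws()
            if text[pos] == ",":
                pos += 1
                skip_ws()
        pos += 1  # skip the closing delimiter
        return items

    def parse_value():
        nonlocal pos
        skip_ws()
        ch = text[pos]
        if ch == '"':
            return parse_string()
        if ch == "[":
            return parse_sequence("]")
        if ch == "(":
            return parse_sequence(")")
        # Otherwise expect a constructor application such as Derive(...).
        start = pos
        while text[pos].isalnum():
            pos += 1
        if pos == start or text[pos] != "(":
            raise ValueError(f"unexpected character at offset {start}")
        return parse_sequence(")")  # the constructor name is dropped

    return parse_value()


def aterm_to_json(text: str) -> str:
    """Re-serialize a parsed .drv file as pretty-printed JSON."""
    return json.dumps(parse_aterm(text), indent=2)
```

Tuples and lists both come out as JSON arrays here, matching the manual conversion, so any distinction between the two is lost in the output.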

This brings up a few questions:

Is there any other syntax which needs to be supported?

Are ATerm parenthesis-delimited "tuples" meaningfully different from square bracket-delimited "lists"?

Would the Nix language itself be a good substitute for ATerm? It seems like a natural choice, basically "flattening" the Nix expressions into only static values. Based on nix derivation show /nix/store/y1k8vmb26nwhlir3c5zzwl5mdzbr1nwy-nixos.drv:

{
  args = [];
  builder = "builtin:unpack-channel";
  env = {
    builder = "builtin:unpack-channel";
    channelName = "nixos";
    name = "nixos";
    out = "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos";
    preferLocalBuild = "1";
    src = "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz";
    system = "builtin";
  };
  inputDrvs = {};
  inputSrcs = [
    "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
  ];
  name = "nixos";
  outputs = {
    out = {
      path = "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos";
    };
  };
  system = "builtin";
}

This seems pretty nice. One less language to worry about, the result is pretty much self-documenting, and we get other advantages of Nixlang such as comments.
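A rendering like this can be produced from the JSON form with a small helper. The sketch below handles only the value types that appear in the example (strings, booleans, lists, attribute sets); `to_nix` and `nix_string` are invented names, and the escaping rules are a best-effort assumption rather than a checked implementation of the Nix grammar:

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_'-]*\Z")
_KEYWORDS = {"if", "then", "else", "assert", "with", "let", "in", "rec", "inherit", "or"}


def nix_string(s: str) -> str:
    """Quote a string, escaping backslashes, quotes, `${` and newlines."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("${", "\\${")
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def nix_key(name: str) -> str:
    """Emit a bare attribute name when possible, a quoted one otherwise."""
    if _IDENT.match(name) and name not in _KEYWORDS:
        return name
    return nix_string(name)


def to_nix(value, indent: int = 0) -> str:
    """Render nested dicts, lists, strings and booleans as a static Nix expression."""
    pad = "  " * indent
    inner = "  " * (indent + 1)
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return nix_string(value)
    if isinstance(value, list):
        if not value:
            return "[]"
        body = "".join(f"{inner}{to_nix(v, indent + 1)}\n" for v in value)
        return "[\n" + body + pad + "]"
    if isinstance(value, dict):
        if not value:
            return "{}"
        body = "".join(
            f"{inner}{nix_key(k)} = {to_nix(v, indent + 1)};\n"
            for k, v in value.items()
        )
        return "{\n" + body + pad + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

Reading such a file back would need a Nix parser, whereas JSON parsers are already available in nearly every language.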

theoparis commented 2 months ago

I agree with @l0b0. It would be nice to use nix as the format for store derivations.

I'm writing a Nix derivation builder in Rust and I want to be able to build existing .drv files. I tried to modify Nix locally to use libexpr, but I ended up with a recursive Meson dependency (libstore -> libexpr -> libstore). I'll probably just use JSON for the time being, since I don't want to bother with parsing a separate ATerm format that isn't widely used.