NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.01k stars 14.02k forks source link

Embedded target triplets (system, config) are used and interpreted incorrectly #165836

Open sternenseemann opened 2 years ago

sternenseemann commented 2 years ago

Describe the bug

The following writeup assumes that autotools is our source of truth with regards to target triplets. This is the case in nixpkgs and also implicit for Nix use in general, since its configure system is autoconf-based and produces the system strings in the first place.

When fighting binutils' autoconf to accept Solo5's new and custom aarch64-solo5-none triplet, I noticed a curious discrepancy between nixpkgs' lib and autotools:

> nix-instantiate --eval -E 'with import <nixpkgs> {}; (lib.systems.elaborate "riscv64-none-elf").config'
"riscv64-unknown-none-elf"
> $(nix-build '<nixpkgs>' -A gnu-config)/config.sub riscv64-none-elf
riscv64-none-elf
>  $(nix-build '<nixpkgs>' -A gnu-config)/config.sub riscv64-unknown-none-elf
Invalid configuration `riscv64-unknown-none-elf': Kernel `none' not known to work with OS `elf'.

The discrepancy can be boiled down to — after digging through the depths of config.sub for a bit — that autotools considers none a valid vendor, whereas for nixpkgs none can never be a vendor and will always be interpreted as a kernel.

When using lib.systems.examples, the problem is more subtle, since an explicit config argument is never elaborated:

> nix-instantiate --eval -E 'with import <nixpkgs> {}; (lib.systems.elaborate lib.systems.examples.riscv64-embedded).config' # correct
"riscv64-none-elf"
> nix-instantiate --eval -E 'with import <nixpkgs> {}; (lib.systems.elaborate lib.systems.examples.riscv64-embedded).parsed.vendor' # autotools believes this should be `none`
{ _type = "vendor"; name = "unknown"; }

So in conclusion, the following discrepancies between nixpkgs and autotools exists:

The biggest problem I ran into was none-elf:

In the case of all example triplets, this problem is obscured by the fact that nixpkgs and autotools interpret the triplet differently, with autotools taking none for the vendor. When trying to force a certain vendor for aarch64-solo5-none, things fell apart:

I think we were bound to run into some kind of problem at some point; The big problem with autotools is that (unless the triplet is in its most canonical form, i.e. of four components) you have to know what the possible values for each component are to expand the triplet correctly. Additionally, config.sub handles a couple of extra edge cases and backwards compatibility, resulting in a very intricate logic that's hard to replicate.

Another aspect is surely that we don't have a concept of multiple platform identifier systems, so all configs we use need to work with LLVM as well, since we have no concept of “convert this into an LLVM triple”. I need to research LLVM triples still, but possibly this could be a tamer system for us to use if we can figure out how to convert the triples to autotools target triplets correctly.

For the matter of the incorrect autotools triplets for the embedded platforms, I'm not sure how to proceed, actually. I think the first step would be to figure out where these triplets came from and what their correct autotools variant or interpretation would be:

cc @alyssais @Ericson2314 @jaykru (added riscv* embedded) @matthewbauer @vincrusher (added embedded platforms)

ghost commented 2 years ago

arm-none-eabi{,hf}

Do the autotools actually understand the hf in arm-*-eabihf, or are they just parroting it back? For example,

$ ./config.sub arm-linux-eabicrazypants
arm-unknown-linux-eabicrazypants

Grepping autoconf for eabihf the only mention I can find is specific to glibc on FreeBSD, which makes it seem like eabihf is a FreeBSD thing that autoconf is acknowledging rather than something that works on other kernels...

Ericson2314 commented 2 years ago

Yeah what we do is more like LLVM triples. I hope we can move away from 3 part to 4 part whenever possible, which helps avoid issues. If there is a 4 part which GNU config doesn't understand, we can submit a patch.

ghost commented 2 years ago

Yeah what we do is more like LLVM triples.

Is there a better reference/specification for "LLVM triples" than https://clang.llvm.org/docs/CrossCompilation.html#target-triple ? Like a complete list of the valid choices for each field? Or an explanation of how LLVM triples are different from autoconf triples?

It is a little disappointing that they didn't at least cite/acknowledge autoconf's taxonomy, which is very obviously the basis for (and perhaps even a superset of) LLVM's taxonomy and has something like two decades of priority...

sternenseemann commented 2 years ago

Is there a better reference/specification for "LLVM triples"

This is also something I have been asking myself (although to be fair in the case of autoconf the “documentation” is a frightening shell script), but haven't found anything so far. I only found this lightning talk, but sadly the proposed solution hasn't been adopted yet.

ghost commented 2 years ago

Holy cow this is so crazy.

If there is a 4 part which GNU config doesn't understand, we can submit a patch.

Yeah, aarch64-solo5-none-elf.

The problem is this case-block in config.sub. It is the only thing in all of config.sub that will do multi-field validation (i.e. possibly exit nonzero) when you submit a four-field string. All the other checks are confined to checking one of the four fields in isolation, or else are only used on three-or-less-component strings (i.e. disambiguation).

This kind of torpedoes my plans. I have deleted two long-winded comments I wrote earlier in this PR, since they are no longer worth reading. I had assumed that everything in config.sub was either disambiguation or single-field validation.

ghost commented 1 year ago

autotools won't accept this elf is not an OS that works with Kernel none to their knowledge.

This is true for nixpkgs as well after #235230.

Ericson2314 commented 1 year ago

The problem is this case-block in config.sub.

Yeah and guess who wrote that? :D

 $ git log -G 'validate the OS-kernel combination'
commit 5e531d391852a54e7fab2d8ff55625fca514b305
Author: John Ericson <john.ericson@obsidian.systems>
Date:   Sun Jun 28 15:32:14 2020 +1000

            * config.sub: Properly parse the KERNEL-OS case.

    I previously refactored config.sub to parse 'basic_machine' into
    separate 'cpu' and 'vendor' variables. This is a kindred refactor
    where 'basic_os' (a rename of 'os' when used early on) is now parsed
    into 'kernel' and 'os'.

    Like the previous refactoring, this does make things a bit longer. I
    think the change makes the code easier to understand and more robust,
    so it is worth the cost of a longer script.

    Signed-off-by: Ben Elliston <bje@gnu.org>

Lest you think that I am making everyone miserable, before that change it was:

 $ ./config.sub aarch64_be-unknown-none-elf
Invalid configuration `aarch64_be-unknown-none-elf': system `none-elf' not recognized

My change made it closer to parsing, since it was already whitelisting elf. Now we just need to finish the job.

Ericson2314 commented 1 year ago

https://lists.gnu.org/archive/html/config-patches/2023-07/msg00002.html This fixes the bug.

Ericson2314 commented 1 year ago

This is somewhat important, since autotools prefixes the OS with a modifier (the kernel) if it is given, whereas nixpkgs will postfix the kernel with a modifier (the ABI). This is probably the biggest divergence I could make out philosophically, although in practice I don't know of any cases where it produces problematic results.

Yes this is the big difference. We do it the LLVM way. GNU config is alone in the other way. I think the LLVM way is winning. I am trying to think of ways to teach GNU config the LLVM way while still passing existing tests.

TBH, I wouldn't be surprised if the difference came from linux-gnu being interpreted as GNU but Linux, or Linux + glibc :D.

ghost commented 1 year ago

TBH, I wouldn't be surprised if the difference came from linux-gnu being interpreted as GNU but Linux, or Linux + glibc :D.

I suspect so as well, and believe this was a very unhelpful decision by gnu-config.

However I also think it is not particularly helpful to present this as a "GCC vs LLVM" war, especially since LLVM does not have a taxonomy for use by non-LLVM projects... it simply has "whatever the latest LLVM does". Unless LLVM volunteers to take the burden of maintaining a stable external taxonomy it's an apples-and-oranges comparison.

Ericson2314 commented 1 year ago

https://lists.gnu.org/archive/html/config-patches/2023-08/msg00000.html The problem upstream should be solved now! $cpu-$vendor-none-{coff,elf} is accepted.