Crosscompilation docs confusing / Ideas for LLVM stdenv

aaronmondal commented 1 year ago

Problem

Either this is just my misunderstanding and/or different terminology in Nix, or the docs on crosscompilation here are wrong/misleading.

Proposal

I'm coming from the Bazel world where we also have host, target and build (well, there it's called execution but whatever) platforms. The docs state that the host is the platform the target is run on. I believe this is incorrect. At least it misaligns with the terminology used in Bazel.

I believe it would be more appropriate to document crosscompilation something like this:

The host platform is the machine that invokes the build. For instance the developer's laptop.
The build platform is the machine that executes the build. For instance a remote compile server.
The target platform is the machine that runs the final executable. For instance a mobile device.

For most users, host, build and target will always be the same: We are building software for ourselves by ourselves, always on the same machine.

If host and target are the same, but build is different, we might be in a situation where a developer requests a remote execution server to create an executable for them, for instance because they are on a low-end device and the remote build server is a high-end one.

If host and build are the same, but target is different, we might be in a situation where a developer builds a mobile app locally.

If build and target are the same, but host is different, we are likely building a tool that is part of a toolchain that runs on the build platform. For instance a compiler that is built using other, lower level compilers for later (or immediate) use by the build platform.
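To make the cases above concrete, here is a minimal Python sketch that classifies a platform combination. It uses the role names as proposed in this comment (Bazel-style), which differ from the autoconf-style names nixpkgs uses. The `Platforms` type, the `describe` function, and the platform strings are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Platforms(NamedTuple):
    """Platform roles as named in the proposal above (not the nixpkgs naming)."""
    host: str    # machine that invokes the build
    build: str   # machine that executes the build
    target: str  # machine that runs the final executable


def describe(p: Platforms) -> str:
    """Return which scenario from the proposal a platform combination matches."""
    if p.host == p.build == p.target:
        return "native build"
    if p.host == p.target:
        return "remote build for the requesting machine"
    if p.host == p.build:
        return "local cross build"
    if p.build == p.target:
        return "toolchain build for the build platform"
    return "all three platforms differ"


# Hypothetical example: a laptop asks a faster server to build for the laptop.
print(describe(Platforms(host="aarch64-linux", build="x86_64-linux", target="aarch64-linux")))
```

The checks run from the most specific case (all three equal) to the least specific, so each combination matches exactly one description.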

(IMO it is incorrect to state that the target platform is the most irrelevant platform. It is in fact the host platform that is least relevant to the build, since it is just the platform that says "hello build platform please give me executable X for target platform Y".)

I've encountered this because I think the cross compilation support in nixpkgs is a bit unintuitive. I feel like nixpkgs is mixing up platforms and configurations a bit and this might be a reason why e.g. the LLVM stdenvs are so hard to get working everywhere.

Maybe it would be worth considering a differentiation between platforms and toolchain configurations, similar to how Bazel does it. Every time I try to cross-compile something I have to build like half the world, which shouldn't be necessary because e.g. a Clang on x86-unknown-linux-gnu should be able to build libc++ for x64-unknown-linux-musl without first requiring a Clang built for musl. I might be doing something wrong, but my experiments with this would always first build a specific clang for musl, which should be unnecessary.

To me, attributes like platform.isDarwin make sense, but platform.useLLVM seems rather unintuitive to me: Whether we use LLVM as C++ toolchain for stdenv is part of a toolchain configuration and not really related to any platform. We can switch back and forth rather easily between Clang and GCC, i.e. we can choose whether we want to use LLVM or not, but we can't choose our operating system in the same way. We also can't choose our hardware with the same flexibility.

I hope this was not too verbose to read 😅 I kinda feel like the LLVM stdenv is running into the same issues that caused Bazel to deprecate crosstool and instead use Platforms/ToolchainTransitions/Configurations.

Checklist

[x] checked latest Nixpkgs manual (source) and latest NixOS manual (source)
[x] checked open documentation issues for possible duplicates (potentially related https://github.com/NixOS/nixpkgs/issues/28327 https://github.com/NixOS/nixpkgs/issues/106375)
[x] checked open documentation pull requests for possible solutions

cc @rrbutani

rrbutani commented 1 year ago

I'll have time to take another look at this later this weekend (and I just realized I forgot to post my reply to your other issue from months ago – sorry) but here are some quick thoughts:

I'm coming from the Bazel world where we also have host, target and build (well, there it's called execution but whatever) platforms. The docs state that the host is the platform the target is run on. I believe this is incorrect. At least it misaligns with the terminology used in Bazel.

You are correct that there is a discrepancy. My understanding is:

For nixpkgs it's:

             build     → host      → target

For Bazel this lines up as:

 host      → execution → target

i.e. what Bazel calls the execution platform is analogous to what nixpkgs calls the build platform, and Bazel's target platform thus lines up with nixpkgs' host platform.

It's just "off by one"; i.e. for nixpkgs the final package set's build/host/target are "package oriented" whereas for Bazel it's more compiler oriented.
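As a rough illustration of this "off by one" alignment, the correspondence can be written as a lookup table. This is only a sketch of my understanding as described above; the names `BAZEL_TO_NIXPKGS`, `nixpkgs_role_for`, and `bazel_role_for` are made up for this example and are not part of Bazel or nixpkgs.

```python
from typing import Dict, Optional

# Bazel role -> nixpkgs role, following the alignment described above.
# Bazel's "host" (where the Bazel client and server run) has no direct
# nixpkgs counterpart, so it maps to None.
BAZEL_TO_NIXPKGS: Dict[str, Optional[str]] = {
    "host": None,
    "execution": "build",
    "target": "host",
}

# Reverse direction. nixpkgs' "target" is absent on purpose: it sits one
# step past Bazel's "target" and has no Bazel-side name in this alignment.
NIXPKGS_TO_BAZEL: Dict[str, str] = {
    nix: bazel for bazel, nix in BAZEL_TO_NIXPKGS.items() if nix is not None
}


def nixpkgs_role_for(bazel_role: str) -> Optional[str]:
    """Translate a Bazel platform role into the nixpkgs role name, if any."""
    return BAZEL_TO_NIXPKGS[bazel_role]


def bazel_role_for(nixpkgs_role: str) -> Optional[str]:
    """Translate a nixpkgs platform role into the Bazel role name, if any."""
    return NIXPKGS_TO_BAZEL.get(nixpkgs_role)


print(nixpkgs_role_for("execution"))
print(bazel_role_for("host"))
```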

I know that for Bazel "host platform" generally means literally the platform the Bazel server and client are running on. nixpkgs has no direct equivalent for this, because the nix derivations that make up a package are not necessarily all built on one machine. The machine doing Nix expression evaluation would be the analogue, but I can't think of any scenario in which we'd need to name that platform specifically; even with IFD, nix will happily shell out to remote builders as required by the system/features on a derivation.

I'm confident that this is modeled appropriately in nixpkgs but I'm having trouble articulating how it maps back to the Bazel world. I believe host != execution/host != target's use case is primarily RBE toolchains/knowing where to run binaries? Which you'd model in nixpkgs just like cross-compilation (the build tools for your executors themselves come from a package set instance whose host platform matches what they can run). If you can share a concrete example where host platform in Bazel is used in constraints maybe we can work through what the nixpkgs analogue would be.

Not sure if you've already seen them but I think the docs here have some useful examples of how this plays out for various kinds of dependencies.

In particular, note that in nixpkgs a compiler's target platform is the target it produces code for whereas its host platform is the platform the compiler runs on. For a package being compiled using the compiler these labels all shift: the package will use the compiler from the set of build packages (i.e. the nixpkgs instance where the package's buildPlatform is the instance's hostPlatform); the package's hostPlatform is the compiler's targetPlatform.
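The label shift can be sketched in a few lines of Python. This is a toy model of the relationship described above, not nixpkgs code: `Triple` and `compiler_for` are hypothetical names, and the compiler's own build platform is assumed here to equal the package's build platform.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """nixpkgs-style platform roles (autoconf naming)."""
    build: str             # where the thing is built
    host: str              # where the thing runs
    target: Optional[str]  # what it emits code for; None for most packages


def compiler_for(package: Triple) -> Triple:
    """Platforms of the compiler used to build `package`.

    The compiler runs on the package's build platform and produces code
    for the package's host platform. Its own build platform is assumed
    (for this sketch) to be the package's build platform as well.
    """
    return Triple(build=package.build, host=package.build, target=package.host)


# Hypothetical example: a library cross-compiled on x86_64 for aarch64.
library = Triple(build="x86_64-linux", host="aarch64-linux", target=None)
print(compiler_for(library))
```

In this example the compiler comes out as build `x86_64-linux`, host `x86_64-linux`, target `aarch64-linux`: every role sits one step earlier than for the library it compiles.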

This notion of having multiple nixpkgs instances that progressively slide their way to the build/host/target triple of your final package set is explained here and elsewhere in the manual better than I can articulate 🙂

(IMO it is incorrect to state that the target platform is the most irrelevant platform. It is in fact the host platform that is least relevant to the build, since it is just the platform that says "hello build platform please give me executable X for target platform Y".)

I believe this is in reference to the fact that most packages don't have a "target platform". Compilers do, because they produce native code, but most applications/libraries do not.

Every time I try to cross-compile something I have to build like half the world, which shouldn't be necessary because e.g. a Clang on x86-unknown-linux-gnu should be able to build libc++ for x64-unknown-linux-musl without first requiring a Clang built for musl. I might be doing something wrong, but my experiments with this would always first build a specific clang for musl, which should be unnecessary.

Maybe this is confusion caused by the cc-wrapper package having clang and the target triple in the name? Is it actually rebuilding clang/LLVM?

We actually handle specifically the scenario you're describing; LLVM and clang (native cross-compilers) are not recompiled when you ask for, e.g., crossSystem = { useLLVM = true; ... }. If you share your experiments I'm happy to help debug.

To me, attributes like platform.isDarwin make sense, but platform.useLLVM seems rather unintuitive to me: Whether we use LLVM as C++ toolchain for stdenv is part of a toolchain configuration and not really related to any platform. We can switch back and forth rather easily between Clang and GCC, i.e. we can choose whether we want to use LLVM or not, but we can't choose our operating system in the same way. We also can't choose our hardware with the same flexibility.

The idea behind .useLLVM is that it's a property of a nixpkgs instance; i.e. in an instance with hostPlatform.useLLVM, .stdenv is LLVM-backed. As you say, packages can (and do! chrome and firefox insist on LLVM for cross-lang LTO/PGO) pick other stdenvs.

I'll concede that "platform" is perhaps a bit misleading here (but system seems too narrow..).

I kinda feel like the LLVM stdenv is running into the same issues that caused Bazel to deprecate crosstool and instead use Platforms/ToolchainTransitions/Configurations.

I've encountered this because I think the cross compilation support in nixpkgs is a bit unintuitive. I feel like nixpkgs is mixing up platforms and configurations a bit and this might be a reason why e.g. the LLVM stdenvs are so hard to get working everywhere.

Having used cross-compilation setups in both Bazel and nixpkgs: there are definitely limitations to how cross in nixpkgs works (splicing isn't perfect, flakes' conception of systems doesn't really mesh well with cross, etc.) but on the whole I think it's actually modeled quite well; I think the big conceptual discrepancy here is the sliding scale of nixpkgs instances with different build/host/target platforms (in kind of the same way that Bazel transitions can modify the target platform for parts of your build graph/influence toolchain resolution).

Hopefully this provides some clarity or at least more things to read.

Definitely also feel free to ask questions in the NixOS matrix channels or to DM me; I'm very happy to answer questions and walk through examples.

Artturin commented 1 year ago

We're following the autoconf definitions https://www.gnu.org/software/autoconf/manual/autoconf-2.68/html_node/Specifying-Target-Triplets.html

aaronmondal commented 1 year ago

@rrbutani Thanks for your extensive answer! This cleared things up a lot for me. I also went over various Autotools-related docs on the topic and I now think I understand the difference. Regarding when rebuilds are triggered, I'm testing this ATM.

My actual use case is building RBE images with nix. At the moment the build caches of rules_ll invocations aren't reproducible across machines. Roughly speaking, my use case is building Bazel itself with Clang/LLVM and then creating a container that contains that Bazel and a Clang/LLVM C++ toolchain, so that we can use it as a remote execution Docker image.

I'm close to getting it to work in a way that the Bazel remote execution toolchain maps 1:1 to a local nix dev environment. I believe that this actually makes it possible to e.g. build up a cache with CI and then pull artifacts from that cache for reuse with local, non-remote builds. Or with trusted parties, users could share caches across machines without the need for remote execution runners at all. Or something in between 😃

Almost works. Almost 😅 I'll let you know when this actually works and/or I have some code ready.

This is why I keep running into these complicated issues lol.

alyssais commented 1 year ago

I'm going to close this because the main question has been answered. Feel free to open new issues for any behavior you still think we should change.

NixOS / nixpkgs