bazelbuild / platforms

Constraint values for specifying platforms and toolchains
Apache License 2.0

Constraints for ABI #38

Open UebelAndre opened 2 years ago

UebelAndre commented 2 years ago

Would it be possible to introduce @platforms//abi which contains constraints for common ABI definitions of platform triples?

In rules_rust we currently have issues with users who want to target platforms that share a CPU and system but differ in ABI; they cannot do so because no constraints uniquely identify those platforms. Users are forced to define custom constraints and redefine toolchains to work around the errors this ambiguity causes. I think it would be very beneficial to introduce ABI constraints here so that rules_rust and other rules can share them. Rule maintainers could then constrain toolchains appropriately, and users would only have to set up platform definitions.

hlopko commented 2 years ago

Some old but related doc on the topic of more constraint settings: docs.google.com/document/d/1CgU-GKocMAfsUSI3bbGZ0YRkOWczitIoKs29x3zR914/edit#heading=h.3fbh1otqm5sz

bsilver8192 commented 2 years ago

ARM ABIs are really complicated... I'm not sure how to fit all of this into constraints. I'm sure representing all of this complexity is a bad idea, but I don't have much input into where to draw the line.

I've been pretty happy defining my own constraint_setting that just enumerates my hardware platforms. The only place I find standard constraints helpful is using other people's BUILD files, which only needs granularity to the extent those BUILD files do different things. Most of this information is only needed by the toolchain itself, but I have seen different assembly files to choose between which do need more of this to work for any platform.
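
A minimal sketch of that approach, assuming a project-local package. The setting name `hardware` and the board names are hypothetical, not taken from any real repository:

```starlark
# BUILD.bazel in a project-local package (hypothetical names throughout).

# One setting that simply enumerates the boards this project builds for.
constraint_setting(name = "hardware")

constraint_value(
    name = "board_a",
    constraint_setting = ":hardware",
)

constraint_value(
    name = "board_b",
    constraint_setting = ":hardware",
)

# The standard constraints are kept only so that third-party BUILD files
# which select() on @platforms values still resolve.
platform(
    name = "board_a_platform",
    constraint_values = [
        "@platforms//cpu:armv7",
        "@platforms//os:linux",
        ":board_a",
    ],
)
```

Toolchains for a given board can then list `:board_a` in `target_compatible_with`, without anyone having to agree on what the finer-grained ABI details are called.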

Some platforms I've targeted with Bazel which I think are common:

Some less common platforms I've targeted with Bazel:

Some notes on aarch32 (ARMv6/ARMv7, or ARMv8 in 32-bit mode) variants I've dealt with:

UebelAndre commented 2 years ago

In hopes of moving the conversation along I've opened https://github.com/bazelbuild/platforms/pull/39 to provide a concrete example of what I'm hoping for.

aiuto commented 2 years ago

IMO, there is no winning in trying to precisely define ARM platforms in this repository. Virtually every device is its own platform, since there can be so much customization of what is on the die.

UebelAndre commented 2 years ago

For what it's worth, the only 3 values I'm really hoping to represent are gnu, musl, and msvc which would help tremendously with producing docker images containing Rust or Go binaries. I'm not too familiar with all the existing ABI variants but are those considered base values (thinking of things like gnueabi)?
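
Roughly, something like the following. The `abi` package path and the target names are assumptions for illustration, not an existing @platforms package:

```starlark
# Hypothetical abi/BUILD.bazel; the package and target names are illustrative.
package(default_visibility = ["//visibility:public"])

constraint_setting(name = "abi")

constraint_value(
    name = "gnu",
    constraint_setting = ":abi",
)

constraint_value(
    name = "musl",
    constraint_setting = ":abi",
)

constraint_value(
    name = "msvc",
    constraint_setting = ":abi",
)

# A platform that a container-image rule could target for static musl binaries.
platform(
    name = "linux_x86_64_musl",
    constraint_values = [
        "@platforms//cpu:x86_64",
        "@platforms//os:linux",
        ":musl",
    ],
)
```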

aiuto commented 2 years ago

I would rather go small first, with very precise definitions of what each constant means. So what would those three things mean? I take ABI to strictly mean the calling convention between separately compiled modules. So, gnu is probably better called Itanium C++. That should match clang too, so using the generic name is fair.

How is musl as an API different from gnu? Aren't we getting at features of the library, rather than the binary interface? Unless we are talking about different musls. It is also not clear what msvc means as an ABI standard. The calling conventions have changed in different VS versions, and the windows APIs have changed as well.

Also, by being strict about calling conventions, I am excluding anything like linker format, and features of the standard library. Those should be an orthogonal space. It's not clear from the issue description if the need is to cover all of those topics or just the ABI.

@UebelAndre, it would help to see examples of what the rust teams are actually trying to do. Can you point to any of the platform definitions they are building? Or, maybe some definitive docs on rust/c++ interoperability.

bsilver8192 commented 2 years ago

> I would rather go small first, with very precise definitions of what each constant means.

I agree with these goals. I would like to add "no names that are prone to assuming an incorrect meaning" and "obvious nesting or mutual exclusion between all categories".

For example the current decision between //cpu:arm (all ARM? all aarch32? non-thumb aarch32?) vs //cpu:armv7 vs several sets of specific ARM cores denoted by their instruction set variants is really confusing. And is //cpu:arm64_32 aarch32-on-arm64 (code can use all the registers, but they're all caller-save and the ABI doesn't change) or is it the aarch64 version of x32? Even as somebody who knows what most of this means, I have no idea which of these are supposed to apply to my platforms.

I've got tons of specific things to bring up in the interest of avoiding something which confuses people thinking about just one of them. Also it'd be good to avoid precluding clean solutions to the rest of it in the future, but that might be hard.

> So what would those three things mean? I take ABI to strictly mean the calling convention between separately compiled modules. So, gnu is probably better called Itanium C++. That should match clang too, so using the generic name is fair.

Actually, coming from managing many C/C++ toolchains, gnu strikes me as the most problematic one on that list. For some triples, gnu means GNU OABI (which is ancient), whereas gnueabi is the GNU EABI that's used on all modern ARM. For other triples, on CPUs/platforms/etc. which never used OABI, gnu means GNU EABI. Keep in mind that there might be some platforms with non-GNU EABI around too. Even if we document a consistent meaning for it within the BUILD file, people are going to misuse it because they assume it's obvious what it means (it's in my triple, so that must be the one I want).

The itanium C++ ABI defines how C++ is implemented on top of common ELF ABIs. The name is because it was first defined for Itanium, but it's since become the de facto standard on all the common platforms (aarch32, aarch64, x86, amd64 are the ones I work with). But C++ ABI compatibility is also affected by (and this is just on Linux with libstdc++/libc++, Microsoft basically just breaks compatibility periodically instead):

Separate from the mess of C++, there's also things like:

They do fit under "separately compiled modules need to handle them in compatible ways", but I don't think it's manageable in a centralized list like this... I think managing any of this with bazel platforms is a low priority.

> How is musl as an API different from gnu? Aren't we getting at features of the library, rather than the binary interface? Unless we are talking about different musls. It is also not clear what msvc means as an ABI standard. The calling conventions have changed in different VS versions, and the windows APIs have changed as well.

Things like the interface to the dynamic linker and the implementation of errno depend on which libc. Rust also cares about the stack unwinding support. TLS implementation can also be different for some libc.

> @UebelAndre, it would help to see examples of what the rust teams are actually trying to do. Can you point to any of the platform definitions they are building? Or, maybe some definitive docs on rust/c++ interoperability.

I believe the goal is to enable a platform to select a unique target from all the ones Rust supports at https://doc.rust-lang.org/nightly/rustc/platform-support.html, so that rules_rust can generate toolchains for all of them and get the correct one picked based on the platform.

UebelAndre commented 2 years ago

> @UebelAndre, it would help to see examples of what the rust teams are actually trying to do. Can you point to any of the platform definitions they are building? Or, maybe some definitive docs on rust/c++ interoperability.

@aiuto I've created https://github.com/bazelbuild/rules_rust/pull/1270

UebelAndre commented 2 years ago

@aiuto I've opened https://github.com/bazelbuild/rules_docker/pull/2062 to also show how other rules may benefit from a common set of definitions. A change like that would allow the rules_rust changes to work with the rules_docker changes without requiring user patching or a large amount of custom configuration.

Does this and https://github.com/bazelbuild/platforms/issues/38#issuecomment-1102808729 provide more helpful context?

UebelAndre commented 2 years ago

@aiuto @gregestren friendly ping 😅

gregestren commented 2 years ago

Ping noted. Will try to respond soon (maybe next chance @aiuto and I get to talk).

gregestren commented 2 years ago

I've scheduled this for my next chat with @aiuto. We at least owe a proper response / next step for this issue. We're both OOO the next two weeks but we'll sync mid-July.

graywolf-at-work commented 2 years ago

Any update on this? I don't want to push too much but thought I would ask given the "mid-July" estimate.

gregestren commented 2 years ago

Fair comment. Apologies for delays. Scheduling again for next week, and I'll make sure we discuss.

gregestren commented 2 years ago

My summary of the above:

I don't want to say more yet since the relationship between ABI and OS and platform is subtle as clearly expressed above. Still on agenda for wider discussion this week.

bsilver8192 commented 2 years ago

Another tricky scenario to consider, courtesy of a @coffinmatician:

  1. C++ toolchain for armv7 soft-float ABI which doesn't emit floating point instructions
  2. C++ toolchain for armv7 soft-float ABI which does emit floating point instructions (GCC's -mfloat-abi=softfp)
  3. Rust toolchain for armv7 soft-float ABI which doesn't emit floating point instructions

(I suspect that in practice rustc can use -mfloat-abi=softfp or an equivalent, but this still applies if rules_rust doesn't provide a toolchain which uses that.)

All 3 of these toolchains can be fully ABI-compatible. Some processors can run code from any of them, other processors can only run code from 1 and 3. I don't think Bazel's current constraint system allows a good solution: if you build for a platform that requires soft-ABI-hard-instructions then there's no compatible Rust toolchain, but if you use soft-ABI-soft-instructions then you end up with a non-optimal C++ toolchain.

@coffinmatician thinks constraints should have a full SAT constraint optimizer to address this.

bsilver8192 commented 2 years ago

I did some more thinking, and I have a proposal: put the list of Rust triples in a constraint_setting(name = "triple") in rules_rust, and do the same for other toolchain ecosystems.

For example, a project that creates Clang toolchains would have its own constraint_setting, another project that does GCC toolchains would have its own, and a third project for bare-metal GCC and Clang toolchains would have a separate one.

I don't think any other solution is going to solve the problem cleanly. Even if @platforms gets some kind of "ABI" constraint, there's still going to be platforms that can't be distinguished. For example, looking through the Rust list, the first two tier 2 targets look problematic: aarch64-apple-ios vs aarch64-apple-ios-sim.

Also building C++ code with Clang sometimes uses subtly different triples, and GCC often uses different triples. Debian (and derivatives) also use different triples for multiarch (x86_64-linux-gnu vs x86_64-unknown-linux-gnu, or arm-linux-gnueabihf vs armv7a-unknown-linux-gnueabihf). Also with GCC and Clang it's fairly common to further modify the ABI beyond the triple with various flags, sometimes in ways that could be accomplished by changing the triple too (like ARMv6 vs ARMv7). In general I think that trying to canonicalize triples (or any parts of them) across ecosystems is not going to work, and is going to produce confusion when different ecosystems use different strings for the same meaning.

I think rules_rust can create selects that don't use this new triple constraint, unless multiple enabled Rust triples are only distinguishable that way. This means that for some set of "common" platforms enabled by default, the user doesn't need to add any of the triple values to their platform. But once a user does enable multiple Rust triples that can only be distinguished with this constraint, they will have to add the appropriate values to their platforms.
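
A rough sketch of this proposal, assuming the setting lives in a package such as `//rust/platform/triple`. That path is hypothetical; rules_rust does not necessarily define anything like it:

```starlark
# Hypothetical rust/platform/triple/BUILD.bazel inside rules_rust.
package(default_visibility = ["//visibility:public"])

constraint_setting(name = "triple")

constraint_value(
    name = "aarch64-apple-ios",
    constraint_setting = ":triple",
)

constraint_value(
    name = "aarch64-apple-ios-sim",
    constraint_setting = ":triple",
)

# In a user's workspace (a separate BUILD file). The triple value is only
# needed once both iOS triples are enabled, because cpu and os alone cannot
# tell them apart. The @rules_rust label below is the hypothetical one above.
platform(
    name = "ios_sim_arm64",
    constraint_values = [
        "@platforms//cpu:aarch64",
        "@platforms//os:ios",
        "@rules_rust//rust/platform/triple:aarch64-apple-ios-sim",
    ],
)
```

A platform for a "common" target such as x86_64 Linux would list only the cpu and os values and never mention the triple setting.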

graywolf-at-work commented 2 years ago

> @graywolf-at-work what brought you to this issue? Are you affected by this?

I'm running bazel on alpine linux (so on musl), meaning basically any pre-built binaries do not work. Currently I'm patching upstream projects and adding a constraint on glibc or musl (using [0]) so that toolchain selection works properly, but that is obviously not something I can contribute to the upstreams. So I would like to see a standard solution that allows proper toolchain selection based on libc (so that pre-built can be glibc and built-from-source can be musl, for example).

0: https://git.sr.ht/~graywolf/x_platforms/tree/master/item/abi/BUILD
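
A sketch of the kind of selection this enables, under stated assumptions: the `libc` setting, the toolchain type, and the two implementation targets are all hypothetical names, and the implementation targets would come from whichever rule set defines the toolchain.

```starlark
# Hypothetical BUILD.bazel; every target name here is illustrative.
constraint_setting(name = "libc")

constraint_value(
    name = "glibc",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)

toolchain_type(name = "toolchain_type")

# Pre-built binaries linked against glibc: only usable on glibc hosts.
toolchain(
    name = "tool_prebuilt_glibc",
    exec_compatible_with = [
        "@platforms//os:linux",
        ":glibc",
    ],
    toolchain = ":tool_prebuilt",  # assumed to be defined by the rule set
    toolchain_type = ":toolchain_type",
)

# Built-from-source variant: chosen when the execution platform declares musl.
toolchain(
    name = "tool_source_musl",
    exec_compatible_with = [
        "@platforms//os:linux",
        ":musl",
    ],
    toolchain = ":tool_from_source",  # assumed to be defined by the rule set
    toolchain_type = ":toolchain_type",
)
```

For either toolchain to match, the host platform has to carry one of the `libc` values, for example through a custom `platform()` passed with `--host_platform`.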

UebelAndre commented 2 years ago

> put the list of Rust triples in a constraint_setting(name = "triple") in rules_rust, and do the same for other toolchain ecosystems.

I don't think I understand why this would be acceptable. I think rules_rust correctly translates a triple into a collection of constraints and isn't trying to treat triples as anything other than an alias. The addition of ABI constraints isn't going to solve all problems; it's just one more set of constraints that can be used to make platforms more unique and to solve the musl vs gnu issues I've run into a lot.

gregestren commented 2 years ago

We discussed this week. @aiuto has more input.

mattyclarkson commented 2 years ago

We're setting up C/C++ toolchains within our company and hitting the same issue.

We hermetically download a compiler, binutils and sysroot. That combination needs to be described via constraints so toolchain selection can occur correctly.

Rather than ABI (which can be affected by many things), we were thinking more along the lines of libc constraints:

@platforms//libc/gnu:2.31
@platforms//libc/musl:1.21

It's important to know the version because they are often forwards but not backwards compatible. It likely makes sense to have the non-versioned constraints as well, so that a select can occur on both the versioned and the non-versioned constraint.

The other thing to know is the version of the system headers because deploying the binary can mean the libc uses system calls that are not available on older kernels.

So something like:

@platforms//os/linux:5.13

lberki commented 1 year ago

@mattyclarkson do the constraints need to be in the @platforms repository for your use case?

I don't think having a separate constraint for each Linux version is a great idea because it would (in principle) make a given Bazel version incompatible with any Linux version released after it. But if you control your own platform and constraint definitions, you know which exact versions of Linux / libc / etc. you care about.
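
As a sketch of what that could look like in a project-local package (all names here are hypothetical, and none of these targets exist in @platforms):

```starlark
# Hypothetical project-local BUILD.bazel.
constraint_setting(name = "libc")

constraint_value(
    name = "glibc",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)

# Versions live in a separate setting, because a platform may carry only one
# value per constraint_setting and we want both the family and the version.
constraint_setting(name = "glibc_version")

constraint_value(
    name = "glibc_2.31",
    constraint_setting = ":glibc_version",
)

platform(
    name = "linux_x86_64_glibc_2_31",
    constraint_values = [
        "@platforms//cpu:x86_64",
        "@platforms//os:linux",
        ":glibc",
        ":glibc_2.31",
    ],
)

# select() accepts constraint_value targets as keys; the file names are
# placeholders.
filegroup(
    name = "compat_srcs",
    srcs = select({
        ":glibc_2.31": ["compat_glibc_2_31.c"],
        ":musl": ["compat_musl.c"],
        "//conditions:default": ["compat_generic.c"],
    }),
)
```

Only the versions the project actually ships against need a `constraint_value`, so nothing has to be updated when a new upstream release appears.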