GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0

Prebuilt binaries obtained on ARM64 macOS are x86-64 #1478

Closed EliahKagan closed 2 months ago

EliahKagan commented 2 months ago

Tasks:

Current behavior 😯

The release workflow builds binaries for x86_64-apple-darwin but not for aarch64-apple-darwin.

An ARM64 (AArch64) macOS system is capable of running x86-64 macOS binaries by emulation, and will do so automatically, so it is easy to run the x86-64 builds on an ARM64 Mac. But performance is a key benefit of gitoxide, and since the instruction sets are not compatible, emulation imposes a significant overhead, so users would usually prefer not to do this. Most importantly, users would usually not want to run an x86-64 build on ARM64 inadvertently.

Unfortunately, this is likely to happen when using cargo binstall:

It is really the indirect case where I believe users are likely to miss that this is installing an x86-64 binary on their ARM64 system. This is because:

This can be fixed for future releases by adding aarch64-apple-darwin jobs to the release workflow. That has further benefits, since even users who are not being inadvertently misled into using the wrong build will have another way, besides cargo quickinstall, to install same-architecture binaries.

cargo quickinstall does succeed at obtaining a native aarch64-apple-darwin build, since there is such a build among the cargo quickinstall releases. Current versions of cargo binstall are capable of installing those. But that does not happen here with cargo binstall, since cargo binstall apparently prefers the gitoxide releases even though their architecture is not as good a match.

Unless something gets in the way, I will try to do this in my next PR that modifies the workflow, and I may fix it by itself or with limited other changes rather than combining it with fixes for #1477.

An analogous situation exists for Windows

The same problem happens on an ARM64 Windows system, which also supports x86-64 executables by emulation. On such a system, one would ordinarily wish to run an aarch64-pc-windows-msvc build, but cargo binstall provides an x86_64-pc-windows-msvc build instead. I expect that this is less often encountered in practice so far, in part because I don't think ARM64 Windows systems are as common yet as ARM64 macOS systems.

Fortunately, it should be feasible to add both at the same time. On the Ubuntu runner, the tools to perform cross-compilation without cross would have to be installed explicitly (such as via sudo apt install ...), and further steps might be needed to install some libraries or to enable pkg-config to find them. By contrast, the macOS and Windows runners already have the non-Rust-specific tools for targeting both architectures.

Linux-based systems

Something like this may happen for Linux-based systems. I haven't investigated that at this time, because I wouldn't be immediately fixing it for those systems, as #1477 should probably be fixed before adding new Linux targets.

Other systems

At this time, we don't make any binary builds for targets outside macOS, Windows, and Linux-based systems. So users of such systems are not likely to be affected.

Expected behavior 🤔

Since macOS is increasingly run on ARM64 processors, one may expect aarch64-apple-darwin binaries to be available. But the reason I wanted to open this issue and ensure this is not forgotten even if I have trouble adding the target is two other expectations that I think users hold more strongly and implicitly:

To an extent the reason these expectations are not satisfied is due to aspects of the design of some other projects that could perhaps also be improved. For example, cargo install-update could be changed to warn or even prompt when it would upgrade to a different architecture, or cargo binstall could check for better architecture matches in the cargo-quickinstall releases even if an installable match has already been found.

However, for gitoxide, it should be feasible and maybe even easy to fix this by also shipping ARM64 macOS releases.

Regarding Windows: In this section, I didn't add anything about expectations for Windows ARM64 targets, since as noted above, I don't think the expectations about that are as strong, though I do believe we should provide such builds.

Regarding Linux-based systems: Users may be less likely to assume suitable builds are available, because most feature builds are not currently available (also something that #1477 will work toward).

Git behavior

I think this is mostly not applicable, because as far as I know, the upstream Git project does not build official binaries. However:

Downstream git builds for macOS

macOS does have git (or maybe the Apple developer tools install it, I'm not sure). That binary carries both x86-64 and ARM64 code:

~> git version
git version 2.39.3 (Apple Git-146)
~> type git
git is /usr/bin/git
~> file (command -v git)
/usr/bin/git: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/git (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/git (for architecture arm64e): Mach-O 64-bit executable arm64e

(The ARM64 code, at least on that particular macOS 14.5 system, is arm64e rather than arm64, but I think that does not have significant performance implications, and that, even though there are many different ARM architectures, arm64 vs. arm64e is an ABI difference rather than being different architectures.)

Although this is a relevant comparison, I mainly present the above to show that the file command on macOS is capable of revealing when a binary contains code of both architectures, so it's clear that its output, as shown below, really does demonstrate the issue.

Git for Windows

Git for Windows does not yet provide ARM64 builds, so the x86-64 version of Git for Windows really is the best available target for an ARM64 Windows system (unless one is willing to use experimental builds, which in any case I don't think are provided as binaries).

This is one of the current benefits of gitoxide and one of the reasons I think it's useful to make it as easy as possible to install ARM64 binaries on Windows. While gitoxide has performance benefits for some of its functionality that overlaps with that of git, such as for cloning, the performance benefit is far greater on an ARM64 Windows system where Git for Windows requires emulation but gitoxide does not. (Of course, native ARM64 builds of Git for Windows will be coming; this is not likely to be a long-standing difference.)

Steps to reproduce 🕹

I used a macOS 14.5 system running on an ARM64 (AArch64) processor, and nothing was set up in a way that would itself lead to an x86-64 build being seen as preferred:

~> sw_vers -productVersion
14.5
~> uname -m
arm64
~> rustup toolchain list
stable-aarch64-apple-darwin (default)
~> rustup target list --installed
aarch64-apple-darwin

Installing with cargo binstall gave x86-64, not ARM64, binaries:

~> cargo binstall gitoxide
 INFO resolve: Resolving package: 'gitoxide'
 WARN The package gitoxide v0.37.0 (x86_64-apple-darwin) has been downloaded from github.com
 INFO This will install the following binaries:
 INFO   - ein (ein -> /Users/ek/.cargo/bin/ein)
 INFO   - gix (gix -> /Users/ek/.cargo/bin/gix)
Do you wish to continue? yes/[no]
? yes
 INFO Installing binaries...
 INFO Done in 14.699131666s
~> file (command -v gix ein)
/Users/ek/.cargo/bin/gix: Mach-O 64-bit executable x86_64
/Users/ek/.cargo/bin/ein: Mach-O 64-bit executable x86_64

(This was in fish. To do this in zsh, bash, or another Bourne-style shell, I would use a $ before the opening parenthesis of the command substitution.)
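For zsh, bash, or another Bourne-style shell, the same check could be wrapped as in the following sketch. The helper name show_arch is made up for illustration; it relies only on command -v and the file utility.

```shell
# Print `file` output for each named command found on PATH.
# `show_arch` is a hypothetical helper name, not part of any tool.
show_arch() {
    for name in "$@"; do
        # Avoid the variable name `path`, which zsh ties to PATH.
        if bin_path=$(command -v "$name"); then
            file "$bin_path"
        else
            echo "$name: not found on PATH" >&2
        fi
    done
}

# Example: show_arch gix ein
```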

I uninstalled, then installed the previous version from source, then upgraded with cargo install-update, showing that the result, due to its use of cargo-binstall, was to upgrade to an x86-64 build:

~> cargo uninstall gitoxide
    Removing /Users/ek/.cargo/bin/ein
    Removing /Users/ek/.cargo/bin/gix
~> cargo install gitoxide@0.36.0 --locked --quiet
~> cargo install-update --all
    Polling registry 'https://index.crates.io/'.....

Package             Installed  Latest   Needs update
gitoxide            v0.36.0    v0.37.0  Yes
cargo-binstall      v1.8.0     v1.8.0   No
cargo-nextest       v0.9.72    v0.9.72  No
cargo-quickinstall  v0.2.10    v0.2.10  No
cargo-update        v13.4.0    v13.4.0  No

Updating gitoxide
 INFO resolve: Resolving package: 'gitoxide'
 WARN The package gitoxide v0.37.0 (x86_64-apple-darwin) has been downloaded from github.com
 INFO This will install the following binaries:
 INFO   - ein (ein -> /Users/ek/.cargo/bin/ein)
 INFO   - gix (gix -> /Users/ek/.cargo/bin/gix)
 INFO Installing binaries...
 INFO Done in 3.101160291s

Updated 1 package.
Overall updated 1 package: gitoxide.
~> file (command -v gix ein)
/Users/ek/.cargo/bin/gix: Mach-O 64-bit executable x86_64
/Users/ek/.cargo/bin/ein: Mach-O 64-bit executable x86_64

I uninstalled again, and installed the current version from source for contrast, showing that this gives ARM64:

~> cargo uninstall gitoxide
    Removing /Users/ek/.cargo/bin/ein
    Removing /Users/ek/.cargo/bin/gix
~> cargo install gitoxide --quiet
~> file (command -v gix ein)
/Users/ek/.cargo/bin/gix: Mach-O 64-bit executable arm64
/Users/ek/.cargo/bin/ein: Mach-O 64-bit executable arm64

And again, showing that cargo quickinstall gives ARM64 as well:

~> cargo quickinstall gitoxide
Calling `cargo-binstall` to do the install
 INFO resolve: Resolving package: 'gitoxide'
 INFO resolve: Verified signature for package 'gitoxide-0.37.0-aarch64-apple-darwin': timestamp:1722141643      file:gitoxide-0.37.0-aarch64-apple-darwin.tar.gz        hashed
 WARN The package gitoxide v0.37.0 (aarch64-apple-darwin) has been downloaded from third-party source QuickInstall
 INFO This will install the following binaries:
 INFO   - ein (ein -> /Users/ek/.cargo/bin/ein)
 INFO   - gix (gix -> /Users/ek/.cargo/bin/gix)
 INFO Installing binaries...
 INFO Done in 2.783621666s
~> file (command -v gix ein)
/Users/ek/.cargo/bin/gix: Mach-O 64-bit executable arm64
/Users/ek/.cargo/bin/ein: Mach-O 64-bit executable arm64

Windows: On an ARM64 Windows 11 system with rustup installed via https://win.rustup.rs/aarch64 (the Rust site and the rustup site don't yet link to it, but it's official), and with cargo-binstall installed with this technique, installing gitoxide using cargo binstall installs the x86-64 version:

PS C:\Users\parnassus> cargo uninstall gitoxide
    Removing C:\Users\parnassus\.cargo\bin\ein.exe
    Removing C:\Users\parnassus\.cargo\bin\gix.exe
PS C:\Users\parnassus> cargo binstall gitoxide
 INFO resolve: Resolving package: 'gitoxide'
 WARN The package gitoxide v0.37.0 (x86_64-pc-windows-msvc) has been downloaded from github.com
 INFO This will install the following binaries:
 INFO   - ein.exe (ein.exe -> C:\Users\parnassus\.cargo\bin\ein.exe)
 INFO   - gix.exe (gix.exe -> C:\Users\parnassus\.cargo\bin\gix.exe)
Do you wish to continue? yes/[no]
? yes
 INFO Installing binaries...
 INFO Done in 13.2124373s
PS C:\Users\parnassus> &'C:\Program Files\Git\usr\bin\file.exe' (gcm gix).path
C:\Users\parnassus\.cargo\bin\gix.exe: PE32+ executable (console) x86-64, for MS Windows, 5 sections
PS C:\Users\parnassus> &'C:\Program Files\Git\usr\bin\file.exe' (gcm ein).path
C:\Users\parnassus\.cargo\bin\ein.exe: PE32+ executable (console) x86-64, for MS Windows, 5 sections
PS C:\Users\parnassus> cargo uninstall gitoxide
    Removing C:\Users\parnassus\.cargo\bin\ein.exe
    Removing C:\Users\parnassus\.cargo\bin\gix.exe

NobodyXu commented 2 months ago

I recommend also creating a universal-apple-darwin one (containing both aarch64 and x86-64):

    lipo -create -output /path/to/dest /path/to/x86 /path/to/arm

There's an LLVM version of lipo as well: https://llvm.org/docs/CommandGuide/llvm-lipo.html

And I'd recommend using macos-14 runner, that one uses M1

EliahKagan commented 2 months ago

I've edited this to add a task list that includes what I've done so far, which I may open a PR for soon to make sure the core issue can be resolved before the next release, as well as things that still remain to be done, including making archives with universal binaries.

I am thinking that, for adding release archives with universal binaries, the existing design of the workflow could actually be kept, at least for now, and another job definition could be added for universal-apple-darwin that is parameterized only by feature (such as max), whose jobs depend on both the x86_64-apple-darwin and aarch64-apple-darwin jobs of the same feature. Those jobs could pull the binaries out of the already-made archives, run lipo on them, and make and upload a new archive.
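As a rough sketch of what such a job's main step might do, assuming the two per-architecture archives have already been downloaded into the current directory: the archive naming here is modeled on the release asset name quoted later in this thread (gitoxide-max-pure-v0.37.0-arm-unknown-linux-gnueabihf.tar.gz), while the layout inside each archive and the function name are assumptions made for illustration. Uploading is left out.

```shell
# Sketch: build a universal archive from two per-architecture archives.
# Assumption: each archive unpacks to a top-level directory named after
# the archive (minus .tar.gz) that contains the `gix` and `ein` binaries.
# `make_universal_archive` is a hypothetical name.
make_universal_archive() {
    feature=$1
    version=$2
    stem="gitoxide-$feature-$version"

    # Unpack both single-architecture archives.
    for arch in x86_64 aarch64; do
        tar -xzf "$stem-$arch-apple-darwin.tar.gz" || return
    done

    out="$stem-universal-apple-darwin"
    mkdir -p "$out" || return

    # Merge each pair of binaries into one universal binary.
    for bin in gix ein; do
        lipo -create -output "$out/$bin" \
            "$stem-x86_64-apple-darwin/$bin" \
            "$stem-aarch64-apple-darwin/$bin" || return
    done

    # Package the result the same way as the other archives.
    tar -czf "$out.tar.gz" "$out"
}

# Example: make_universal_archive max v0.37.0
```

Running file on the resulting binaries should then report both architectures, as it does for /usr/bin/git in the issue description.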

I may look into alternative approaches before proceeding with that. But a possible benefit of that approach is that it doesn't need anywhere to store the binaries other than associated with the release. Storing them as artifacts is something I think I would most be interested in if it turns out to be simpler or clearer, or if other uses of it arise (for example, if we want to build the archives before publishing the GitHub release).

I found that an analogous, though in practice probably less misleading, situation applies on ARM64 Windows systems. So I've also expanded the issue description here to cover that.

And I'd recommend using macos-14 runner, that one uses M1

Since the beginning of this month, macos-latest has been equivalent to macos-14. In the few months before that, there was a staged rollout, so it was sometimes macos-12 and other times macos-14, but I believe that is no longer the case. (It was never equivalent to macos-13, though that is and was available.) See https://github.com/github/roadmap/issues/926. So I don't think macos-latest should be changed to macos-14 in release.yml.

This applies even to the most recently run release workflow, as can be seen here, though at that time it was using cross and I'm unsure if there is any performance impact of that on macOS. In any case, cross is not needed to build gitoxide with both targets and all features on an ARM64 Mac. The release workflow no longer uses cross outside of Linux since #1475, but that is more recent than the current latest release.

EliahKagan commented 2 months ago

I've opened #1479 which, in accordance with the plan in https://github.com/Byron/gitoxide/issues/1478#issuecomment-2257584083, does enough to resolve the core issue but which does not add universal binaries. As noted there, I am thinking that could be done in a separate PR, but if preferred then that PR could also be delayed and expanded in scope.

EliahKagan commented 2 months ago

I have verified that no similar problem appears to affect Linux-based systems (though it would of course also be valuable to release binaries for more such targets).

The perspective I approach this from is:

On a Linux-based ARM64 system, there are two important cases to cover: x86-64 binaries, and 32-bit ARM binaries.

1. Could x86-64 binaries be used on ARM64 Linux?

Unlike with macOS and Windows, a Linux-based ARM64 system is not expected to automatically be set up to run x86-64 binaries through emulation. To the best of my knowledge, no such systems ship with this enabled. However, they are capable of doing it, using QEMU to run the code and binfmt_misc to make executing such a binary launch QEMU. There are semi-automated means of making that easy, and on some systems it requires so little configuration that a user might do so without being aware of the effect. For example, on an ARM64 system running Ubuntu 24.04 LTS, I found that running sudo apt install qemu-user-static was enough to allow me to run x86-64 binaries.
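One way to check whether a system has ended up in that state is to look at the binfmt_misc registrations, as in this sketch. It assumes the handler is registered under the name qemu-x86_64, which is what I would expect from the Debian/Ubuntu packaging; other setups may use a different name, and the function name is made up for illustration.

```shell
# Succeeds if a binfmt_misc handler for x86-64 binaries is registered
# and enabled. Assumption: the handler is named `qemu-x86_64`; other
# distributions or manual setups may register it under another name.
# `x86_64_emulation_enabled` is a hypothetical helper name.
x86_64_emulation_enabled() {
    # Optional first argument overrides the binfmt_misc directory.
    entry="${1:-/proc/sys/fs/binfmt_misc}/qemu-x86_64"
    # The first line of a binfmt_misc entry is "enabled" or "disabled".
    [ -r "$entry" ] && [ "$(head -n 1 "$entry")" = enabled ]
}

# Example:
# x86_64_emulation_enabled && echo "x86-64 binaries can run through QEMU"
```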

Fortunately, cargo binstall still does not select the x86-64 build. The installed targets do not affect this: it continues not to select the x86-64 build even when that target is among those installed.

ek@ubuntu-arm:~$ rustup toolchain list
stable-aarch64-unknown-linux-gnu (default)
ek@ubuntu-arm:~$ rustup target list --installed
aarch64-unknown-linux-gnu
aarch64-unknown-linux-musl
x86_64-unknown-linux-musl

That is on Ubuntu 24.04 LTS arm64. cargo binstall instead just offers to install from source as if by cargo install.

2. Could 32-bit ARM binaries be used on ARM64 Linux?

Many ARM64 processors do support running 32-bit ARM binaries, including armhf, without emulation. arm-unknown-linux-gnueabihf builds such as gitoxide-max-pure-v0.37.0-arm-unknown-linux-gnueabihf.tar.gz could therefore run on such a system. It would run at native speed. But it would not benefit from the ARM64 instruction set, so I suspect this would result in modestly worse performance.

In practice this would often fail, because unlike musl targets, unknown-linux-gnueabihf requires armhf libraries--at least libc--that would need to be installed on the system. But those libraries are available on common systems such as Ubuntu and may be installed as dependencies of some other software.

Fortunately, cargo binstall does not automatically install this version on an ARM64 system with an aarch64 toolchain, whether or not the arm-unknown-linux-gnueabihf target is installed.

ek@numbat:~$ rustup target list --installed
aarch64-unknown-linux-gnu
arm-unknown-linux-gnueabihf

That is on another Ubuntu 24.04 LTS arm64 system that I made sure is actually capable of executing 32-bit ARM binaries without emulation, since that is relevant to this particular test and since some ARM64 hardware is not capable of this. I installed the necessary system libraries to be able to run a manually downloaded and untarred arm-unknown-linux-gnueabihf build of gitoxide.

On this system, cargo binstall fortunately still does not attempt to fall back to the gnueabihf target, but instead offers to install from source as if by cargo install.

EliahKagan commented 2 months ago

I've opened #1486, which adds jobs to generate and include archives with universal binaries in a release, among other changes.