chatnoir-eu / chatnoir-resiliparse

A robust web archive analytics toolkit
https://resiliparse.chatnoir.eu
Apache License 2.0
55 stars, 9 forks

Enable build of linux-aarch64 wheels #34

Closed: jonded94 closed this 3 months ago

jonded94 commented 6 months ago

Currently, only Mac Intel/ARM, Windows x86_64 and Linux x86_64 wheels are being built in this repository.

It would be nice if Linux ARM wheels could also be available, for example:

For that I prepared this PR.

Disadvantages:

libc requirements

Because of

I was unable to get this to work with manylinux2014 and had to switch all the way to manylinux_2_28, since manylinux_2_24 is deprecated and has reached end of life.

This obviously means that the wheels can no longer be installed on operating systems with a glibc older than 2.28. At least for ARM machines this should not be a concern, since one can expect them to run on relatively new hardware and operating systems anyway. I don't know how you feel about that, though! One could of course keep building the Linux x86 wheels on the older manylinux version, but that would slightly increase the complexity of the Dockerfile.
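To make the glibc requirement concrete: a manylinux_2_28 wheel needs glibc 2.28 or newer on the target machine. The sketch below is a hypothetical helper (not part of Resiliparse) that checks the running system using the standard library's `platform.libc_ver()`.

```python
import platform


def parse_glibc_version(version: str) -> tuple[int, int]:
    """Turn a glibc version string such as "2.35" into (major, minor) integers."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def supports_manylinux_2_28(libc_name: str, libc_version: str) -> bool:
    """Return True if this libc can load manylinux_2_28 wheels (glibc >= 2.28)."""
    if libc_name != "glibc" or not libc_version:
        # musl-based distributions and non-Linux systems cannot use manylinux wheels.
        return False
    return parse_glibc_version(libc_version) >= (2, 28)


# platform.libc_ver() reports e.g. ("glibc", "2.35") on glibc-based Linux.
name, version = platform.libc_ver()
print(f"{name or 'unknown libc'} {version}: manylinux_2_28 OK = {supports_manylinux_2_28(name, version)}")
```

Comparing integer tuples matters here: as plain strings, "2.9" would sort after "2.28".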

CI times

I noticed a roughly 5x increase in CI times, since all ARM wheel builds and tests run under QEMU emulation, which is notoriously slow.

jonded94 commented 6 months ago

Note that I had to make changes to the Dockerfile: for now, I pointed the Linux build containers you use to ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64. If you are interested in merging this, that should of course be changed first. The Dockerfiles were built manually; proper jobs for building them are missing here.
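One way a custom build image can be wired into cibuildwheel is its `CIBW_MANYLINUX_AARCH64_IMAGE` option. The sketch below assumes that mechanism (the PR itself may configure the image differently) and only assembles the command line and environment for an aarch64-only run; the helper name is hypothetical.

```python
import os
import sys

# Image from the comment above; it should be swapped for a project-owned image
# before merging.
AARCH64_IMAGE = "ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64"


def cibuildwheel_invocation(image: str = AARCH64_IMAGE) -> tuple[list[str], dict[str, str]]:
    """Return the command line and environment for an aarch64-only cibuildwheel run."""
    env = dict(os.environ)
    # Build only the aarch64 Linux wheels ...
    env["CIBW_ARCHS_LINUX"] = "aarch64"
    # ... inside the custom manylinux_2_28 image instead of cibuildwheel's default.
    env["CIBW_MANYLINUX_AARCH64_IMAGE"] = image
    cmd = [sys.executable, "-m", "cibuildwheel", "--platform", "linux",
           "--output-dir", "wheelhouse"]
    return cmd, env
```

The result can be passed to `subprocess.run(cmd, env=env, check=True)`. The same settings can also live in `pyproject.toml` under `[tool.cibuildwheel]`.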

EDIT: Solved with merged PR https://github.com/chatnoir-eu/chatnoir-resiliparse/pull/37

wumpus commented 6 months ago

This is a great idea. It so happens that the Common Crawl Foundation mostly uses Linux ARM cloud machines because they are cheaper to rent. Thank you for this PR.

jonded94 commented 5 months ago

Hey :) What is the status on this PR?

Not having Linux ARM wheels still blocks usage of FastWARC on ARM-based AWS instances and in native Docker containers on M1 Macs.

Is there anything I can do? Maybe skip testing the ARM wheels and only build them through QEMU?

phoerious commented 5 months ago

Hi, sorry, I've been too busy to look into it further. I would definitely skip the tests and ideally cross-compile natively. But we can also virtualise it for the time being.

jonded94 commented 3 months ago

I ran into a strange issue (https://github.com/pypa/cibuildwheel/issues/1771) where cibuildwheel suddenly refuses to run the new aarch64 manylinux container on the correct platform (it tries to run it as AMD64 and fails to find a matching image), so I had to split this build into a separate job for now. That seems to work, though. Note that I disabled testing for aarch64, as agreed.
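The failure mode described above is a platform mismatch between the image and the container runtime. As a general illustration (not a confirmed fix for cibuildwheel issue #1771), Docker's `--platform` flag pins the container platform explicitly instead of defaulting to the host's. The helper below is hypothetical and only builds the command.

```python
# Map manylinux architecture names to Docker/OCI platform strings.
DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "aarch64": "linux/arm64",
}


def docker_run_command(image: str, arch: str, *args: str) -> list[str]:
    """Build a `docker run` call that pins the container to the platform of `arch`."""
    try:
        oci_platform = DOCKER_PLATFORMS[arch]
    except KeyError:
        raise ValueError(f"unsupported architecture: {arch}") from None
    return ["docker", "run", "--rm", "--platform", oci_platform, image, *args]
```

On an x86_64 host, running a `linux/arm64` container this way still requires QEMU binfmt emulation to be registered.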

Note that build-asan is currently also blocked, because I forgot to upgrade the libasan GCC toolset to version 12. I created a PR for that: https://github.com/chatnoir-eu/chatnoir-resiliparse/pull/38

jonded94 commented 3 months ago

All done now; the PR should be ready to merge :) Sorry it took so long.

phoerious commented 3 months ago

The first aarch64 builds are up. Please test. I'm not happy with the build times (they took more than an hour), so for now I've disabled the builds for Python 3.8 and 3.9 and only trigger the whole aarch64 job on a tags/ ref.
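The two restrictions above can be sketched as a hypothetical helper: run the aarch64 job only when `GITHUB_REF` starts with `refs/tags/`, and skip CPython 3.8 and 3.9 via cibuildwheel's `CIBW_SKIP` selectors. This illustrates the logic only; the actual workflow may express it differently.

```python
from typing import Optional


def aarch64_build_env(github_ref: str) -> Optional[dict[str, str]]:
    """Return cibuildwheel settings for the aarch64 job, or None to skip it.

    The job only runs for tag pushes, whose GITHUB_REF looks like refs/tags/<name>.
    """
    if not github_ref.startswith("refs/tags/"):
        return None
    return {
        "CIBW_ARCHS_LINUX": "aarch64",
        # Skip CPython 3.8 and 3.9 builds to keep emulated build times down.
        "CIBW_SKIP": "cp38-* cp39-*",
    }
```

Called as `aarch64_build_env(os.environ.get("GITHUB_REF", ""))`. In a GitHub Actions workflow, the tag check is usually written as `if: startsWith(github.ref, 'refs/tags/')` on the job.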

jonded94 commented 3 months ago

Great to hear!

One idea I currently use at my company to sidestep the whole cross-compilation situation: use the macOS M-series runners and spawn the manylinux ARM Docker container natively on them. I'm not too familiar with which cibuildwheel settings one would have to override to make that work (forcing Linux builds through OCI containers on macOS); I usually just run the compilation, auditwheel, and test steps manually without cibuildwheel anyway.
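The manual compile, auditwheel, and test steps can be sketched as a single shell script run inside the manylinux container on an arm64 host. Everything here is an assumption for illustration: the helper names, the CPython path (the `/opt/python/...` layout used by manylinux images), and the `tests` directory.

```python
# Standard interpreter location inside manylinux images; pick the CPython you need.
DEFAULT_PYTHON = "/opt/python/cp311-cp311/bin/python"


def container_script(python: str = DEFAULT_PYTHON,
                     plat: str = "manylinux_2_28_aarch64") -> str:
    """Shell script that compiles, repairs and tests one wheel inside the container."""
    steps = [
        # Compile the package into a plain linux_aarch64 wheel.
        f"{python} -m pip wheel . --no-deps -w dist",
        # Bundle external shared libraries and retag the wheel as manylinux.
        f"auditwheel repair --plat {plat} -w wheelhouse dist/*.whl",
        # Install the repaired wheel and run the (hypothetical) test suite.
        f"{python} -m pip install pytest wheelhouse/*.whl",
        f"{python} -m pytest tests",
    ]
    return " && ".join(steps)


def docker_command(image: str, source_dir: str) -> list[str]:
    """`docker run` call that mounts the source tree and runs the script natively."""
    return [
        "docker", "run", "--rm", "--platform", "linux/arm64",
        "-v", f"{source_dir}:/src", "-w", "/src",
        image, "sh", "-c", container_script(),
    ]
```

On an Apple Silicon host, `linux/arm64` matches the native architecture, so no QEMU emulation is involved; the list can be passed to `subprocess.run(..., check=True)`.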

I'd guess that should bring the build times back down to 5 minutes or less.

phoerious commented 3 months ago

That would probably be as easy as just running everything inside the Linux Docker image on the macOS-14 runner. A disadvantage would be that you cannot easily express that as a simple build matrix.

My last experiments with the native M1 runners failed (for other reasons), but it might be worth trying that again.