Open nvisser opened 4 years ago
Are there specific distributions for which you'd like this?
I'd like to see it for Armbian. Armbian seems to be super useful when installing to pine64's. It'd also be handy for building an aarch64 docker container for similar purposes.
I'd use this too; I've switched my Raspberry Pis over to plain Debian and the aarch64 kernel (arm64 Debian package flavor). If there's anything I can do to help with this, let me know.
I understand that https://drone.io/ has arm64 CI runners. Perhaps it makes sense to delegate arm64 builds for a few distributions to them, as they have ARM64 hardware. We do not, which means our arm builds (like the Raspbian builds we do today) run under QEMU which is very slow.
I'd welcome a PR for doing this with Drone as an experiment :)
Travis-CI has aarch64 support available in beta form: https://docs.travis-ci.com/user/multi-cpu-architectures/
Using that will be tremendously easier than trying to use any other CI platform, given how much work has gone into build-travis.sh.
Travis-CI can now build on Graviton2/arm64 instances: https://blog.travis-ci.com/2020-09-11-arm-on-aws There's also a program to get credits for open-source projects on AWS: https://pages.awscloud.com/AWS-Credits-for-Open-Source-Projects . Supporting such a build pipeline would be a perfect use case.
If a PR would help to get that bootstrapped, please let me know.
I took a look at the pipeline for building packages on Travis-CI, and it's definitely non-trivial. Most importantly, it assumes just one architecture, so adding a second one will require teaching it which target distributions are supported on each architecture in the build matrix. At a minimum, CentOS 6 and Ubuntu 16.04 are unlikely to be usable on aarch64 (or even desired).
We are planning to look at GitHub Actions (no promises though); perhaps that will make things easier.
I don't believe GitHub Actions offers native aarch64 builds at this time; it can be done via QEMU, but Travis-CI offers native builds.
I've taken a look at the build process, and it seems relatively understandable (although it's fairly complex!). The one thing I can't seem to find is the script which actually iterates over all the targets in builder-support/Dockerfiles and executes the builds (and then extracts the resulting packages from the images). This script will need to understand the list of targets which are available on each architecture.
If you init & update git submodules, builder/build.sh appears. It is called like builder/build.sh centos-7 or builder/build.sh -m authoritative debian-buster etc.
It probably makes sense to hardcode the list of aarch64 targets (probably way shorter than our full list) instead of trying to iterate over a directory listing.
Yep, I found that part, I was just wondering what actually invokes build.sh for each of the available targets :-) It doesn't seem to be in the .travis.yml or .circleci configurations. The tool that is doing the iterating is the one that will have to be taught which distro/arch combinations are valid.
We don't build packages on Travis or CircleCI currently. The only place -we- call build.sh is in the configs for https://builder.powerdns.com/, so it makes sense that you could not find that :)
In other words, you'll have to write that five line shell script.
Got it... then I can propose a new script which is aware of the host architecture and chooses the targets to build for it, and you can decide how to integrate that into the real build process.
Yes - I see two open questions there: (1) where do we put the packages after Travis has built them, and (2) do we trust Travis enough to make these packages 'official' and sign them with a pdns key?
By the way, we have no need (or use) for a script around build.sh for amd64, because we list those targets in our buildbot config already. So feel free to underengineer the script.
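For illustration, the architecture-aware wrapper discussed above could be sketched roughly as follows. This is only a minimal sketch: the function names (targets_for_arch, build_all), the DRY_RUN switch, and the hardcoded aarch64 target list are hypothetical, and the list only reuses target names that appear in this thread. Only the builder/build.sh -m <module> <target> call shape comes from the discussion.

```shell
#!/bin/sh
# Hypothetical wrapper around builder/build.sh (names below are assumptions).

# Print the space-separated targets to build on a given host architecture.
# The aarch64 list is illustrative, not the project's real list.
targets_for_arch() {
    case "$1" in
        aarch64|arm64) echo "debian-buster raspbian-buster" ;;
        x86_64|amd64)  echo "" ;;  # amd64 targets already live in the buildbot config
        *)             return 1 ;;
    esac
}

# Build one module (e.g. recursor) for every target valid on the architecture.
build_all() {
    arch="$1"
    module="$2"
    targets=$(targets_for_arch "$arch") || {
        echo "unsupported architecture: $arch" >&2
        return 1
    }
    for target in $targets; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print the command that would be run.
            echo "builder/build.sh -m $module $target"
        else
            builder/build.sh -m "$module" "$target" || return 1
        fi
    done
}
```

A real run would look something like DRY_RUN=0 build_all "$(uname -m)" recursor from the top of the checkout. Hardcoding the list keeps the script small, at the cost of having to edit it when targets are added or dropped.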
Another option is to leverage the "AWS for Open Source" link above and get AWS aarch64 compute resources that builder.powerdns.com can use.
In either case I can get to the point where I can prove that the existing build processes run properly on an aarch64 machine and produce usable packages for most of the distros that are in the list today.
Oh! I missed that comment! Indeed that would also make sense, but we wouldn't get to it soon. Getting packages out of Travis would still be a great start.
Well, at least some simple testing produced good results, using the master branch with the submodule initialized/updated and running builder/build.sh -m recursor debian-buster.
The build ran to completion with no errors visible. One small issue: the default build uses only one CPU, which is somewhat annoying when you are running builds manually :-) Adding an appropriate DEB_BUILD_OPTIONS in build-debs.sh solves that problem, although based on the debhelper documentation that should not be necessary... This could be an issue if the jobs are run in Travis-CI because they'll use more wall-clock time than necessary, and the jobs may hit the maximum time limit.
Unsurprisingly, builder/build.sh -m recursor raspbian-buster also works just fine, and of course no QEMU is required.
That is not entirely unsurprising! When we tested on aarch64 last year, we had a box with -no- arm 32 bit support. So this is excellent news!
One small issue: the default build uses only one CPU, which is somewhat annoying when you are running builds manually
I noticed the same last year, but I did not dig in to find the -right- solution.
My hack of a fix was to set DEB_BUILD_OPTIONS='parallel=4' in the script line which calls fakeroot. Clearly this is suboptimal, as it should be configurable (or at least default to the number of CPU cores on the build machine). Based on the debhelper documentation it shouldn't even be necessary: if debian/compat is set to 10 or higher and the version requirement for debhelper is set to 10 or higher in debian/control (both are for recursor and dnsdist, but not yet for authoritative), then parallel building is supposed to be the default.
I've got some debhelper-knowledgeable colleagues at $dayjob so I'll ask them for guidance on that front.
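As a rough sketch of the "configurable, defaulting to the number of CPU cores" idea above, something like the following could replace the hardcoded parallel=4. The function name default_deb_build_options is hypothetical, and where exactly it would be called in build-debs.sh is an assumption; the DEB_BUILD_OPTIONS parallel=N convention itself is standard Debian.

```shell
#!/bin/sh
# Hypothetical helper: respect a caller-provided DEB_BUILD_OPTIONS, otherwise
# default to one build job per online CPU core.
default_deb_build_options() {
    if [ -n "${DEB_BUILD_OPTIONS:-}" ]; then
        # The caller already chose; do not override.
        echo "$DEB_BUILD_OPTIONS"
        return 0
    fi
    # nproc is from coreutils; fall back to getconf, then to a single job.
    ncpu=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "parallel=$ncpu"
}
```

It could be used as export DEB_BUILD_OPTIONS="$(default_deb_build_options)" before the line that calls fakeroot. This remains a workaround; with debhelper compat 10 or higher it should not be needed at all.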
First build failure: building the authoritative packages from the 4.3.0 tag produced some test failures.
testrunner: ../ext/luawrapper/include/LuaContext.hpp:107: LuaContext::LuaContext(bool)::<lambda(lua_State*)>: Assertion `false && "lua_atpanic triggered"' failed.
unknown location(0): fatal error: in "lua_auth4_cc/test_prequery": signal: SIGABRT (application abort requested)
test-lua_auth4_cc.cc(20): last checkpoint: "test_prequery" test entry
testrunner: ../ext/luawrapper/include/LuaContext.hpp:107: LuaContext::LuaContext(bool)::<lambda(lua_State*)>: Assertion `false && "lua_atpanic triggered"' failed.
unknown location(0): fatal error: in "lua_auth4_cc/test_updatePolicy": signal: SIGABRT (application abort requested)
test-lua_auth4_cc.cc(47): last checkpoint: "test_updatePolicy" test entry
Ah yes, luajit is broken on aarch64. We have workarounds in https://github.com/PowerDNS/pdns/pull/6512 but they are not acceptable for general consumption (i.e. they might create slowdowns for other architectures).
Cleanest would probably be to build against lua 5.3 instead.
Confirmed; switching to liblua5.3 allows the build to complete and the tests to pass. This means we'll end up having different Debian configuration files (at least) for amd64 and aarch64 I suppose.
I've also apparently succeeded in getting parallel builds to work using the documented mechanism (at least for versions of Debian which support debhelper 10.x and higher), but I'll not yet claim success there until I've tested it with dnsdist and recursor too :-)
This means we'll end up having different Debian configuration files (at least) for amd64 and aarch64 I suppose.
I'm sure we can do something more clever than that :)
Indeed, I've got this working now, where luajit is used for amd64, and lua5.3 is used for non-amd64. This could be changed to use luajit on non-arm64, and lua5.3 on arm64, quite easily.
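The selection rule described here (luajit on amd64, lua5.3 everywhere else) can be written down as a tiny sketch. The function name lua_impl_for_arch is hypothetical, and this is not how the actual packaging change is implemented; it only illustrates the mapping.

```shell
#!/bin/sh
# Hypothetical helper: map a Debian architecture name to the Lua
# implementation to build against (policy as described in the thread).
lua_impl_for_arch() {
    case "$1" in
        amd64) echo "luajit" ;;   # luajit is kept where it is known to work
        *)     echo "lua5.3" ;;   # everything else gets plain Lua 5.3
    esac
}
```

On a Debian system it could be fed with lua_impl_for_arch "$(dpkg --print-architecture)". Flipping the policy to "lua5.3 on arm64, luajit elsewhere" would just mean matching arm64 in the first branch and swapping the two results.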
Current status:
Two test machines -
With a small set of changes in the builder-support tree, these are the results of builds for various distributions.
I did not test any older distros because they are either past their EoL or do not have arm64 support.
At this point the only distro where arm64 fails but amd64 succeeds is CentOS 7, so I'll try to figure out the cause of that. After that I'll send a PR with the various changes to the builder-support tree.
fails on arm64 with an error about finding Boost context library
When boost::context is not available or usable we are supposed to fall back to ucontext, so we likely have a detection issue here.
Amazon Linux 2 - fails on both
You can ignore this one.
Debian + Raspbian stretch can probably also go away from the list, as they are more or less EoL.
As @Habbie pointed out to me, dnsdist doesn't actually use context. I don't see where its configure would check for boost::context either.
This build failure is for recursor, not dnsdist. I'm building all three in these tests, not just dnsdist.
OK... here's the issue. With the version of Boost in CentOS 7, boost::context is installed and is a version sufficiently high for the configure.ac script to want to use it, but building a program on aarch64 results in an error that the 'platform is not supported'. This happens too late for the configure script to fall back to ucontexts.
We could identify the version of Boost where aarch64 is supported in boost::context and increase the minimum version required in the configure script.
It looks like aarch64 support in boost::context was added in Boost 1.61, which was released more than four years ago. I'll test with the requirement changed to 1.61 to force a fallback to ucontext unless Boost is at least that version.
1.61 appears right; Debian carried a patch for it in 1.58 to 1.60.
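The proposed rule (use boost::context only when Boost is at least 1.61, otherwise fall back to ucontext) can be sketched outside of autoconf like this. The real check would live in the configure script; the function names here are hypothetical, and only the 1.61 threshold comes from the thread. BOOST_VERSION is the numeric macro from boost/version.hpp, where 1.61.0 is encoded as 106100.

```shell
#!/bin/sh
# Hypothetical helpers; the actual decision is made by configure.

# Read the numeric BOOST_VERSION (e.g. 106100 for 1.61.0) from a version.hpp.
boost_version_from_header() {
    sed -n 's/^#define[[:space:]]\{1,\}BOOST_VERSION[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$1"
}

# Pick the context-switching backend for a given BOOST_VERSION number.
context_backend() {
    if [ "${1:-0}" -ge 106100 ]; then
        echo "boost_context"
    else
        echo "ucontext"   # too old for the raised minimum: fall back
    fi
}
```

For example, context_backend "$(boost_version_from_header /usr/include/boost/version.hpp)" would report which backend the raised minimum selects on a given build host, assuming the header is installed at that path.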
EPEL7 ships Boost 1.69; we could do the same build trick we did for EL6 (using EPEL Boost) for aarch64 on EL7 as well.
Thankfully the RPM build process doesn't need 'convincing' to use all four cores like the DPKG build process did :-)
It appears that only recursordist suffers from this problem; pdns also uses boost::context, but uses different logic for selecting when to use it or not, and apparently chooses not to use it on CentOS 7 aarch64.
With the minimum version set to 1.61, I now have a successful build of all three on CentOS 7 aarch64; if someone wants to point me to the 'EPEL trick' I can try to apply it here so that all CentOS 7 packages use boost::context. It would be weird if the amd64 builds used boost::context and the aarch64 builds used ucontexts...
pdns uses boost, but not boost::context. The files are a bit mixed in the repo (hence the symlink party in recursordist and dnsdistdist).
Amazing job kpfleming. For information CentOS 7 / aarch64 isn't supported on Neoverse N1 based aarch64 processors (including Graviton2), and there's no plan to backport support for it. It runs, but there are several issues that aren't fixed and won't be. Starting with RHEL8.2 / CentOS8.2, everything is fine.
I'm very much okay with opening pdns up to aarch64 only on the most current distributions.
If I can do this without too much effort I'll continue with the CentOS 7 support, as there are weird people running such systems on non-Neoverse processors :)
Yes, of course there are! And it is a good thing. I just want to be sure that no one reads that, decides to run pdns on CentOS 7 / Neoverse N1 as a consequence, and hits some issues.
Ahh, well... hmm. The EPEL page says that EPEL-7 is no longer available for aarch64. That means we either drop CentOS 7 from the aarch64 package list, or we allow the fallback to ucontexts.
At this point, I think we could just go with this distro list for aarch64:
If the AWS crew want to work on support for Amazon Linux 2 they are certainly welcome to do so!
The aarch64 builders can also be used for the Raspbian Stretch and Buster packages, but those produce armhf packages, so aarch64 is not a concern there.
If this plan works for the maintainers, I'll start cleaning up my branch of changes for the builder-support tree and get a PR opened.
Yes please!
Alpinelinux is building dnsdist for 7 arches https://pkgs.alpinelinux.org/packages?name=dnsdist&branch=edge
test-suite.log
{{{
=====================================
dnsdist 1.5.1: ./test-suite.log
=====================================

# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: testrunner
================

Running 93 test cases...
unknown location(0): fatal error: in "dnsdistlbpolicies/test_lua": LuaContext::ExecutionErrorException: bad light userdata pointer
test-dnsdistlbpolicies_cc.cc(455): last checkpoint: "test_lua" test entry
unknown location(0): fatal error: in "dnsdistlbpolicies/test_lua_ffi_rr": LuaContext::ExecutionErrorException: bad light userdata pointer
test-dnsdistlbpolicies_cc.cc(511): last checkpoint: "test_lua_ffi_rr" test entry
unknown location(0): fatal error: in "dnsdistlbpolicies/test_lua_ffi_hashed": LuaContext::ExecutionErrorException: bad light userdata pointer
test-dnsdistlbpolicies_cc.cc(569): last checkpoint: "test_lua_ffi_hashed" test entry
unknown location(0): fatal error: in "dnsdistlbpolicies/test_lua_ffi_whashed": LuaContext::ExecutionErrorException: bad light userdata pointer
test-dnsdistlbpolicies_cc.cc(626): last checkpoint: "test_lua_ffi_whashed" test entry
unknown location(0): fatal error: in "dnsdistlbpolicies/test_lua_ffi_chashed": LuaContext::ExecutionErrorException: bad light userdata pointer
test-dnsdistlbpolicies_cc.cc(681): last checkpoint: "test_lua_ffi_chashed" test entry

*** 5 failures are detected in the test module "unit"
FAIL testrunner (exit status: 201)
}}}
That looks like the aarch64/luajit problem.
Short description
Currently there seems to be no distribution of dnsdist for the aarch64 platform.
Usecase
Running dnsdist on modern ARM aarch64 based hardware.
Description
Modern ARM based platforms are based on aarch64 so having the ability to use dnsdist (or any other powerdns program, really) on these platforms without having to spend a long time compiling or fiddling with cross-compiling would be ideal.