JuliaCI / julia-buildbot

Buildbot configuration for build.julialang.org
MIT License

ARM builders #58

Closed yuyichao closed 5 years ago

yuyichao commented 7 years ago

As @tkelman requested, here's a list of implemented/planned/proposed changes to how the ARM builders work.

  1. I've finally got one of my AArch64 boxes stable enough for remote access and hosting automated jobs. I've created 2 LXC containers on the machine with CentOS 7 (aarch64) and Debian 7 (armhf). Unless someone else (Julia Computing) acquires some other AArch64 servers, my plan is to use these as slaves for the buildbot, since the server utilization is otherwise pretty low.
  2. The two distros mentioned above are the ones with the oldest glibc I could find. CentOS 7 has glibc 2.17, which I think is the first or the second version that supports aarch64. Debian 7 has glibc 2.13, which isn't the first one that supports eabi (and possibly not the first one supporting eabihf) but is pretty close, and I can't find anything older. (The oldest versions of CentOS, Fedora, Ubuntu and openSUSE that support armhf all come with a newer glibc.)
  3. The major advantage of the new server is performance. It should be 5-10x faster than the old one, with more cores, memory and disk space.
  4. IIUC the original plan was to integrate them in the newer version of buildbot (and I believe this is still the plan for the aarch64 one). However, the old arm buildbot somehow broke recently and gcc 6.2 doesn't compile there, so the arm builder has already been switched to the new server, which is clean and doesn't have the issue.
  5. Before we can have a working nightly out from the new builder using gcc 6.2, the main blocking issue is https://github.com/JuliaLang/julia/pull/18996, which I'll merge soon unless someone has a better solution. (I don't like it myself but can't find a better way...)
  6. Since the new builder is much faster and the unwinding issue is fixed, we can start running tests on arm. (We should also run aarch64 tests once the builder is online, and all tests should pass on llvm 3.9.) The known segfaulting tests there are disabled by https://github.com/JuliaLang/julia/pull/19003, which I'll also merge soon.
  7. The only disadvantage of the new builder that I can think of is that uname -m reports armv8l instead of armv6l or armv7l. This is only an issue when compiling dependencies (notably clang and gcc) that pick up the target arch automatically from uname -m. Both of them should be overridable. We can possibly figure this out from ARCH, but I'm not sure what the most robust way to do it is. To make sure the binary is compiled for the right arch, I think we can keep the old builder and run a few simple tests on it using the binary we compiled. From my brief testing just now, it seems that LLVM was probably not compiled with LTO and it is somehow using neon instructions.
  8. We need to build for armv6 if we want to support rpi0 and rpi1. Do we want to add a builder for this? (Also, in any case the download link for the current binaries should say armv7 instead.)
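
For item 7, one way to avoid depending on what uname -m reports is to let an explicit ARCH setting win and only fall back to the reported machine name. The following is a minimal sketch under that assumption; `resolve_target_arch`, `host_triple` and the `MACHINE_FALLBACK` table are hypothetical names for illustration and are not part of the buildbot configuration or the Julia build system.

```python
import platform

# Hypothetical fallback table (an assumption for this sketch): a 32-bit ARM
# userland running on an AArch64 kernel reports armv8l, while the binaries
# discussed in this thread target armv7l.
MACHINE_FALLBACK = {"armv8l": "armv7l"}


def resolve_target_arch(arch=None, machine=None):
    """Return the architecture to build for.

    An explicit `arch` (for example the value of an ARCH variable) always
    wins. Otherwise the machine name is used (what `uname -m` would report),
    remapped through MACHINE_FALLBACK when an entry exists.
    """
    if arch:
        return arch
    if machine is None:
        machine = platform.machine()
    return MACHINE_FALLBACK.get(machine, machine)


def host_triple(arch):
    """Build a GNU-style triple for a 32-bit hard-float ARM Linux target."""
    return f"{arch}-unknown-linux-gnueabihf"
```

With this, `host_triple(resolve_target_arch(machine="armv8l"))` gives the armv7l triple, and passing `arch="armv6l"` would select an armv6 target regardless of the host. How the resulting value is handed to each dependency's configure step is left open here.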

@staticfloat, @ViralBShah

simonbyrne commented 7 years ago

+1 to armv6, if possible

yuyichao commented 7 years ago

Current blocking issues for nightly update.

The slave is currently offline because I was debugging the issue last night. It already has a correctly compiled LLVM cached (the one I compiled manually), so as long as the command above is implemented in the buildbot setup script we can merge https://github.com/JuliaLang/julia/pull/18996 and be ready to update the nightly.

I just briefly tried setting MARCH=armv6 locally and it's complaining because it picks up the -march flag from my llvm compilation. I'll try to compile an LLVM and test it later. It would also be nice if we could have an armv6 box to test whether the compiled binary works.

yuyichao commented 7 years ago

Update:

Remaining TODO's (Assuming the new nightly is fine.....):

ARMv6 support should probably move to its own issue.

tkelman commented 7 years ago

regarding adding tests, in the current arrangement if tests fail on any of the test builders then no binaries get built at all. we can't use that setup with arm, we're not blocking x86 binaries on arm bugs. so running tests should also wait until the buildbot setup is refactored.
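
The constraint described here can be sketched as a small gating rule: only some builders are allowed to block publishing, and failures on the others are reported without stopping the binaries. This is an illustration only, not the actual buildbot setup; the function names and builder names below are hypothetical.

```python
def binaries_allowed(test_results, blocking):
    """Decide whether binaries may be built.

    test_results maps a builder name to True (tests passed) or False.
    blocking is the set of builder names whose failure blocks binaries.
    A blocking builder with no recorded result is treated as a failure.
    """
    return all(test_results.get(name, False) for name in blocking)


def advisory_failures(test_results, blocking):
    """Return the failing builders that do not block binaries, sorted."""
    return sorted(
        name
        for name, passed in test_results.items()
        if not passed and name not in blocking
    )


# Hypothetical builder names, used only for this example.
results = {"test_x86_64": True, "test_x86": True, "test_armv7l": False}
blocking = {"test_x86_64", "test_x86"}
publish = binaries_allowed(results, blocking)
warnings = advisory_failures(results, blocking)
```

In this example the ARM failure ends up in `warnings` while `publish` stays true, which is the behaviour the current arrangement cannot express.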

yuyichao commented 7 years ago

The need for the gcc wrappers of ar, ranlib and nm is fixed. I did recompile binutils and gcc, but I think what actually fixed it was symlinking the lto plugin into the binutils plugin directory (Ref https://github.com/archlinuxarm/PKGBUILDs/blob/527636ea075312c64aff6dad8ddc2b33933c8d62/core/gcc/PKGBUILD#L180-L183). The test build passed on the buildbot, so nuking the buildbot should be fine now.

It would still be nice to set the correct host triple (the wrong one might actually be a regression in LLVM's cmake system).

tkelman commented 7 years ago

Cool, good catch. Is the ARM builder being managed by the same set of ansible scripts? Should we record the need to do that somewhere so it's easy to re-provision if we ever need to?

yuyichao commented 7 years ago

@staticfloat for that. I think there's an automatic script to do this and maybe he sent me a link to the repo (maybe https://github.com/staticfloat/julia-ansible-scripts ?).

For the record, the command lines I used are

I also need to clear my CFLAGS since they trigger some Werror in gcc's dependencies that can't be turned off easily.

yuyichao commented 7 years ago

And with the above flags gcc installs the lto plugin to libexec by default, and ln -s ~/local/libexec/gcc/armv7l-unknown-linux-gnueabihf/6.2.0/liblto_plugin.so ~/local/lib/bfd-plugins puts it where binutils expects it.
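
If this step needs to be recorded for re-provisioning, the symlink could be scripted roughly as follows. This is a sketch only: `link_lto_plugin` is a hypothetical helper, and the directory layout is assumed to match the paths in the ln -s command above.

```python
from pathlib import Path


def link_lto_plugin(prefix, triple, gcc_version):
    """Symlink gcc's LTO plugin into the directory binutils searches.

    Assumes the layout from the command above:
      <prefix>/libexec/gcc/<triple>/<gcc_version>/liblto_plugin.so
      <prefix>/lib/bfd-plugins/
    Returns the path of the created symlink.
    """
    prefix = Path(prefix)
    plugin = prefix / "libexec" / "gcc" / triple / gcc_version / "liblto_plugin.so"
    if not plugin.exists():
        raise FileNotFoundError(plugin)

    plugin_dir = prefix / "lib" / "bfd-plugins"
    plugin_dir.mkdir(parents=True, exist_ok=True)

    link = plugin_dir / plugin.name
    # Replace a stale link or file so the function can be re-run safely.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(plugin)
    return link
```

For the setup described here the call would be `link_lto_plugin(Path.home() / "local", "armv7l-unknown-linux-gnueabihf", "6.2.0")`.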

staticfloat commented 7 years ago

Confirmed that GCC 7 eliminates the need for us to do any LTO or AR/RANLIB overriding.

staticfloat commented 5 years ago

Closed by #111