digego / extempore

A cyber-physical programming environment
1.41k stars 127 forks source link

Build and cache LLVM 3.8 separately for faster feedback. #400

Closed nic-donaldson closed 3 years ago

nic-donaldson commented 3 years ago

It takes a long time to build LLVM. I've been building LLVM 11.0 as a separate step in my llvm-refactor branch to get faster feedback from CI. I've mostly copied that process over for LLVM 3.8.

It makes the workflow file longer and harder to read, but I think the time save is worth it.

Let me know what you think! I've left the commit history untouched, but I'm happy to rebase/squash it to something more reasonable if you like.

benswift commented 3 years ago

Hey @nic-donaldson thanks, that's really helpful and I'm keen to see this land soon.

A few questions/comments:

The yml file is now longer, but I think that you've broken it down into steps pretty well, so the length isn't a problem. One small request: can we make the naming of the platform-specific steps consistent, e.g.:

name: download and unpack llvm (macos/linux)
# steps go here...
name: download and unpack llvm (windows)
# steps go here...

in addition to just being more consistent, it's probably best not to mention nix (since some folks do build Extempore on nix and we don't want to confuse them; better to be clearer that it's macos/linux)

In specifying the CMake build options on line 64, some (all?) of those values you have on that line should be defauts... is that not the case? It's fine if you do need to specify them here (even if it's just to be explicit) but I just wanted to check.

Is the -j2 stuff specific to the GitHub runners (since they're 2vCPU atm)? Again, fine if it is---and we should optimise for the platform these scripts will run on. I've actually tried to use CMake's --parallel flag in GH actions in the past and have had it give me weird segfaults/errors. But if -j2 works reliably (and gives a ~2x speedup!) then I'm super keen :)

Finally, thanks for rebasing - and I reckon if you could squash them all into one big commit before we push it to master that'd be good. Our "don't break master" hygiene is pretty bad in general, but if we can avoid pushing commits which are known to fail the tests then that's still a small win. If you'd like me to do the squash that's fine as well.

benswift commented 3 years ago

btw, congrats on issue #400 :tada: :tada: :tada:

nic-donaldson commented 3 years ago

can we make the naming of the platform-specific steps consistent

yep, done

since some folks do build Extempore on nix

I'm also using NixOS! But I've only figured out how to build extempore on NixOS, not package it for NixOS.

some (all?) of those values you have on that line should be defauts

Honestly I don't know. I think I have a mix of settings from my branch and from CMakeLists.txt for 3.8 here. On my branch my goal was to minimize whatever needed to be built, which is why llvm-as is built explicitly.

Anyway I went through them all and I don't think any were defaults. Although -DLLVM_INCLUDE_TESTS=OFF -DLLVM_INCLUDE_GO_TESTS=OFF are probably unnecessary, because if we look at LLVM 3.8's CMakeLists.txt, we see they disable targets that aren't being built anyway:

option(LLVM_BUILD_TESTS
  "Build LLVM unit tests. If OFF, just generate build targets." OFF)
option(LLVM_INCLUDE_TESTS "Generate build targets for the LLVM unit tests." ON)
option(LLVM_INCLUDE_GO_TESTS "Include the Go bindings tests in test build targets." ON)

Re -j2, I think I added it because -j1 was timing out when building LLVM11. I tried -j$(nproc), but nproc wasn't installed on the macos runner. Common wisdom is use nproc+1, but LLVM builds tend to eat a lot of RAM and start thrashing or get OOM killed when you increase the job parallelism, so 2 seems reasonable, and has been reliable as far as I can tell.

If you'd like me to do the squash that's fine as well.

Sure! If it all looks good I'll leave it with you.

benswift commented 3 years ago

I'm also using NixOS! But I've only figured out how to build extempore on NixOS, not package it for NixOS.

Yeah. I don't use it myself (although it does look cool, and I'm open to trying it out in the future).

some (all?) of those values you have on that line should be defauts

Honestly I don't know. I think I have a mix of settings from my branch and from CMakeLists.txt for 3.8 here. On my branch my goal was to minimize whatever needed to be built, which is why llvm-as is built explicitly.

Yep, good plan.

Anyway I went through them all and I don't think any were defaults. Although -DLLVM_INCLUDE_TESTS=OFF -DLLVM_INCLUDE_GO_TESTS=OFF are probably unnecessary, because if we look at LLVM 3.8's CMakeLists.txt, we see they disable targets that aren't being built anyway:

👍🏻

Re -j2, I think I added it because -j1 was timing out when building LLVM11. I tried -j$(nproc), but nproc wasn't installed on the macos runner. Common wisdom is use nproc+1, but LLVM builds tend to eat a lot of RAM and start thrashing or get OOM killed when you increase the job parallelism, so 2 seems reasonable, and has been reliable as far as I can tell.

Yeah, it's a memory hog for sure - and the GA runners aren't super beefy. Again, I'm happy with these scripts not being necessarily the "general purpose" case; they can be tailored to get the CI jobs to finish as fast as possible.

If you'd like me to do the squash that's fine as well.

Sure! If it all looks good I'll leave it with you.

Ok, I'll squash it and push it up asap.

benswift commented 3 years ago

hey @nic-donaldson just an update - I (squash) merged it this morning, and it failed on windows-2016 only - there seemed to be some problem with doing the parallelised AOT build steps (although I didn't look super closely). Then, I re-ran the job (i.e. with the exact same commit) and it worked.

Some observations:

Anyway, as I said I don't think this is caused by your patch. But I just wanted to let you know in case you'd seen the "red cross" when it first landed and worried what was going on :)

nic-donaldson commented 3 years ago

there seemed to be some problem with doing the parallelised AOT build steps

oh no :( nothing worse than flaky builds. Might be best to drop the Windows aot-compile-stdlib step to -j1 then.

benswift commented 3 years ago

yeah, good call - I've just pushed a change and we'll see if it helps (I mean, we'll only know if it hasn't fixed it - not if it has. because indeterminism.)