Closed kring closed 3 months ago
Welp.... the performance on Linux is excellent. The complete plugin build on Linux went from 28 minutes 19 seconds down to 20 minutes 8 seconds. So it cut off a third of the compile time, and I'm pretty confident that would improve further if the large variants in cesium-native moved to swl, too.
Unfortunately, it doesn't really work on any of our other platforms. 😆 I'm not sure how fixable that is. It's a bit strange because every platform except Windows uses Clang. Apple and Android use v14 while Linux uses v15, so maybe that's the problem.
The size of the Linux binaries in this branch are also significantly smaller. Total ZIP size is down from 188 MB to 167 MB. Uncompressed size goes down from 903 MB to 633 MB.
The complete plugin build on Linux went from 28 minutes 19 seconds down to 20 minutes 8 seconds. The size of the Linux binaries in this branch are also significantly smaller. Total ZIP size is down from 188 MB to 167 MB. Uncompressed size goes down from 903 MB to 633 MB.
Nice. Worth defining variant
differently per platform to work around the problems?
Worth defining variant differently per platform to work around the problems?
If we can get it working on macOS, then yes, I think it's worth it. MSVC's std::variant implementation seems to be less terrible (or perhaps the compiler is better at dealing with the terrible), so the compile times and binary bloat aren't as bad there. But if we can get gains on macOS like we did on Linux, it'll shave close to 40 minutes off the CI time for that platform.
I looked into it a little more, and the immediate problem seems to be bugs the Clang 14's C++20 implementation such that the swl::variant can't hold instances with non-trivial destructors. The only thing in a variant with a non-trivial destructor currently (AFAICT) is PropertyArrayView
, because it has the dual owning (std::vector) / non-owning (gsl::span) representation. std::vector of course has a non-trivial destructor. It might be possible to make the owning variant of PropertyArrayView
a separate type that is not stored in a swl::variant. Maybe. That's what I'm trying, anyway.
Possibly caused by this: https://stackoverflow.com/a/70997041/85306 and also you might be able to work around it with a define of SWL_VARIANT_NO_CONSTEXPR_EMPLACE
With some hackery I got it building on Android and macOS.
The plugin build for Android / UE 5.2 went from 24 min 58 s down to 11 min 3 s. And the zipped package went from 168 MB down to 108 MB.
The plugin build for macOS / UE 5.2 went from 1 hour 6 min 57 s down to 17 min 22s! I think on macOS we were previously hitting swap pretty hard, that's why the improvement is so much more drastic. Also the zipped macOS package went from 213 MB down to 148 MB.
My hacks are too dodgy to ship, but I think this proves that spending some more time on this will pay off big time in terms of reducing CI time and therefore improving our productivity.
Looks like it makes a huge difference on Windows, too. It goes from 39 min 17 s down to 12 min 23 s.
I'm really happy with this and am going to merge this into the vcpkg. Here are the overall numbers:
Build | Main Time | This Branch Time | Speedup | Main Size (MB) | This Branch Size | Size Change |
---|---|---|---|---|---|---|
Windows 5.2 | 39m 34s | 13m 6s | 202% | 194 | 284 | +46% |
Windows 5.3 | 44m 14s | 14m 33s | 204% | 191 | 291 | +52% |
Windows 5.4 | 50m 11s | 18m 36s | 170% | 205 | 304 | +48% |
Linux 5.2 | 26m 10s | 17m 39s | 48% | 182 | 120 | -34% |
Linux 5.3 | 27m 24s | 18m 15s | 50% | 181 | 120 | -34% |
Linux 5.4 | 33m 9s | 20m 53s | 59% | 181 | 122 | -34% |
Android 5.2 | 23m 28s | 12m 33s | 87% | 127 | 111 | -13% |
Android 5.3 | 22m 31s | 11m 38s | 94% | 127 | 112 | -12% |
Android 5.4 | 44m 11s | 23m 33s | 88% | 235 | 170 | -28% |
macOS 5.2 | 56m 41s | N/A | N/A | 204 | N/A | N /A |
macOS 5.3 | 1h 32m 42s | N/A | N/A | 397 | N/A | N /A |
macOS 5.4 | 1h 43m 55s | N/A | N/A | 413 | N/A | N /A |
iOS 5.2 | 37m 55s | N/A | N/A | 185 | N/A | N /A |
iOS 5.3 | 38m 6s | N/A | N/A | 191 | N/A | N /A |
iOS 5.4 | 48m 9s | N/A | N/A | 199 | N/A | N /A |
macOS+iOS 5.2 | N/A | 43m 33s | 117% | N/A | 383 | -2% |
macOS+iOS 5.3 | N/A | 51m 6s | 156% | N/A | 387 | -34% |
macOS+iOS 5.4 | N/A | 58m 36s | 159% | N/A | 430 | -30% |
The numbers for this branch are from this build: https://github.com/CesiumGS/cesium-unreal/actions/runs/9637521085
The numbers for main are from this build: https://github.com/CesiumGS/cesium-unreal/actions/runs/9599979122
In all cases, the times are to build the plugin only. Sizes are the size of the distribution ZIP produced by CI for that platform.
Two reasons:
So while this branch is an improvement even for Windows when compared against vcpkg without the improvements in this branch, the binaries are still larger than they used to be overall.
In main, we have separate CI jobs for macOS and iOS. In this branch, they've been combined together. Combining them is a pretty big win for us because GitHub limits us to 5 simulteneous Apple builds. If you use 6 (three UE versions times macOS + iOS), then they are necessarily serialized. And when they're serialized, we end up paying the substantial "install UE" cost twice for zero benefit.
The percentage improvement columns for the macOS+iOS rows have been computed based on the sum of the separate plugin builds in main. These improvements do not count the speedup we get from eliminating the extra UE install (so the actually improvement is higher).
It includes both ARM64 and x86-64 binaries for the plugin, because Unreal changed a default. This is true both in main and in this branch. However, it currently does not include cesium-native binaries, so attempting to package for Android/x86-64 will fail.
By default, UE 5.2 builds only for the host's processor on macOS, which in our case was x86-64. In 5.3 and 5.4, Epic changed the default to build for both platforms regardless of host architecture. I've changed this branch to explicitly build for both x86-64 and ARM64 even in UE 5.2, so it's actually running faster and producing smaller binaries than main while also supporting an additional processor architecture that it wasn't before.
In my limited tests this drastically improves compile times.
So far cesium-native is still using C++17 and std::variant. I've only moved cesium-unreal to C++20 (it was already there for UE5.4, but now 5.2 and 5.3 build with C++20 too). And because swl::variant requires C++20, I've only moved the variants in the Unreal-specific code to swl::variant. But even with that limited change, Linux compile times on my system were cut by 35%.
This is a PR into #1447 which I'll probably merge myself if CI is happy on all platforms.