Use C++20 and swl::variant instead of std::variant.

kring commented 3 months ago

In my limited tests this drastically improves compile times.

So far cesium-native is still using C++17 and std::variant. I've only moved cesium-unreal to C++20 (it was already there for UE5.4, but now 5.2 and 5.3 build with C++20 too). And because swl::variant requires C++20, I've only moved the variants in the Unreal-specific code to swl::variant. But even with that limited change, Linux compile times on my system were cut by 35%.

This is a PR into #1447 which I'll probably merge myself if CI is happy on all platforms.

kring commented 3 months ago

Welp.... the performance on Linux is excellent. The complete plugin build on Linux went from 28 minutes 19 seconds down to 20 minutes 8 seconds. So it cut off a third of the compile time, and I'm pretty confident that would improve further if the large variants in cesium-native moved to swl, too.

Unfortunately, it doesn't really work on any of our other platforms. 😆 I'm not sure how fixable that is. It's a bit strange because every platform except Windows uses Clang. Apple and Android use v14 while Linux uses v15, so maybe that's the problem.

kring commented 3 months ago

The size of the Linux binaries in this branch are also significantly smaller. Total ZIP size is down from 188 MB to 167 MB. Uncompressed size goes down from 903 MB to 633 MB.

csciguy8 commented 3 months ago

The complete plugin build on Linux went from 28 minutes 19 seconds down to 20 minutes 8 seconds. The size of the Linux binaries in this branch are also significantly smaller. Total ZIP size is down from 188 MB to 167 MB. Uncompressed size goes down from 903 MB to 633 MB.

Nice. Worth defining variant differently per platform to work around the problems?

kring commented 3 months ago

Worth defining variant differently per platform to work around the problems?

If we can get it working on macOS, then yes, I think it's worth it. MSVC's std::variant implementation seems to be less terrible (or perhaps the compiler is better at dealing with the terrible), so the compile times and binary bloat aren't as bad there. But if we can get gains on macOS like we did on Linux, it'll shave close to 40 minutes off the CI time for that platform.

I looked into it a little more, and the immediate problem seems to be bugs the Clang 14's C++20 implementation such that the swl::variant can't hold instances with non-trivial destructors. The only thing in a variant with a non-trivial destructor currently (AFAICT) is PropertyArrayView, because it has the dual owning (std::vector) / non-owning (gsl::span) representation. std::vector of course has a non-trivial destructor. It might be possible to make the owning variant of PropertyArrayView a separate type that is not stored in a swl::variant. Maybe. That's what I'm trying, anyway.

jherico commented 3 months ago

Possibly caused by this: https://stackoverflow.com/a/70997041/85306 and also you might be able to work around it with a define of SWL_VARIANT_NO_CONSTEXPR_EMPLACE

kring commented 3 months ago

With some hackery I got it building on Android and macOS.

The plugin build for Android / UE 5.2 went from 24 min 58 s down to 11 min 3 s. And the zipped package went from 168 MB down to 108 MB.

The plugin build for macOS / UE 5.2 went from 1 hour 6 min 57 s down to 17 min 22s! I think on macOS we were previously hitting swap pretty hard, that's why the improvement is so much more drastic. Also the zipped macOS package went from 213 MB down to 148 MB.

My hacks are too dodgy to ship, but I think this proves that spending some more time on this will pay off big time in terms of reducing CI time and therefore improving our productivity.

kring commented 3 months ago

Looks like it makes a huge difference on Windows, too. It goes from 39 min 17 s down to 12 min 23 s.

kring commented 3 months ago

I'm really happy with this and am going to merge this into the vcpkg. Here are the overall numbers:

Build	Main Time	This Branch Time	Speedup	Main Size (MB)	This Branch Size	Size Change
Windows 5.2	39m 34s	13m 6s	202%	194	284	+46%
Windows 5.3	44m 14s	14m 33s	204%	191	291	+52%
Windows 5.4	50m 11s	18m 36s	170%	205	304	+48%
Linux 5.2	26m 10s	17m 39s	48%	182	120	-34%
Linux 5.3	27m 24s	18m 15s	50%	181	120	-34%
Linux 5.4	33m 9s	20m 53s	59%	181	122	-34%
Android 5.2	23m 28s	12m 33s	87%	127	111	-13%
Android 5.3	22m 31s	11m 38s	94%	127	112	-12%
Android 5.4	44m 11s	23m 33s	88%	235	170	-28%
macOS 5.2	56m 41s	N/A	N/A	204	N/A	N /A
macOS 5.3	1h 32m 42s	N/A	N/A	397	N/A	N /A
macOS 5.4	1h 43m 55s	N/A	N/A	413	N/A	N /A
iOS 5.2	37m 55s	N/A	N/A	185	N/A	N /A
iOS 5.3	38m 6s	N/A	N/A	191	N/A	N /A
iOS 5.4	48m 9s	N/A	N/A	199	N/A	N /A
macOS+iOS 5.2	N/A	43m 33s	117%	N/A	383	-2%
macOS+iOS 5.3	N/A	51m 6s	156%	N/A	387	-34%
macOS+iOS 5.4	N/A	58m 36s	159%	N/A	430	-30%

What exactly is this table comparing?

The numbers for this branch are from this build: https://github.com/CesiumGS/cesium-unreal/actions/runs/9637521085

The numbers for main are from this build: https://github.com/CesiumGS/cesium-unreal/actions/runs/9599979122

In all cases, the times are to build the plugin only. Sizes are the size of the distribution ZIP produced by CI for that platform.

Why did the Windows distributions get bigger in this branch?

Two reasons:

This branch is using vcpkg, which means we're including more libraries for cesium-native and its dependencies than we were in main. For example, we include all of the Abseil libraires instead of just the tiny bit of it that s2geometry actually needs.
Binary size doesn't seem to explode with our previous variant implementation, the way it does with Clang.

So while this branch is an improvement even for Windows when compared against vcpkg without the improvements in this branch, the binaries are still larger than they used to be overall.

What's going on with the macOS and iOS builds?

In main, we have separate CI jobs for macOS and iOS. In this branch, they've been combined together. Combining them is a pretty big win for us because GitHub limits us to 5 simulteneous Apple builds. If you use 6 (three UE versions times macOS + iOS), then they are necessarily serialized. And when they're serialized, we end up paying the substantial "install UE" cost twice for zero benefit.

The percentage improvement columns for the macOS+iOS rows have been computed based on the sum of the separate plugin builds in main. These improvements do not count the speedup we get from eliminating the extra UE install (so the actually improvement is higher).

Why is the Android 5.4 build bigger than the other two?

It includes both ARM64 and x86-64 binaries for the plugin, because Unreal changed a default. This is true both in main and in this branch. However, it currently does not include cesium-native binaries, so attempting to package for Android/x86-64 will fail.

Why are the improvements for UE 5.2 on macOS smaller than for the other two UE versions?

By default, UE 5.2 builds only for the host's processor on macOS, which in our case was x86-64. In 5.3 and 5.4, Epic changed the default to build for both platforms regardless of host architecture. I've changed this branch to explicitly build for both x86-64 and ARM64 even in UE 5.2, so it's actually running faster and producing smaller binaries than main while also supporting an additional processor architecture that it wasn't before.

CesiumGS / cesium-unreal