fbridault commented 7 years ago

At last, reduced drastically (from 40% to 55%) the compilation time.

the big gain comes by precompiled headers usage. Most of them have been customized but we can probably do even better for some of them.
a second gain comes from a trick to compile camp macros only when needed.
a minor gain comes from some cleaning in headers includes.

There is a CMake option to switch off the pch support, which is on by default. Everything is reported to work well on Linux (gcc/clang) and Windows. MacOsX support might work since clang is handled. If necessary I can disable it by default on MacOSX if it does not work. It would be sad to not benefit from this huge gain on the other platforms.

fbridault commented 7 years ago

Commits have been rebased and reordered/mergered. Please discard your local checkout if you tried it before.

fbridault commented 7 years ago

Thanks for your tests and remarks @emiliehar I integrated your changes. :+1:

fbridault commented 7 years ago

I made several improvements in the last commits.

I fixed a bug with pch dependencies with Clang and Gcc. If a file included in a pch was modified, then the pch was not automatically recompiled by Ninja or Makefile. It could have been fixed easily with the IMPLICIT_DEPENDS option of add_custom_command but this is not implemented for the Ninja generator. Thus I chose a different approach, using the DEPFILE option of add_custom_command. The dependency file is generated in a custom command before compiling the pch.

In the last commits, I improved the overall pch usage. I implemented pch sharing, so that it is now possible to use a pch in a bundle or a library from an another target by setting USE_PCH_FROM_TARGET in the Properties.cmake. There are also two new dummy targets, pchCore and pchServices, that allow to compile and use a pch with a good base for core libraries or a component using services. pchCore is used by default for libraries and pchServices for bundles, if no custom pch.hpp is found and USE_PCH_FROM_TARGET is not set. This sharing strategy allowed to gain around 15% more. :) And the nice thing is that even if you don't customize your pch, you already get a good boost with the default pch. I also took care to manage dependencies correctly, so if you specify a pch target that was not already a dependency or a requirement, then the build script automatically adds this target as a requirement.

Gcc support forced me to add a constraint that was not present before. It is not allowed now to include files from the current target in a pch. This is to simplify pch sharing: when using a pch from an another target, the cpp file must include the same definitions than the pch. So I removed target_EXPORTS (and target_VER but this is less important) definitions from the pch build. This prevents to export symbols from the pch included file. Practically this is not such a big deal, if you would like to include a header from the current target, just try to copy the bigger included files outside of the target from this header. I did this for the pch where it occurred and I noticed no performance penalty.

Nevertheless I have to test all these improvements on Windows and then, I think we are good for the merge.

fbridault commented 7 years ago

Okay I think I'm ready for the last review round. I fixed the windows build, it was harder than expected, especially because of the need to link with the original pch.obj when sharing pch. I solved that issue thanks to CMake "Object" library. I removed most custom pch because the benchmarks prove they don't improve performance and depending on the platform, on windows for instance, building a pch is a bottleneck since Ninja breaks the parallel build at this stage. It so better to keep the number of pch quite low. This sharing strategy is also very efficient on Windows. On my windows laptop, I started with 23 minutes to build VRRender. With the first pch implementation without sharing, I managed to build in 11 minutes. With pch sharing, I managed to build even faster in 6 minutes.

Please test and review the code, I think this would be a very nice addition for 11.0.4.

schweitzer commented 7 years ago

Ok it works well for me!

Here's the details of my tests:

Linux

OS: Linux Mint 18.1
Processor: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz (12 cores)
RAM: 16go
Compiler : GCC 5.4
CMake: 3.7.2
Build Tool: ninja (with -j10 flag)
Build Type : Debug (with Tests)

Time to build VRRender from scratch:

Before PCH: real: 11m16.126s user: 104m8.964s sys: 6m16.620s After PCH: real: 5m55.440s user: 51m20.940s sys: 3m46.300s

It is 2x faster using PCH!

Windows

OS: Windows 8.1
Processor: Intel(R) Core(TM) i7 CPU M620 @2.67GHZ (4 cores)
RAM: 6go
Compiler : msvs 2013
CMake: 3.7.0
Build Tool: ninja (with -j4 flag)
Build Type : Debug (with Tests)

Time to build VRRender from scratch:

Before PCH: 68min After PCH: 27min

It is at least 2x faster with PCH!

Great work !

dweckmann commented 7 years ago

On my machine:

macOS SIERRA 10.12.3 (16D32) Mac Pro (mi-2010) 2 x 2,4 GHz Quad-Core Intel Xeon 24 Go 1066 MHz DDR3

The result are also impressive:

Without PCH: real 15m29.658s user 221m4.906s sys 11m24.929s

Witch PCH: real 8m27.573s user 114m31.416s sys 7m25.137s

greenjava commented 7 years ago

Configuration

OS: Windows 10 Version 1607
Processor: Intel(R) Core(TM) i7 CPU 5930K @3.50GHZ (12 cores)
RAM: 16go
Compiler: MSVC 2015 Update 3
CMake: 3.7.0
Build Tool: ninja
Build type: Debug (without test)

Time to build VRRender from scratch:

Before PCH: 12min14.751s After PCH: 6min5.621s

👍

fbridault commented 7 years ago

Okay, I truly hope this was the very last commit for this pull request.

I realized that the SPYLOG_LEVEL_<target> override was broken. It was not possible to define a different level than the one defined in the pch target.

I fixed this on the three different platforms, it was really a pain in the ass... There is a little difference than before, log macros defined in hpp will use the log level from the pch and not the target. Actually I think it was a bug before. When we defined the log level TRACE in ioVTK for instance, the traces from log macros of fwCom headers were displayed. So now you will only get the traces from ioVTK, when using pch. Without the pch the behaviour remains the same.

fw4spl-org / fw4spl

Reduced compilation time #2

Linux

Time to build VRRender from scratch:

Windows

Time to build VRRender from scratch:

Configuration

Time to build VRRender from scratch: