[Discussion] Speed up compilation reducing number of includes

KratosMultiphysics / Kratos

Kratos Multiphysics (A.K.A Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has BSD license and is written in C++ with extensive Python interface.

https://kratosmultiphysics.github.io/Kratos/

Other

1.04k stars 245 forks source link

[Discussion] Speed up compilation reducing number of includes #10062

Open loumalouomega opened 2 years ago

loumalouomega commented 2 years ago

Description @KratosMultiphysics/technical-committee

One thing that makes the compilation slower is the use of many includes (in a C/C++ compiler an #include copy and pastes the whole file in the source code, so it may become huge even for a small quantity of code).

To avoid these problems usually you do the following two things:

Move as much code as possible to a source file (.cpp/.cxx)
Use precompiled headers (https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html and https://cmake.org/cmake/help/latest/command/target_precompile_headers.html).

These two options must go first, the first one a priori is taken into account (but I still find from time to time files not seaprected in source files, see https://github.com/KratosMultiphysics/Kratos/pull/10050). The second I think we should consider to take into consideration, right now we are not, and I think we can do it specially with the templated headers that we use intensively (like geometries).

In addition to these, there are tools that help you to reduce the number of includes considered (many times we copy paste code and we include not even used). For example there are tools like https://include-what-you-use.org/.

I know that Unitary builds partially solves the issue, but we should be able to compile relatively fast without doing tricks like these. At the end this should be the proper thing to do, and in any case doing this the Unitary builds will be even faster.

loumalouomega commented 2 years ago

Ping @roigcarlo

roigcarlo commented 2 years ago

HI @loumalouomega

We acknowledge this is a problem but still we are unsure how yo tackle it. All the solutions you propose are effective towards reducing the compilation time, but not all of them can be applied freely in Kratos right now.

Moving to cpp:

This ( in my opinion ) is where we can get more improvement, but we cannot blindly start to move all classes to cpp because we have some inter-application dependencies ( specially at template level ) where some compilation units would require to have definition of function instantiated for classes that do not exist in the moment of compilation. Example:

Compilation unit 1

// Header
class MyClass<class TParam> {
    TParam myTemplatedFunction();
}

// Source
template <T> T MyClass::myTemplatedFunction() { return T() }

// Source

Compilation unit 2 ( that has a linking dependency on CU1 but does not have access to "MyClass" cpp )

// Header ( or source, problem is the same )
#include "MyClass.h"
struct Break {};

class MySecondClass {
    Break ErrorFunction() { return MyClass<Break>().myTemplatedFunction()}
}

In this scenario is impossible to use MyClass outside its compilation unit, and its compilation unit is not enough to fully define without including cyclic dependencies.

In general in Kratos this is the case of some Element instances. In conclusion we have to be very careful with this kind of structures so it does not break anything (or worst, makes it unusable if at some point someone needs that)

Precompiled headers:

I recall trying this the first time we used unitary builds (with cotire) and the effect was minimal, and we always ended up having to recompile the PCH's every time a change was made, probably because a big chunk of our code is in headers. If we manage to correctly port it to cpp, no problems against it.

Reduce the includes:

Thumbs up for this, as long as you manage to identify the redundancies and resolve the dependencies of files that incorrectly include a "hub" include because they don't know what they need.

loumalouomega commented 2 years ago

HI @loumalouomega

We acknowledge this is a problem but still we are unsure how yo tackle it. All the solutions you propose are effective towards reducing the compilation time, but not all of them can be applied freely in Kratos right now.

Moving to cpp:

This ( in my opinion ) is where we can get more improvement, but we cannot blindly start to move all classes to cpp because we have some inter-application dependencies ( specially at template level ) where some compilation units would require to have definition of function instantiated for classes that do not exist in the moment of compilation. Example:

Compilation unit 1
// Header
class MyClass<class TParam> {
    TParam myTemplatedFunction();
}

// Source
template <T> T MyClass::myTemplatedFunction() { return T() }

// Source
Compilation unit 2 ( that has a linking dependency on CU1 but does not have access to "MyClass" cpp )
// Header ( or source, problem is the same )
#include "MyClass.h"
struct Break {};

class MySecondClass {
    Break ErrorFunction() { return MyClass<Break>().myTemplatedFunction()}
}
In this scenario is impossible to use MyClass outside its compilation unit, and its compilation unit is not enough to fully define without including cyclic dependencies.

In general in Kratos this is the case of some Element instances. In conclusion we have to be very careful with this kind of structures so it does not break anything (or worst, makes it unusable if at some point someone needs that)

Yes I know that some templated classes like geometries cannot be moved to cpp, because the idea is to be ablke to use it with any arbitrary type of point.

matekelemen commented 2 years ago

Moving to cpp:

We could do something similar to forward declarations. It may be a bit overly convoluted, but we could organize templates like this:

A header containing the template declaration (no implementation)
A separate header containing the implementations

=> by default, we'd only include the declarations. => when we need to instantiate the template with a specific type, we create a source file that includes both the declaration and the implementation, then define an explicit specialization for the desired type. This would make the template available for explicitly specialized types in all translation units (linked ones too).

Here's your example rewritten:

my_class.h

template <class TParam>
class MyClass {
    TParam myTemplatedFunction();
};

my_class_impl.h

template <class TParam>
TParam MyClass<TParam>::myTemplatedFunction() {
    // implementation goes here
}

Now let's say we want to use MyClass<int> somewhere: some_application.h

#include "my_class.h"

void doSomethingWithMyInt(MyClass<int> intInstance);

some_application.cpp

void doSomethingWithMyInt(MyClass<int> intInstance) {
    intInstance.myTemplatedFunction();
}

At this point, using some_application.cpp would result in a linker error because the compiler can't find the definition of MyClass<int>::myTemplatedFunction since my_class_impl.hpp is not included. However, we don't want to include my_class_impl.hpp because it's too bulky, so we create a separate source file instead where we declare an explicit specialization with int. my_class_int.cpp

#include "my_class.h"
#include "my_class_impl.h"
template <> MyClass<int>;

This will force the compiler to generate the implementation for MyClass<int>, and it will become available in other translation units as well. We'll still get linker errors if we try using MyClass with other template parameters, but we could just do the same thing for every type we want to use MyClass with.

See the implementation in #10052 for another example. The only difference there is that the implementation is in the source file instead of a separate implementation header, because I knew all the types these function templates will be used with in advance.

This trick comes in very handy if there's only a handful of types we want to use our templates with. It's obviously not useful for truly generic templates.

RiccardoRossi commented 2 years ago

Dear @loumalouomega ,

we were discussing this (very important) topic within the @KratosMultiphysics/technical-committee

Aside for the concern expressed by @roigcarlo, moving stuff away from the headers may have important performance implications as it is "easier" to inline stuff in header than stuff in cpps.

The c++ forward looking solution to this would be "modules" (yet not widely supported)

see e.g.

https://docs.microsoft.com/en-us/cpp/cpp/modules-cpp?view=msvc-170

our impression is that it is NOT wise to start a wide movement of migrating stuff to cpps but rather to

take a deep look into the kratos include graph (most probably we could clean up many inclusions)
start looking into modules to see if that would fulfill our needs (forward looking, it cannot be now). Note that we CAN NOT do it now due to legacy compiler support, but we can start looking into this to have it clear as a future possible change.
Also splitting in cpp and header implies essentially losing the repo history, something we don't really love

for these reasons we are hesitant about changes in foundational classes (for example the data bases). Rather, we would look into changing classes that are not foundational, or even better, on cleaning up the header dependencies.

matekelemen commented 2 years ago

@RiccardoRossi Inlining is important, but I see a lot of implementations that don't really justify it. There are lots of class definitions that are fully implemented in headers even though they only have one or two inline-worthy functions (just take a look at almost any process. For example, I guarantee that the constructors of ApplyConstantVectorValueProcess will never be inlined by the compiler because they're just "too long").

My hunch is that people sometimes don't want to create a separate source file because of all the boilerplate involved, so they just define everything in headers.

loumalouomega commented 2 years ago

@RiccardoRossi Inlining is important, but I see a lot of implementations that don't really justify it. There are lots of class definitions that are fully implemented in headers even though they only have one or two inline-worthy functions (just take a look at almost any process. For example, I guarantee that the constructors of ApplyConstantVectorValueProcess will never be inlined by the compiler because they're just too "long").

My hunch is that people sometimes don't want to create a separate source file because of all the boilerplate involved, so they just define everything in headers.

That's my impression too

RiccardoRossi commented 2 years ago

@matekelemen it may be true that boilerplate has a role, nevertheless there are open PRs about chaning foundational (and performance sensitive) parts of the code...we are NOT positive about those...

if you really want to start a push in this direction, we suggest to start from other, less foundational, parts of the code, or by looking into the included includes

loumalouomega commented 2 years ago

3. Also splitting in cpp and header implies essentially losing the repo history, something we don't really love

I understand what you say, but precisely these classes are so important that are heavily included and the penalty is not insignificant. I think we must weight what do we prefer, and in certain cases the winner will be the compilation time when this is significative.

The other two points of course, but the PR you are referring is not possible to refactor inclussions, because is not very much included, but is included in the node.h, being indirectly included in many places.

One different approach would be to remove the template from node.h as I mentioned in other issue and instantiate the node, so it will be precompiled, and the weight of that header will be not so problematic.

loumalouomega commented 2 years ago

@matekelemen it may be true that boilerplate has a role, nevertheless there are open PRs about chaning foundational (and performance sensitive) parts of the code...we are NOT positive about those...

if you really want to start a push in this direction, we suggest to start from other, less foundational, parts of the code, or by looking into the included includes

See my former comment

matekelemen commented 2 years ago

Also splitting in cpp and header implies essentially losing the repo history.

Yeah that's a tricky and unfortunate problem. That said, the Kratos repo and its history is getting rather bulky (it's in the multi-gigabyte range now), so we'll have to start thinking about addressing that problem too, sooner or later.

We should probably sync the solution of these two issues.

roigcarlo commented 2 years ago

my_class_impl.h

template <class TParam>
TParam MyClass<TParam>::myTemplatedFunction() {
    // implementation goes here
}

@matekelemen This is a very interesting and complicated case: By having a include that effectively acts a source, you are forcing different transaction units to have the same definition of the same class. This is incorrect. If we want to force this ( and we use this in Kratos, for example for the << operators ) you should include a inline keyword to tell the compiler that it's ok to have the same definition coming from multiple sources.

Probably a good moment now that inlines came into the discussion, to remember that the inline keyword is not just a hint for the compiler to know when it can inline or not thins, it also manipulates where and how definitions of functions are allowed. In our case is important because:

An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:

There may be more than one definition of an inline function in the program as long as each definition appears in a different translation unit and (for non-static inline functions) all definitions are identical. For example, an inline function may be defined in a header file that is included in multiple source files.

It must be declared inline in every translation unit. It has the same address in every translation unit.

The include should not be a problem by its own, but explicitly splitting the header_header and the header_sources, makes it impossible to control if you are attaching the right implementation to the proper header. Actually its pretty easy to break this rule: for instance imagine I am tempted to wrongly specialize a class (intended or by mistake) in my compilation unit (call it application):

CU1

// Header "MyTemplatedClass.h"
teamplate<class T1>
class MyTemplatedClass {
    int printNumber();
}

// SourceHeader "MyTemplatedClassSource.h"
templte <class T1>
inline int MyTemplatedClass<T1>::printNumber {print 1};

// Source "MyTemplatedClass.cpp"
include "MyTemplatedClass.h"
include "MyTemplatedClassSource.h"

template class MyTemplatedClass<int>;

CU2

// SourceHeader "MyTemplatedClassSource2.h"
templte <class T1>
inline int MyTemplatedClass<T1>::printNumber {print 2};

// Source "MyTemplatedClass.cpp"
include "MyTemplatedClass.h"
// Note that I decided to include another definition here 
// and nobody can do anything to stop me :)
include "MyTemplatedClassSource2.h"
template class MyTemplatedClass<double>;

CU2

// Random source file "Code.cpp"
include "MyTemplatedClass.h"
my = MyTemplatedClass<int>();
my.printNumber()

Both definitions are correct, you have told the compiler not to care about where the definition comes from, and you needed to include them both because you are now instantiating with another class. What does printNumber prints? Its impossible to know because you have no control about what code is being used, moreover, the results can vary from one compilation to another.

Edit: I didn't remember to reference this but is not a compile error, its undefined:

If an inline function with external linkage is defined differently in different translation units, the behavior is undefined.

What I mean is, while technically the solution is possible, and despite my personal preferences, it opens the door to nasty problems that are not easy to detect.

matekelemen commented 2 years ago

@roigcarlo I should probably not have mentioned the other PR because the specializations there are different for each type, my bad.

I think your example would result in an ODR violation (leading to a compilation error), though I'm not sure. My idea is that there should be only one implementation header (MyTemplatedClassSource2.h and the implementation it contains should not exist), and the specializations would not have individual implementations of their own.

I'll create a toy project we can play around with and link it here.

matekelemen commented 2 years ago

Here's a complete and working example of what I have in mind.

[Let me know if you'd like to play around with it and I'll give you collaborator access.]

Regardless of whether we're planning to do some refactoring, I think it's good to keep this possibility in mind for developing new stuff too.

roigcarlo commented 2 years ago

Hi @matekelemen as I said, it will work.

I was also preparing a small toy example to show what I mean with being dangerous

Basically this is an extension of what you said, but divided in several libraries that have dependencies among them, and all expose what they see fit.

Note that the code is clearly wrong, but the compiler just doesn't even throw a warning. Even more dangerous, depending on the order of linkage:

# target_link_libraries(TestODR1__ PUBLIC unit1)
./TestODR1__
Int:    1
Double: 1

# target_link_libraries(TestODR12_ PUBLIC unit1 unit2)
./TestODR12_
Int:    1
Double: 2

# target_link_libraries(TestODR12_ PUBLIC unit1 unit3)
./TestODR1_3
Int:    1
Double: 3

# target_link_libraries(TestODR123 PUBLIC unit1 unit2 unit3)
./TestODR123
Int:    1
Double: 2

# target_link_libraries(TestODR123 PUBLIC unit1 unit3 unit2)
./TestODR132
Int:    1
Double: 3

matekelemen commented 2 years ago

Ah I see what you mean now. That's quite nasty indeed, plus it doesn't even require the inline to compile; I'm surprised no compiler picks up on the ODR violation.