HPCE / hpce-2017-cw6


Passing by reference or copy in a lambda function of a task_group #24

Closed natoucs closed 6 years ago

natoucs commented 6 years ago

Say I have a variable p set in global memory. I pass p to many lambda functions run in parallel (using a .run in a task_group). p is read-only in all those functions.

Should I pass p by reference or by copy?

If I pass as a copy, I suffer the cost of copying but I won't ever have a thread waiting because another thread is currently reading the variable.

If I pass by reference, I risk two threads clashing when they read the variable and losing time in the process. On the other hand, p is only used once in each function, so what are the odds that the threads try to read it at the same time?

What is a good rule of thumb for answering this kind of question without having to implement both versions and compare the times?

For now, a copy seems better: copying from global memory into a thread-local variable is cheap, whereas a clash is known to waste more time. But I am not sure and would like confirmation.

m8pple commented 6 years ago

If the variable p is read-only in all tasks, then you shouldn't get problems with caches fighting over the variable. It is completely fine if multiple CPUs have a cached copy of the same variable, as long as they all have exactly the same value and never change it. So from that point of view there is little performance impact, as eventually every CPU will have a local private copy in L1 cache.

However, there is a good argument for copying the value p from the compiler's point of view, as it may affect the compiler's ability to optimise expressions and statements involving that variable. If the variable is copied by value, then the compiler knows that the copy is completely private, and can rely on its value never changing. However, if the variable is shared, then the compiler has to be able to prove that no other piece of code will modify the variable while the task is executing. This is often very hard for the compiler, so it may not be able to do the same optimisations.
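As a minimal sketch of why a shared variable can block optimisation, the two functions below add p to every element of a vector. The names `add_shared` and `add_local` are hypothetical and not from the coursework. In `add_shared` the reference p could alias an element of `data`, so the compiler generally has to reload it on each iteration; in `add_local` the value is copied once and is provably constant for the loop.

```cpp
#include <vector>

// p is read through a reference: if p refers to an element of data,
// writing to data changes p, so the compiler cannot assume it is constant.
void add_shared(std::vector<int> &data, const int &p)
{
    for (int &x : data) {
        x += p;
    }
}

// p is copied into a private local first: the loop body can never
// change local_p, so it can safely stay in a register.
void add_local(std::vector<int> &data, const int &p)
{
    const int local_p = p;
    for (int &x : data) {
        x += local_p;
    }
}
```

Calling both with `data = {1, 2, 3}` and `p` bound to `data[0]` gives different results, which is exactly the case the compiler must allow for when the variable is shared.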

Generally speaking, unless you have a value that is expensive to copy (like a vector or an array), it is often a good idea to capture scalars by value, either using [=], or by manually copying into a local variable. This makes it clear to both the compiler and the programmer that the variables are explicitly local, and constant for the lifetime of the lambda.
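The two capture styles can be sketched as follows. This is an illustration under assumptions, not the coursework code: `std::thread` stands in for `task_group::run` so the example has no TBB dependency, and the function names and the `p * i` computation are made up. The capture lists would look the same inside a lambda passed to `run`.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Each task captures the read-only scalar p by value, so every closure
// holds its own private copy. out is captured by reference because each
// task writes a distinct element of it.
std::vector<double> scale_by_value(double p, std::size_t n_tasks)
{
    std::vector<double> out(n_tasks, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(n_tasks);
    for (std::size_t i = 0; i < n_tasks; ++i) {
        workers.emplace_back([p, i, &out]() {
            out[i] = p * static_cast<double>(i);
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return out;
}

// Same computation, but every task reads the one shared p through a
// reference. This is safe only because no task ever writes to p.
std::vector<double> scale_by_reference(const double &p, std::size_t n_tasks)
{
    std::vector<double> out(n_tasks, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(n_tasks);
    for (std::size_t i = 0; i < n_tasks; ++i) {
        workers.emplace_back([&p, i, &out]() {
            out[i] = p * static_cast<double>(i);
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return out;
}
```

Both versions produce the same results; the difference is only in what the compiler and the reader can assume about p inside the lambda.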

It's quite related to the idea from #16, where the compiler may find it difficult to optimise things across lambda/task boundaries.

natoucs commented 6 years ago

Thanks for the detailed explanation!

So for the first paragraph, you mean that if I pass by reference to multiple CPUs, eventually the variable will be in each CPU's cache as a local private copy? Is it the compiler that optimises this (seeing that it is read-only), or is it just how things naturally work?

m8pple commented 6 years ago

Yes, that's correct, but it will be the CPU that manages the sharing. So at run-time the hardware will see that the variable is only ever being read.

If you can transform the code so that the compiler can see this for itself, then the optimisation will be done at compile-time (which is more efficient).

natoucs commented 6 years ago

Capturing by value in a lambda makes the captured variable read-only (const) inside the lambda body, unless the lambda is declared mutable, so it normally tells the compiler at compile-time that the variable is read-only. (see: https://stackoverflow.com/questions/27714616/why-are-lambda-arguments-passed-by-value-read-only-in-c11)
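A small sketch of that behaviour, with hypothetical function names: a by-value capture cannot be modified in a plain lambda, and with `mutable` only the closure's own copy changes, never the original.

```cpp
// A by-value capture is const inside a non-mutable lambda.
int call_with_const_capture(int p)
{
    auto f = [p]() {
        // ++p;  // would not compile: p is read-only here
        return p + 1;
    };
    return f();
}

// With mutable, the lambda may modify its own copy of p,
// but the caller's p is left untouched.
int call_with_mutable_capture(int p)
{
    auto g = [p]() mutable {
        ++p;
        return p;
    };
    g();
    return p;
}
```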