sguera closed this issue 7 years ago.
It is best to put an #ifdef around each LIKWID call and around the include of the LIKWID header:
#ifdef LIKWID_PERFMON
#include <likwid.h>
#endif
[...]
#ifdef LIKWID_PERFMON
LIKWID_MARKER_START("Sweep");
#endif
Thanks @TomTheBear, I'll do it like that.
@TomTheBear, does it need to be written this way (with the omp parallel)?
#ifdef LIKWID_PERFMON
#pragma omp parallel
{
LIKWID_MARKER_START("Sweep");
}
#endif
Yes, if you want to call it outside of a parallel region. Placing the calls outside of the loop has the advantage of less overhead, but there are also disadvantages: no overflow detection, no possibility to switch event groups at runtime and, of course, no call count detection. As a rule: INIT and CLOSE belong in serial regions; THREADINIT, START, STOP, GET and SWITCH belong inside parallel regions.
It remains to be discussed whether code like this is preferable:
#ifdef LIKWID_PERFMON
#pragma omp parallel
{
LIKWID_MARKER_START("Sweep");
}
#endif
while (runtime < 0.5)
{
timing(&wct_start, &cput_start);
for (int n = 0; n < repeat; ++n)
{
kernel_loop(a, b, W);
tmp = a;   /* swap the a and b buffers */
a = b;
b = tmp;
}
timing(&wct_end, &cput_end);
runtime = wct_end - wct_start;
repeat *= 2;
}
#ifdef LIKWID_PERFMON
#pragma omp parallel
{
LIKWID_MARKER_STOP("Sweep");
}
#endif
or like this:
while (runtime < 0.5)
{
timing(&wct_start, &cput_start);
for (int n = 0; n < repeat; ++n)
{
#pragma omp parallel
{
#ifdef LIKWID_PERFMON
LIKWID_MARKER_START("Sweep");
#endif
kernel_loop(a, b, W);
#ifdef LIKWID_PERFMON
LIKWID_MARKER_STOP("Sweep");
#endif
}
tmp = a;   /* swap the a and b buffers */
a = b;
b = tmp;
}
timing(&wct_end, &cput_end);
runtime = wct_end - wct_start;
repeat *= 2;
}
Since the while loop limits the runtime to 0.5 seconds or to a single call of kernel_loop, it is probably the better choice to put the calls outside of the while loop.
Another version with one region per repeat value would be (just for discussion):
char rname[100];
while (runtime < 0.5)
{
snprintf(rname, sizeof(rname), "Sweep_%d_repeats", repeat);
#ifdef LIKWID_PERFMON
#pragma omp parallel
{
LIKWID_MARKER_START(rname);
}
#endif
timing(&wct_start, &cput_start);
for (int n = 0; n < repeat; ++n)
{
#pragma omp parallel
{
kernel_loop(a, b, W);
}
tmp = a;   /* swap the a and b buffers */
a = b;
b = tmp;
}
timing(&wct_end, &cput_end);
#ifdef LIKWID_PERFMON
#pragma omp parallel
{
LIKWID_MARKER_STOP(rname);
}
#endif
runtime = wct_end - wct_start;
repeat *= 2;
}
If I put it inside the while loop, but without your solution for having several names (the char array + snprintf), the region would be overwritten every time, so the only values I would get are those of the run with runtime > 0.5, right? In that case it would be fine.
I do not think it is correct to get the counters from all the runs, which is what would happen if I leave the LIKWID_MARKER_START("Sweep"); outside the while loop. Am I wrong?
If you don't change the names, the calls are accumulated, not overwritten. So you get the values summed up over all runs until the runtime exceeds 0.5 s.
Why should it be incorrect to get the counters of all runs? As long as you don't change the sizes and/or the algorithm, the runs differ only in the runtime of each region call.
Yes, I know there is no difference, but then the runtime would be that of a single run (the last and longest one, with runtime > 0.5) while the counter values would be cumulative. Additionally, the number of repetitions and the statistics refer to a single run. That was my only "fear". Anyway, I'll keep it outside for now. Thanks for contributing!
Replace the LIKWID calls with macros:
by
This allows the code to be compiled without LIKWID being available.