I was working on clang-tidy checks during the hack session today and ran into something interesting I think we should discuss. The clang-analyzer-core.UndefinedBinaryOperatorResult check checks for cases where a variable might be undefined when it's used and so structures like this tend to trigger it:
int o1, o2, o3;
switch (dir) {
case 0:
o1 = grid_enum::momentum_x;
o2 = grid_enum::momentum_y;
o3 = grid_enum::momentum_z;
break;
case 1:
o1 = grid_enum::momentum_y;
o2 = grid_enum::momentum_z;
o3 = grid_enum::momentum_x;
break;
case 2:
o1 = grid_enum::momentum_z;
o2 = grid_enum::momentum_x;
o3 = grid_enum::momentum_y;
break;
}
// the check flags the usage of o1 on this line
velocity_x = dev_conserved[o1 * n_cells + id];
It is technically correct to flag this since o1 could be uninitialized if dir is not 0-2 even though we know it always will be in that range. An easy way to fix this is if all the kernels with a structure like this take dir as a template argument instead of a regular argument, thus allowing the switch statement to be determined at compile time.
While I don't think it's worth doing that just for this check I tried it out on a single kernel (the new PPMC_VL kernel) since templates can improve the performance of control statements by evaluating them at compile time. While I didn't expect it to help it actually did help. It reduced the execution time by ~4ms per time step, a ~2% improvement. If these results hold when applied to other kernels that could be an easy 10% performance improvement for very little effort. What do you all think?
I was working on clang-tidy checks during the hack session today and ran into something interesting I think we should discuss. The
clang-analyzer-core.UndefinedBinaryOperatorResult
check checks for cases where a variable might be undefined when it's used and so structures like this tend to trigger it:It is technically correct to flag this since
o1
could be uninitialized ifdir
is not 0-2 even though we know it always will be in that range. An easy way to fix this is if all the kernels with a structure like this takedir
as a template argument instead of a regular argument, thus allowing the switch statement to be determined at compile time.While I don't think it's worth doing that just for this check I tried it out on a single kernel (the new PPMC_VL kernel) since templates can improve the performance of control statements by evaluating them at compile time. While I didn't expect it to help it actually did help. It reduced the execution time by ~4ms per time step, a ~2% improvement. If these results hold when applied to other kernels that could be an easy 10% performance improvement for very little effort. What do you all think?