Open francoishamon opened 3 years ago
Do you see the same performance degradation without using `__restrict__` references? It's possible that the alias analysis is working correctly within the same function but is lost across the function call. Adding `__restrict__` to a reference to the view sadly can't really help: it tells the compiler that there is no other active pointer to the view, but makes no guarantees about the data pointed to by the pointer the view contains. Part of the problem with restrict in general is that it is not meant to work on class or struct members, only on function parameters and local variables.

One way to be completely sure the qualifier is passed through would be to pass the pointers, appropriately marked, and then produce the views in `compute_pml_3d_restrict`. Another option that may work in some compilers, but is not guaranteed to work by any language standard, is to apply the restrict qualifier to the pointer type parameter of the `View`.

There are also some utility wrapper types in `RAJA/util/types.hpp` that attempt to work around this in a few different ways, but they are all attempts at working around a general limitation of portable C++, and I can't speak to their effectiveness.
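To make the second option concrete, here is a minimal sketch of putting the qualifier on a view's pointer type parameter. It deliberately does not use RAJA: `FlatView`, `RestrictView`, `ConstRestrictView`, and `scale_add` are hypothetical names invented for illustration, and `__restrict__` is a GCC/Clang extension (MSVC spells it `__restrict`). Whether the optimizer actually honours the qualifier on a class member is compiler-dependent, so treat this as something to benchmark rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 1D view; this is NOT RAJA::View.
// The second template parameter plays the role of the view's pointer type.
template <typename T, typename Ptr = T*>
struct FlatView {
  Ptr data;
  std::size_t size;
  T& operator()(std::size_t i) const { return data[i]; }
};

// The restrict qualifier is attached to the stored pointer type.
// No language standard requires a compiler to exploit it here.
using RestrictView      = FlatView<float, float* __restrict__>;
using ConstRestrictView = FlatView<const float, const float* __restrict__>;

// Hypothetical kernel that receives views by const reference:
// out(i) = a(i) + alpha * b(i)
inline void scale_add(RestrictView const& out,
                      ConstRestrictView const& a,
                      ConstRestrictView const& b,
                      float alpha)
{
  assert(out.size == a.size && out.size == b.size);
  for (std::size_t i = 0; i < out.size; ++i) {
    out(i) = a(i) + alpha * b(i);
  }
}
```

A caller would build the views from non-overlapping buffers, for example `scale_add(RestrictView{out, n}, ConstRestrictView{a, n}, ConstRestrictView{b, n}, 0.5f)`. Inspecting the generated assembly or a vectorization report is the only reliable way to see whether a given compiler used the no-alias information.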
Thanks for the quick reply and the clear explanation. Yes, I confirm that I observe the same performance degradation without using the `__restrict__` keyword when I pass the references to the `RAJA::View`. Also, I just tried to pass restricted pointers to `compute_pml_3d_restrict` to create the views there, and I can recover the good performance of the original code based on for loops. I will have a look at `RAJA/util/types.hpp` as well. Thanks for your help.
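The approach that restored performance (restrict-qualified raw pointers as function parameters, views built inside the callee) can be sketched roughly as follows. This is not the code from the issue: `FlatView` is a hypothetical stand-in for a 1D view, and `update_restrict` with its parameters is an invented kernel used only to show the shape of the signature. `__restrict__` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal 1D view, standing in for a library view type.
template <typename T>
struct FlatView {
  T* data;
  std::size_t size;
  T& operator()(std::size_t i) const { return data[i]; }
};

// Hypothetical kernel. The raw pointers carry __restrict__ as function
// parameters, which is where compilers are expected to honour it.
// The views are created locally from those pointers.
inline void update_restrict(float* __restrict__ v,
                            const float* __restrict__ u,
                            const float* __restrict__ eta,
                            std::size_t n,
                            float dt)
{
  assert(v != nullptr && u != nullptr && eta != nullptr);

  FlatView<float>       vView{v, n};
  FlatView<const float> uView{u, n};
  FlatView<const float> etaView{eta, n};

  // v(i) += dt * eta(i) * u(i)
  for (std::size_t i = 0; i < n; ++i) {
    vView(i) += dt * etaView(i) * uView(i);
  }
}
```

The design choice is that the no-alias promise lives on the parameters, so it does not depend on the compiler seeing through a class member. The caller is then responsible for passing buffers that really do not overlap.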
Hello, I am testing RAJA in a small finite-difference code, and I encountered a problem related to the use of `RAJA::View` with the `__restrict__` keyword on CPU. The file that implements the different versions of the kernels is here. The standard version of my main kernel looks like:

where the FD macros are defined at the top of this file. The pointers used in the macros `LAP`, `VUPDATE`, and `PHIUPDATE` are passed with the `__restrict__` keyword.

For comparison, I also implemented in the same function another version of the kernel using `RAJA::View` and `RAJA::kernel` with `RAJA::loop_exec` that looks like:

where the ranges are defined as `RAJA::RangeSegment const XRange(x3, x4);`, etc., and the views used in the macros are defined as `RAJA::View< const float, RAJA::Layout<1, RAJA::Index_type, 0> > uView( u, (nx+2*lx)*(ny+2*ly)*(nz+2*lz) );`, etc. The standard for-loop version and this `RAJA::kernel` version have very similar performance on CPU.

My issue arose when I tried to create the `RAJA::View`s in one function and then pass them by reference with the `__restrict__` keyword to another function, like this:

The `RAJA::kernel` is located in this other function, `compute_pml_3d_restrict`, but is exactly the same as before (same range, same policy). Please see this version of the code here. In this case, it seems that the `__restrict__` keyword is ignored, and my code is about twice as slow as the two other versions. Am I doing anything wrong here?

I did these tests in the GEOSX environment (this branch) on Quartz, compiling in release mode with both clang-10.0.0 and gcc-8.1.0. Let me know if you need more information.