jgfouca opened 2 years ago
Upon a second look, `nlev % pack_size != 0` is not necessary to demonstrate the error, but it does make the errors more frequent. When I switched to `ExeSpaceUtils::parallel_reduce` on scalarized views, I had similar problems until I made sure each thread passed its own local variable to the reducer. The local-variable approach also fixed `view_reduction`.
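A minimal sketch of the per-thread local-variable pattern described above. It uses `std::thread` as a stand-in for the threads of a team; this is an assumption for illustration and is not the `ExeSpaceUtils` or Kokkos API. `threaded_sum` is a hypothetical helper. Each thread accumulates into a private variable and writes only its own slot, and the partial sums are combined serially in thread order, so no two threads ever update the same accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: sum `data` using `nthreads` threads.
// Each thread reduces a contiguous chunk into a thread-private local
// variable, then stores the result in its own slot of `partial`.
double threaded_sum(const std::vector<double>& data, int nthreads) {
  assert(nthreads > 0);
  std::vector<double> partial(nthreads, 0.0);
  std::vector<std::thread> workers;
  const std::size_t n = data.size();
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([&, t] {
      double local = 0.0;  // thread-private accumulator, never shared
      const std::size_t begin = n * t / nthreads;
      const std::size_t end = n * (t + 1) / nthreads;
      for (std::size_t i = begin; i < end; ++i) local += data[i];
      partial[t] = local;  // each thread writes only its own slot
    });
  }
  for (auto& w : workers) w.join();
  double total = 0.0;
  for (double p : partial) total += p;  // combine in a fixed order
  return total;
}
```

For example, `threaded_sum(std::vector<double>(1000, 0.5), 7)` returns 500. The failure mode the local variable avoids is several threads doing `shared += data[i]` on one accumulator at the same time, which loses updates.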
I should also note that the error only occurs when `team_size > 1`, which is what you get when `MIMIC_GPU` is On (team size 7); `MIMIC_GPU` is On by default for Debug builds.
I believe the problems with `ExeSpaceUtils::parallel_reduce` were not fixed. @bartgol, correct me if I'm wrong.
Yes, you're right. I was working on completing it last Friday, but did not finish by the end of the week. I should be done today.
I think this was completed in #258. Closing.
Describe the bug
This was discovered when porting `shoc_energy_integrals` to small kernels. I was getting large differences in the outputs of `view_reduction` when `num_threads > 1`. I suspect the problem is in the handling of the garbage in the last pack, because the problem went away when I used `nlev % pack_size == 0`.
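When `nlev % pack_size != 0`, the last pack contains slots past `nlev` whose contents are undefined, so any reduction over packs has to skip them. The sketch below shows that masking in plain, serial C++; `Pack`, `kPackSize`, and `packed_sum` are hypothetical names, not the SCREAM/ekat pack types, and this is not the actual `view_reduction` implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kPackSize = 4;                   // hypothetical pack width
using Pack = std::array<double, kPackSize>;    // hypothetical pack type

// Sum the first `nlev` scalar levels stored in `packs`.
// Slots with scalar index >= nlev (padding in the last pack) are skipped,
// so garbage there cannot leak into the result.
double packed_sum(const std::vector<Pack>& packs, int nlev) {
  assert(nlev >= 0 &&
         static_cast<std::size_t>(nlev) <= packs.size() * kPackSize);
  double sum = 0.0;
  for (std::size_t p = 0; p < packs.size(); ++p) {
    for (int s = 0; s < kPackSize; ++s) {
      const int k = static_cast<int>(p) * kPackSize + s;  // scalar level index
      if (k < nlev) sum += packs[p][s];                    // mask out padding
    }
  }
  return sum;
}
```

With `nlev = 10` and `kPackSize = 4`, three packs hold 12 slots and the last two are padding; filling those two with `1e30` does not change the result, because the mask excludes them.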
To Reproduce
Steps to reproduce the behavior:
1. Configure with `-DSCREAM_SMALL_KERNELS=On -DCMAKE_BUILD_TYPE=Debug`
2. Run `OMP_NUM_THREADS=16 ./shoc_tests shoc_main_bfb`
Expected behavior
`view_reduction` should have produced BFB results matching Fortran.
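Separately from the race and the last-pack garbage, BFB agreement also requires a fixed summation order: floating-point addition is not associative, so a correct reduction whose combination order depends on the thread count can still differ in the last bits. The helper `order_matters` below is hypothetical and only illustrates this general IEEE 754 property; it is not part of the SCREAM test harness.

```cpp
#include <cassert>

// Returns true when regrouping the same three terms changes the
// double-precision result, i.e. when addition order affects the bits.
bool order_matters(double a, double b, double c) {
  return (a + b) + c != a + (b + c);
}
```

For instance, `order_matters(0.1, 0.2, 0.3)` is true on IEEE 754 double hardware, while terms that are exact binary fractions such as `0.5, 0.25, 0.125` sum identically in any order.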