Open Quuxplusone opened 5 years ago
Bugzilla Link | PR42393 |
Status | NEW |
Importance | P normal |
Reported by | Ye Luo (xw111luoye@gmail.com) |
Reported on | 2019-06-25 14:16:39 -0700 |
Last modified on | 2020-04-06 06:30:34 -0700 |
Version | unspecified |
Hardware | PC Linux |
CC | a.bataev@hotmail.com, bruno.turcksin@gmail.com, jdoerfert@anl.gov, llvm-bugs@lists.llvm.org, terry.l.wilmarth@intel.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
The problem is caused by the kmpc_push_target_tripcount function. It is not thread-safe and has a data race, which causes incorrect results.
(In reply to Alexey Bataev from comment #1)
> The problem caused by kmpc_push_target_tripcount function. It is not
> threadsafe and has data race, which causes incorrect results.
Also, it seems to me there is a problem with the runtime; I will investigate this.
After some investigation, it seems to me that the problem is in libomp. When we
schedule the distribute loop, the tid is taken from the outer threads (though
it should be set to 0 in all cases) and the number of threads is taken from the
outer parallel region (though, it seems to me, it should be set to 1).
It would be good to check if libomp works correctly here.
I tried compiling this without offload:
$ clang++ -fopenmp debug.cpp
$ OMP_NUM_THREADS=2 ./a.out
tid = 0
0 1 0 0
tid = 1
0 0 2 3
Probably something is wrong with libomp already.
On the host we generate an "omp for"-like loop for the distribute, which binds to the outer parallel. Thus, each thread of the outer parallel executes only a single iteration of the A[i] = i loop instead of the entire thing.
(In reply to Johannes Doerfert from comment #5)
> On the host we generate an "omp for"-like loop for the distribute, which
> binds to the outer parallel. Thus, each thread of the outer parallel executes
> only a single iteration of the A[i] = i loop instead of the entire thing.
If an "omp for"-like loop is generated for distribute, an "omp parallel"-like region should be generated for teams. Binding to the outer parallel is clearly wrong.
(In reply to Ye Luo from comment #6)
> (In reply to Johannes Doerfert from comment #5)
> > On the host we generate an "omp for"-like loop for the distribute, which
> > binds to the outer parallel. Thus, each thread of the outer parallel
> > executes only a single iteration of the A[i] = i loop instead of the
> > entire thing.
> If an "omp for"-like loop is generated for distribute, an "omp parallel"-like
> region should be generated for teams. Binding to the outer parallel is
> clearly wrong.
Agreed.