Closed by ggeorgakoudis 1 year ago
@ggeorgakoudis Please verify my fixes produce unique tags and then close this issue.
Verified. For the record, the big issue was the missing call to post_lowering_openmp for GPU device targets; the unique tags were a nit.
I'm not sure that is the right place for the post_lowering_openmp call either. It would seem to make sense to put it in the same spot (post_lowering) for CUDA as we have for CPU. This seems to be one of the spots where making this an extension is going to be more difficult. If it works there for the time being, fine; we can make progress and revisit this issue for a long-term fix. @stuartarchibald Any thoughts?
Agreed. It's hack-ish right now; what we want in the medium to long term is an OpenMP context/target to have a clean extension.
Reporting a bug
LLVM lowering fails with instruction domination errors when the parallel region is outlined. I'm attaching the error log and the original LLVM IR before lowering. Check values %.42 and %.67. They are allocas in the target region, used in the parallel region but do not appear in the data-sharing qualifiers, hence the error. I'm attaching files to help debug the issue: the Python code, the error log, and the device image IR before lowering.

Attachments: hello-target-parallel.py.txt, error_log.txt, device0f1.ll.txt
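
To illustrate the failure mode described above, here is a minimal toy sketch in plain Python. It is not the actual lowering code: the function name undeclared_captures, the list-of-names representation, and the value %tmp are hypothetical, chosen only for illustration. It assumes that outlining moves the parallel region into its own function, so any value the region uses but does not define must be named in a data-sharing clause in order to be passed in; anything left over would refer to a definition that no longer dominates its use.

```python
def undeclared_captures(region_defs, region_uses, clause_vars):
    """Return values a region uses that are neither defined inside it
    nor named in a data-sharing clause (toy model, not real lowering)."""
    # Values the region touches but does not define itself.
    live_in = set(region_uses) - set(region_defs)
    # Live-in values must appear in a clause to become arguments
    # of the outlined function; the rest are the problem values.
    return sorted(live_in - set(clause_vars))


# Allocas created in the enclosing target region (names from the report).
target_allocas = ["%.42", "%.67"]

# Hypothetical contents of the parallel region: it defines %tmp
# and uses both allocas from the target region.
parallel_defs = ["%tmp"]
parallel_uses = ["%.42", "%.67", "%tmp"]

# No data-sharing clauses mention the allocas: both are flagged.
missing = undeclared_captures(parallel_defs, parallel_uses, clause_vars=[])
print(missing)

# Once the allocas appear in a clause, nothing is flagged.
fixed = undeclared_captures(parallel_defs, parallel_uses, clause_vars=target_allocas)
print(fixed)
```

In this model the first call flags both %.42 and %.67, which matches the values named in the report, and the second call flags nothing once they are listed in a clause.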