Lowering parallel in a target region in LLVM fails due to missing data values

ggeorgakoudis commented 1 year ago

Reporting a bug

[x] I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
[x] I have included a self-contained code sample to reproduce the problem, i.e. it's possible to run as 'python bug.py'.

LLVM lowering fails with instruction domination errors when the parallel region is outlined. See values %.42 and %.67: they are allocas in the target region that are used in the parallel region but do not appear in its data-sharing qualifiers, hence the error. I'm attaching files to help debug the issue: the Python code, the error log, and the device image IR before lowering.

hello-target-parallel.py.txt error_log.txt device0f1.ll.txt
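
To illustrate the symptom described in the report: once a parallel region is outlined into its own function, any value it uses from the enclosing target region has to reach it through a data-sharing clause, otherwise the use is no longer dominated by its definition. Below is a toy Python model of that check. It is not Numba or LLVM code; the Region class, missing_from_clauses, and the %tid value are hypothetical names, and only %.42 and %.67 come from the report.

```python
# Toy model only: not Numba or LLVM code. All names except %.42 and %.67 are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Region:
    """A region that is about to be outlined into its own function."""
    used: set       # values the region's instructions read or write
    defined: set    # values defined inside the region itself
    clauses: set = field(default_factory=set)  # values named in data-sharing clauses


def missing_from_clauses(region):
    """Return values the region uses that are neither local nor passed in."""
    return sorted(region.used - region.defined - region.clauses)


# %.42 and %.67 are allocas in the enclosing target region (per the report).
parallel = Region(
    used={"%.42", "%.67", "%tid"},
    defined={"%tid"},
    clauses=set(),
)
print(missing_from_clauses(parallel))

# If the allocas were named in a clause, the outlined function could receive them.
fixed = Region(used=parallel.used, defined=parallel.defined, clauses={"%.42", "%.67"})
print(missing_from_clauses(fixed))
```

The first call reports both allocas as missing and the second reports none. This models only the symptom visible in the IR, not the underlying cause in the lowering pipeline.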

DrTodd13 commented 1 year ago

@ggeorgakoudis Please verify my fixes produce unique tags and then close this issue.

ggeorgakoudis commented 1 year ago

Verified. For the record, the big issue was the missing call to post_lowering_openmp for GPU device targets; the unique tags were a nit.

DrTodd13 commented 1 year ago

I'm not sure that is the right place for the post_lowering_openmp call either. It would seem to make sense to put it in the same spot (post_lowering) for CUDA as we have for CPU. This seems to be one of the spots where making this an extension is going to be more difficult. If it works there for the time being, fine: we can make progress and consider this issue for a long-term fix. @stuartarchibald Any thoughts?

ggeorgakoudis commented 1 year ago

Agreed. It's hack-ish right now. What we want in the medium to long term is an OpenMP context/target, so that we have a clean extension.

Python-for-HPC / numbaWithOpenmp

Lowering parallel in a target region in LLVM fails due to missing data values #3
