Call post_lowering_openmp for GPU device code

ggeorgakoudis commented 1 year ago

Post lowering for OpenMP discovers allocas from outer regions that must be privatized in OpenMP tags. CPU contexts call post_lowering_openmp but GPU do not, hence the need for the explicitly call before applying the LLVM pass.

Fix #3

Reference an existing issue

ggeorgakoudis commented 1 year ago

I'm not sure how they can be duplicates with this approach since each value will have strictly one alloca due to SSA. I assume they are hashable since it works on local testing. I had a second look at the code and I'm not sure we need the set(). The if at https://github.com/Python-for-HPC/numbaWithOpenmp/blob/openmp_unpack_arrays/numba/openmp.py#L678 should guard against duplicates but doesn't. Does self.add_tag fail to update tag_vars?

DrTodd13 commented 1 year ago

I'm not sure how they can be duplicates with this approach since each value will have strictly one alloca due to SSA. I assume they are hashable since it works on local testing. I had a second look at the code and I'm not sure we need the set(). The if at https://github.com/Python-for-HPC/numbaWithOpenmp/blob/openmp_unpack_arrays/numba/openmp.py#L678 should guard against duplicates but doesn't. Does self.add_tag fail to update tag_vars?

That is the mechanism where I try to prevent duplicates but I have had issues in the past where tag_vars isn't updated. add_tag will update tag_vars IF a string can be extracted for the tag, which will happen if the tag is a Numba ir.Var or if the tag is itself a string. I just realized that is a problem and that I also need to check for alloca instructions there. Let me make that change.

DrTodd13 commented 1 year ago

I pushed the call to post_lowering_openmp. My previous push should have fixed the tag_vars set so uniqueness should be done that way. No need to change the alloca list into a set. Closing this PR as a result.

Python-for-HPC / numbaWithOpenmp

Call post_lowering_openmp for GPU device code #6

Reference an existing issue