Python-for-HPC / PyOMP

OpenMP for Python in Numba
BSD 2-Clause "Simplified" License
73 stars 14 forks source link

Openmp loops with non constant size? #12

Open elyurn opened 6 days ago

elyurn commented 6 days ago

A typical example of hybrid OpenMP+MPI code is a simple computation of Pi using Riemman integral. I implemented it with PyOMP and numba-mpi, but it seems that we are limited by Numba (Only constant step size is supported for prange) :

import numba_mpi as nbmpi, numba as nb, numpy as np
from numba.openmp import openmp_context as openmp
from numba.openmp import omp_get_num_threads, omp_get_thread_num

@nb.njit
def get_pi_part(n_intervals=10000, rank=0, size=1): 
    h = 1.0/n_intervals # width of each interval
    partial_sum = 0.0 # initialize the partial sum
    with openmp("parallel for reduction(+:partial_sum)"):
        for i in range(rank+1, n_intervals, size): # loop over the intervals
            x = h*(i-0.5) # x value at the center of the interval
            partial_sum += 4.0/(1.0 + x**2) # add the height of the rectangle to the partial sum
    return h*partial_sum # return the partial sum

@nb.njit
def pi_numba_mpi(n_intervals=10000):
    pi = np.zeros(1)
    part = np.empty_like(pi)
    part[0] = get_pi_part(n_intervals, nbmpi.rank(), nbmpi.size()) # get the partial sum
    nbmpi.allreduce(part, pi) # sum the partial sums
    return pi[0]

pi_result = pi_numba_mpi()

Error:

Traceback (most recent call last):
  File "/home/usr/miniconda3/envs/pyomp/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/usr/miniconda3/envs/pyomp/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/usr/projectPhys743/Pi_allreduce/hello_world_mpi_openmp.py", line 25, in <module>
    pi_result = pi_numba_mpi()
  File "/home/usr/miniconda3/envs/pyomp/lib/python3.10/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/usr/miniconda3/envs/pyomp/lib/python3.10/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x7f8b17e0e6e0>.
Failed in nopython mode pipeline (step: Handle with contexts)
Only constant step size is supported for prange
During: resolving callee type: type(CPUDispatcher(<function get_pi_part at 0x7f8b3bf0a9e0>))
During: typing of call at /home/usr/projectPhys743/Pi_allreduce/hello_world_mpi_openmp.py (21)

Enable logging at debug level for details.

File "hello_world_mpi_openmp.py", line 21:
def pi_numba_mpi(n_intervals=10000):
    <source elided>
    part = np.empty_like(pi)
    part[0] = get_pi_part(n_intervals, nbmpi.rank(), nbmpi.size()) # get the partial sum

Would you have a workaround for such usage of PyOMP?

ggeorgakoudis commented 4 days ago

The error message Only constant step size is supported for prange is misleading. In reality we have put this restriction ourselves in PyOMP code generation (https://github.com/Python-for-HPC/numbaWithOpenmp/blob/296fb858b0a6800323e9b46b027c32636ecc80c7/numba/openmp.py#L3752), which is unrelated to prange. I'm tracking this in python-for-hpc/numbaWithOpenmp#39

PyOMP can and will support non-constant steps in parallel for. I'm tracking this in this issue python-for-hpc/numbaWithOpenMP#40. We plan to support that in the next bugfix release.

As a workaround you can refactor the parallel loop to step size 1 and do some extra work in the loop body:

with openmp("parallel for reduction(+:partial_sum)"):
    for i in range(rank+1, n_intervals): # loop over the intervals
        i = i*size
        if i<n_intervals:
            x = h*(i-0.5) # x value at the center of the interval
            partial_sum += 4.0/(1.0 + x**2) # add the height of the rectangle to the partial sum
return h*partial_sum # return the partial sum

Let me know if that works (it should).

I will keep this issue open until we fix code generation in PyOMP.

elyurn commented 4 days ago

Nice, thank you for the support. It indeed works, although the resulting integral is false using your code (bad indices). FYI, the solution to my specific problem:

@nb.njit
def get_pi_part(n_intervals=10000, rank=0, size=1): 
    h = 1.0/n_intervals # width of each interval
    partial_sum = 0.0 # initialize the partial sum
    with openmp("parallel for reduction(+:partial_sum)"):
        # for i in range(rank+1, n_intervals): # loop over the intervals
        for i in range((n_intervals-rank+size-2)//size): # loop over the intervals
            i = i*size + rank+1
            if i<n_intervals:
                x = h*(i-0.5) # x value at the center of the interval
                partial_sum += 4.0/(1.0 + x**2) # add the height of the rectangle to the partial sum
    return h*partial_sum # return the partial sum
ggeorgakoudis commented 2 days ago

Glad you worked it out, fixing my buggy version