TomographicImaging / CIL

A versatile python framework for tomographic imaging
https://tomographicimaging.github.io/CIL/
Apache License 2.0
87 stars 40 forks source link

KL numba performance warning #1840

Open KrisThielemans opened 1 week ago

KrisThielemans commented 1 week ago

https://github.com/SyneRBI/SIRF-SuperBuild/actions/runs/9569076152/job/26380772313#step:14:1264

....../home/runner/virtualenv/lib/python3.10/site-packages/numba/parfors/parfor.py:2395: NumbaPerformanceWarning: 
prange or pndindex loop will not be executed in parallel due to there being more than one entry to or exit from the loop (e.g., an assertion).

File "../../../../../../../../../install/python/cil/optimisation/functions/KullbackLeibler.py", line 340:
    def kl_div(x, y, eta):
        <source elided>
        accumulator = 0.
        for i in prange(x.size):
        ^

  warnings.warn(
/home/runner/virtualenv/lib/python3.10/site-packages/numba/core/typed_passes.py:336: NumbaPerformanceWarning: 
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics for help.

File "../../../../../../../../../install/python/cil/optimisation/functions/KullbackLeibler.py", line 338:
    @njit(parallel=True)
    def kl_div(x, y, eta):
epapoutsellis commented 1 week ago

This is due to #1549

@jit(parallel=True, nopython=True)
def kl_div_flat_mask(x, y, eta, mask):
    accumulator = 0.
    for i in prange(x.size):
        if mask.flat[i] > 0:
            X = x.flat[i]
            Y = y.flat[i] + eta.flat[i]
            if X > 0 and Y > 0:
                # out.flat[i] = X * numpy.log(X/Y) - X + Y
                accumulator += X * numpy.log(X/Y) - X + Y
            elif X == 0 and Y >= 0:
                # out.flat[i] = Y
                accumulator += Y
            else:
                # out.flat[i] = numpy.inf
                return numpy.inf
    return accumulator

@njit(parallel=True, fastmath=True)
def kl_div_ravel_mask(x, y, eta, ind):    
    accumulator = 0.0
    x = x.ravel()
    y = y.ravel()
    eta = eta.ravel()
    for i in prange(ind.size):
        tmp = y[i] + eta[i]
        if x[i] > 0 and tmp > 0:
            accumulator += x[i] * numpy.log(x[i] / tmp) - x[i] + tmp
        elif x[i] == 0 and tmp >= 0:
            accumulator += tmp
        else:        
            accumulator = numpy.inf  
    return accumulator 

2nd implementation is faster without any warnings. I can fix it if you want