bh107 / bohrium

Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX
http://www.bh107.org
Apache License 2.0

Significant memory leaks in recent versions #601

Closed by dionhaefner 5 years ago

dionhaefner commented 5 years ago

Memory consumption when running Veros through NumPy:

[Image: memory_numpy]

Memory consumption when running Veros through Bohrium:

[Image: memory_bh]

I'll try to isolate the problem and will add any information I find to this issue.

Platform

dionhaefner commented 5 years ago

The biggest culprit is this function:

import numpy as np

def interpolate_along_axis(coords, arr, interp_coords, axis=0):
    if coords.ndim == 1:
        if len(coords) != arr.shape[axis]:
            raise ValueError(
                "Coordinate shape must match array shape along axis")
    elif coords.ndim == arr.ndim:
        if coords.shape != arr.shape:
            raise ValueError("Coordinate shape must match array shape")
    else:
        raise ValueError("Coordinate shape must match array dimensions")

    if axis != 0:
        arr = np.moveaxis(arr, axis, 0)
        coords = np.moveaxis(coords, axis, 0)
        interp_coords = np.moveaxis(interp_coords, axis, 0)

    diff = coords[np.newaxis, :, ...] - interp_coords[:, np.newaxis, ...]
    diff_m = np.where(diff <= 0., np.abs(diff), np.inf)
    diff_p = np.where(diff > 0., np.abs(diff), np.inf)
    i_m = np.asarray(np.argmin(diff_m, axis=1))
    i_p = np.asarray(np.argmin(diff_p, axis=1))
    mask = np.all(np.isinf(diff_m), axis=1)
    i_m[mask] = i_p[mask]
    mask = np.all(np.isinf(diff_p), axis=1)
    i_p[mask] = i_m[mask]
    full_shape = (slice(None),) + (np.newaxis,) * (arr.ndim - 1)
    if coords.ndim == 1:
        i_p_full = i_p[full_shape] * np.ones(arr.shape)
        i_m_full = i_m[full_shape] * np.ones(arr.shape)
    else:
        i_p_full = i_p
        i_m_full = i_m
    ii = np.indices(i_p_full.shape)
    i_p_slice = (i_p_full,) + tuple(ii[1:])
    i_m_slice = (i_m_full,) + tuple(ii[1:])
    dx = (coords[i_p_slice] - coords[i_m_slice])
    pos = np.where(
        dx == 0., 0., (coords[i_p_slice] - interp_coords) / (dx + 1e-12))
    return np.moveaxis(arr[i_p_slice] * (1. - pos) + arr[i_m_slice] * pos, 0, axis)

if __name__ == "__main__":
    print("")

    for i in range(10):
        a, b, c = (np.random.rand(150, 150) for _ in range(3))

        res = interpolate_along_axis(a, b, c, 1)

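        try:
            # flush() is a Bohrium call; plain NumPy has no such attribute,
            # so the AttributeError is ignored when running with NumPy
            np.flush()
        except AttributeError:
            pass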

        print(i, end="\r")

[Image: bh_leak]

The argmin calls trigger a sync to NumPy, which seems to be the cause of the memory leak.

dionhaefner commented 5 years ago

I was able to condense it to this:

import bohrium as np

def leak(a, b):
    diff = a[np.newaxis, :, ...] - b[:, np.newaxis, ...]
    mask = diff > 0
    diff[mask] = np.inf
    return

if __name__ == "__main__":
    print("")

    for i in range(10):
        a, b = (np.random.rand(150, 150) for _ in range(2))

        res = leak(a, b)
        np.flush()

        print(i, end="\r")

Executing this causes some kernels to be compiled over and over again. Swapping np.inf for a finite value makes the leak disappear.
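
For anyone who wants to put numbers on the growth instead of reading it off a plot, here is a rough sketch that records the process's peak resident memory after each iteration. It uses only the standard library `resource` module (Unix only). The helper names `peak_rss` and `track_memory` are made up for illustration and are not part of Bohrium or Veros. Peak RSS can only go up, so this shows whether memory keeps growing, but not whether anything is ever released.

```python
import resource


def peak_rss():
    """Peak resident set size of this process.

    Units are platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def track_memory(step, iterations=10):
    """Call step() repeatedly and record the peak RSS after each call."""
    samples = []
    for _ in range(iterations):
        step()
        samples.append(peak_rss())
    return samples


if __name__ == "__main__":
    # Stand-in workload so the sketch runs on its own; to test the
    # reproduction above, pass something like
    #   lambda: (leak(a, b), np.flush())
    samples = track_memory(lambda: sum(range(10000)), iterations=5)
    for i, value in enumerate(samples):
        print(i, value)
```

For scale: with 150 x 150 inputs, diff in the snippet above broadcasts to shape (150, 150, 150), which at 8 bytes per float64 element is about 27 MB per iteration. If that buffer is never released, ten iterations would account for roughly 270 MB.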

dionhaefner commented 5 years ago

Interesting that this was caused by the malloc cache. But I guess this only explains the rising memory consumption, not why the kernels were being recompiled in every iteration?