ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

preprocessor._regrid._vertical_interpolate memory issue #123

Closed ledm closed 5 years ago

ledm commented 5 years ago

Hi,

I've encountered a bug in the function _vertical_interpolate in preprocessor._regrid. I think that the problem is that this function uses numpy arrays instead of dask arrays.

def _vertical_interpolate(cube, levels, interpolation, extrapolation):
    """Perform vertical interpolation."""
    # Determine the source levels and axis for vertical interpolation.
    src_levels = cube.coord(axis='z', dim_coords=True)
    z_axis, = cube.coord_dims(src_levels)

    # Broadcast the 1d source cube vertical coordinate to fully
    # describe the spatial extent that will be interpolated.
    broadcast_shape = cube.shape[z_axis:]
    reshape = [1] * len(broadcast_shape)
    reshape[0] = cube.shape[z_axis]
    src_levels_reshaped = src_levels.points.reshape(reshape)
    src_levels_broadcast = np.broadcast_to(src_levels_reshaped,
                                           broadcast_shape)

    # force mask onto data as nan's
    if np.ma.is_masked(cube.data):
        cube.data[cube.data.mask] = np.nan

    # Now perform the actual vertical interpolation.
    new_data = stratify.interpolate(
        levels,
        src_levels_broadcast,
        cube.data,
        axis=z_axis,
        interpolation=interpolation,
        extrapolation=extrapolation)

    # Calculate the mask based on the any NaN values in the interpolated data.
    mask = np.isnan(new_data)

    if np.any(mask):
        # Ensure that the data is masked appropriately.
        new_data = np.ma.array(new_data, mask=mask, fill_value=_MDI)

    # Construct the resulting cube with the interpolated data.
    return _create_cube(cube, new_data, levels.astype(float))

The error that I'm seeing is:

MemoryError: Failed to realise the lazy data as there was not enough memory available.
The data shape would have been (156, 40, 404, 802) with dtype('float32').
valeriupredoi commented 5 years ago

this should not happen normally, the regdridder (vertical or horizontal) uses roughly as much memory as the size of the data per level x 2 so in this case about 1.6GB x 2 (since it interpolates between two levels), if you don't have ~4GB of free RAM then you end up with the error, where are you running?

ledm commented 5 years ago

Okay, I was running this on jasmin sci1.

bouweandela commented 5 years ago

Duplicate of #35