SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

AreaWeighted regrid requires a lot of memory #3808

Open bouweandela opened 4 years ago

bouweandela commented 4 years ago

Regridding a cube of a few hundred MB to another cube of similar size requires about 10 GB of RAM. This seems a bit excessive. It also means that without lazy regridding (#3700, #3701), the area weighted regridder cannot be used for all but the smallest datasets.

Example script to reproduce the issue:

import iris
import numpy as np

def create_cube(shape):

    times = iris.coords.DimCoord(np.arange(shape[0]), standard_name="time")
    lats = iris.coords.DimCoord(
        np.linspace(0, 180, shape[1], endpoint=True),
        standard_name="latitude",
        units="degrees_north",
    )
    lats.guess_bounds()
    lons = iris.coords.DimCoord(
        np.linspace(0, 360, shape[2], endpoint=False),
        standard_name="longitude",
        units="degrees_east",
    )
    lons.guess_bounds()

    data = np.ones(shape, dtype=np.float32)

    coords_spec = [(times, 0), (lats, 1), (lons, 2)]
    cube = iris.cube.Cube(data, dim_coords_and_dims=coords_spec)

    return cube

if __name__ == "__main__":

    src = create_cube((1000, 193, 384))
    grid = create_cube((1, 181, 360))

    scheme = iris.analysis.AreaWeighted()

    tgt = src.regrid(grid, scheme)

    MB = 2 ** 20
    ssize = np.prod(src.shape) * src.dtype.itemsize / MB
    tsize = np.prod(tgt.shape) * tgt.dtype.itemsize / MB
    print(f"Regridded {src.shape} of {ssize} MB to {tgt.shape} of {tsize} MB")

Saving the code above to a file called area_weighted.py and running it as a script:

$ \time -v python area_weighted.py
Regridded (1000, 193, 384) of 282.71484375 MB to (1000, 181, 360) of 248.565673828125 MB
    Command being timed: "python area_weighted.py"
    User time (seconds): 8.20
    System time (seconds): 2.34
    Percent of CPU this job got: 102%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.31
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 9328856
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 1
    Minor (reclaiming a frame) page faults: 1488511
    Voluntary context switches: 40
    Involuntary context switches: 228
    Swaps: 0
    File system inputs: 40
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

i.e. it used at most 9328856 KB = almost 10 GB of RAM.
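A quick back-of-envelope check (using only the numbers reported above) shows the scale of the overhead: peak resident memory was roughly 17 times the combined size of the source and target arrays.

```python
# Back-of-envelope check using the figures from the run above:
# source cube (1000, 193, 384) float32, target cube (1000, 181, 360) float32,
# and a maximum resident set size of 9328856 KB reported by `\time -v`.
src_mb = 1000 * 193 * 384 * 4 / 2**20   # source data in MiB
tgt_mb = 1000 * 181 * 360 * 4 / 2**20   # target data in MiB
peak_mb = 9328856 / 2**10               # peak RSS in MiB

ratio = peak_mb / (src_mb + tgt_mb)
print(round(src_mb), round(tgt_mb), round(peak_mb), round(ratio, 1))
# → 283 249 9110 17.1
```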

github-actions[bot] commented 2 years ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

bouweandela commented 2 years ago

This is still important to us

github-actions[bot] commented 1 year ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

ESadek-MO commented 1 year ago

#5365 might be relevant here?