JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

High memory usage #53

Open Plantain opened 5 years ago

Plantain commented 5 years ago

The memory usage of xESMF seems quite high, higher than the standalone ESMF CLI. I wonder whether this is due to a leak, since loading from a saved weights file uses much less RAM. Perhaps it could also be reduced by using lower precision (32-bit instead of 64-bit). I also note that it doesn't seem possible to destroy a Regridder object to free memory.

import xesmf
import numpy as np
import gc
@profile
def test():
    src_ds = {'lat': np.arange(29.5,70.5,0.05), 'lon': np.arange(-23.5,45.0,0.05)} 
    dst_ds = xesmf.util.grid_2d(29, 70, 0.03, -23, 45, 0.03)
    regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear')
    regridder = None
    gc.collect()
    print("collected")
test()
python3 -m memory_profiler memtest.py
Line #    Mem usage    Increment   Line Contents
================================================
     4   65.500 MiB   65.500 MiB   @profile
     5                             def test():
     6   65.500 MiB    0.000 MiB       src_ds = {'lat': np.arange(29.5,70.5,0.05), 'lon': np.arange(-23.5,45.0,0.05)} 
     7  160.461 MiB   94.961 MiB       dst_ds = xesmf.util.grid_2d(29, 70, 0.03, -23, 45, 0.03)
     8 1246.199 MiB 1085.738 MiB       regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear')
     9 1246.203 MiB    0.004 MiB       regridder = None
    10 1246.203 MiB    0.000 MiB       gc.collect()
    11 1246.203 MiB    0.000 MiB       print("collected")
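
As a rough check on the 32-bit suggestion, the size of the weights themselves can be estimated by hand. The sketch below assumes that bilinear regridding stores at most four weights per destination cell and that the weights are kept in coordinate (COO) form with one value plus two 32-bit indices per entry. The helper name coo_weight_bytes and the index width are assumptions for illustration, not something xESMF is confirmed to use.

```python
def coo_weight_bytes(n_dst_cells, weights_per_cell=4, value_bytes=8, index_bytes=4):
    """Upper-bound size in bytes of a COO weight matrix.

    Each nonzero entry stores one value plus a row index and a column index.
    """
    nnz = n_dst_cells * weights_per_cell
    return nnz * (value_bytes + 2 * index_bytes)


# Approximate destination grid size for grid_2d(29, 70, 0.03, -23, 45, 0.03).
n_lon = round((70 - 29) / 0.03)
n_lat = round((45 - (-23)) / 0.03)
n_dst = n_lon * n_lat

for label, value_bytes in (("float64", 8), ("float32", 4)):
    mib = coo_weight_bytes(n_dst, value_bytes=value_bytes) / 2**20
    print(f"{label} weights: about {mib:.0f} MiB")
```

Under these assumptions the weights account for well under 200 MiB of the roughly 1086 MiB increment reported above, and switching the values to 32-bit would shrink only that part by a quarter. Most of the increment would therefore have to come from somewhere other than the stored weights.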
JiaweiZhuang commented 5 years ago

Thanks for reporting this issue. I've been wanting to diagnose the memory problem for a long time, and have just taken a closer look at this.

As background, ESMPy relies on an explicit destroy() call to release the Fortran array memory for almost every ESMF object. I have definitely released the memory after regridder construction (code), but there still seem to be uncleaned, module-level memory allocations. The next version of ESMPy (v8.0.0) adds a new ESMF.Manager().destroy() call, which should clean up the memory further.

The higher-level xesmf.Regridder API is little more than a SciPy sparse matrix, so garbage collection should work as it does for ordinary NumPy/SciPy objects.

If you use xesmf.Regridder(src_ds, dst_ds, 'bilinear', reuse_weights=True), the memory usage will be much lower because it doesn't involve ESMPy calls.

More details

To demonstrate that the memory issue comes from the underlying ESMPy calls, consider this esmpy_memory.py script:

"""A minimum script to test ESMPy memory allocation."""
import numpy as np
import ESMF
from memory_profiler import profile

def create_grid(shape):
    grid = ESMF.Grid(np.array(shape),
                     staggerloc = ESMF.StaggerLoc.CENTER,
                     coord_sys = ESMF.CoordSys.SPH_DEG)

    return grid

def fill_grid(grid, lons, lats):
    lon_pointer = grid.get_coords(coord_dim=0, 
                                  staggerloc=ESMF.StaggerLoc.CENTER)
    lat_pointer = grid.get_coords(coord_dim=1, 
                                  staggerloc=ESMF.StaggerLoc.CENTER)
    lon_pointer[:] = lons
    lat_pointer[:] = lats

@profile
def test_esmpy():
    # define test grids
    lons_in, lats_in = np.meshgrid(
        np.arange(-120, 120, 0.4), 
        np.arange(-60, 60, 0.3)
        )

    lons_out, lats_out = np.meshgrid(
        np.arange(-120, 120, 0.6), 
        np.arange(-60, 60, 0.4)
        )

    # build ESMPy regridder
    sourcegrid = create_grid(lons_in.shape)
    destgrid = create_grid(lons_out.shape)

    fill_grid(sourcegrid, lons_in, lats_in)
    fill_grid(destgrid, lons_out, lats_out)

    sourcefield = ESMF.Field(sourcegrid)
    destfield = ESMF.Field(destgrid)

    regrid = ESMF.Regrid(sourcefield, destfield, filename=None,
                         regrid_method=ESMF.RegridMethod.BILINEAR,
                         unmapped_action=ESMF.UnmappedAction.IGNORE)

    # release underlying Fortran memory
    sourcegrid.destroy()
    destgrid.destroy()
    sourcefield.destroy()
    destfield.destroy()
    regrid.destroy()

    # de-reference Python objects
    sourcegrid = None
    destgrid = None
    sourcefield = None
    destfield = None
    regrid = None

    lons_in = None
    lats_in = None
    lons_out = None
    lats_out = None

if __name__ == '__main__':
    test_esmpy()

python -m memory_profiler esmpy_memory.py generates:

Filename: esmpy_memory.py

Line #    Mem usage    Increment   Line Contents
================================================
    21     59.7 MiB     59.7 MiB   @profile
    22                             def test_esmpy():
    23                                 # define test grids
    24     59.7 MiB      0.0 MiB       lons_in, lats_in = np.meshgrid(
    25     59.7 MiB      0.0 MiB           np.arange(-120, 120, 0.4), 
    26     63.6 MiB      3.8 MiB           np.arange(-60, 60, 0.3)
    27                                     )
    28                             
    29     63.6 MiB      0.0 MiB       lons_out, lats_out = np.meshgrid(
    30     63.6 MiB      0.0 MiB           np.arange(-120, 120, 0.6), 
    31     65.4 MiB      1.8 MiB           np.arange(-60, 60, 0.4)
    32                                     )
    33                             
    34                                 # build ESMPy regridder
    35     76.3 MiB     11.0 MiB       sourcegrid = create_grid(lons_in.shape)
    36     78.4 MiB      2.1 MiB       destgrid = create_grid(lons_out.shape)
    37                                 
    38     78.4 MiB      0.0 MiB       fill_grid(sourcegrid, lons_in, lats_in)
    39     78.4 MiB      0.0 MiB       fill_grid(destgrid, lons_out, lats_out)
    40                             
    41     78.4 MiB      0.0 MiB       sourcefield = ESMF.Field(sourcegrid)
    42     78.4 MiB      0.0 MiB       destfield = ESMF.Field(destgrid)
    43                             
    44     78.4 MiB      0.0 MiB       regrid = ESMF.Regrid(sourcefield, destfield, filename=None,
    45     78.4 MiB      0.0 MiB                            regrid_method=ESMF.RegridMethod.BILINEAR,
    46    434.2 MiB    355.8 MiB                            unmapped_action=ESMF.UnmappedAction.IGNORE)
    47                             
    48                                 # release underlying Fortran memory
    49    430.8 MiB      0.0 MiB       sourcegrid.destroy()
    50    430.8 MiB      0.0 MiB       destgrid.destroy()
    51    430.8 MiB      0.0 MiB       sourcefield.destroy()
    52    430.8 MiB      0.0 MiB       destfield.destroy()
    53    390.2 MiB      0.0 MiB       regrid.destroy()
    54                             
    55                                 # de-reference Python objects
    56    390.2 MiB      0.0 MiB       sourcegrid = None
    57    390.2 MiB      0.0 MiB       destgrid = None
    58    390.2 MiB      0.0 MiB       sourcefield = None
    59    390.2 MiB      0.0 MiB       destfield = None
    60    390.2 MiB      0.0 MiB       regrid = None
    61                                 
    62    388.3 MiB      0.0 MiB       lons_in = None
    63    386.5 MiB      0.0 MiB       lats_in = None
    64    385.6 MiB      0.0 MiB       lons_out = None
    65    384.7 MiB      0.0 MiB       lats_out = None

The regrid.destroy() call reduced memory usage only slightly (from 430.8 MiB to 390.2 MiB). The profiling result itself should be reliable, as free -h and docker stats report similar memory usage.

I am going to test the new module-level ESMF.Manager().destroy() to see if it improves things.

JiaweiZhuang commented 5 years ago

So it seems like ESMF.Manager().destroy() is still not implemented in the latest version of ESMF (just checked with ESMF_8_0_0_beta_snapshot_40 built by this script). Fortunately it has a __del__() method. For most objects, __del__() simply calls destroy(), for example see ESMF.Grid.

I added this extra code to the end of my original test script:

mg = ESMF.Manager()
mg.__del__()

Then, memory_profiler gives:

    69    384.5 MiB      0.0 MiB       mg = ESMF.Manager()
    70    201.6 MiB      0.0 MiB       mg.__del__()

So __del__() frees half of the memory, but still not all.

This top-level destroy also has a serious side effect: later attempts to build new regridders lead to a segmentation fault, because the connection to the Fortran internals has been lost.

JiaweiZhuang commented 5 years ago

Still, my current suggestion is to restart the kernel and load existing weights, if memory usage becomes a problem.

I will need to check with the ESMF team on the proper use of __del__()/destroy().

Plantain commented 5 years ago

How do we restart the kernel with the xESMF API? Or should that not leak memory?

JiaweiZhuang commented 5 years ago

How do we restart the kernel with the xESMF API?

I mean restart the Python kernel, then set reuse_weights=True to load the weights you generated previously.
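
For scripts, where there is no kernel to restart, a similar effect can be had by running the expensive weight generation in a short-lived child process: everything a process allocates, including memory held by compiled libraries, goes back to the operating system when it exits. The following is a minimal standard-library sketch. The helper run_isolated and the placeholder script text are hypothetical and not part of xESMF.

```python
import subprocess
import sys


def run_isolated(script: str, timeout: float = 600.0) -> str:
    """Run `script` in a fresh Python interpreter and return its stdout.

    Memory allocated by the child, including non-Python allocations made by
    compiled libraries, is reclaimed by the OS once the child exits.
    """
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout


# Placeholder for the expensive step. In practice the script text would build
# the regridder once with a weights filename so that the file is written.
print(run_isolated("print('weights written')"), end="")
```

The parent process can then create its regridder with reuse_weights=True and the same filename. Whether that removes all of the growth may depend on the xESMF and ESMPy versions in use.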

Plantain commented 5 years ago

That doesn't behave as I expected; it still seems the regridder is never freed.

Line #    Mem usage    Increment   Line Contents
================================================
     4   65.508 MiB   65.508 MiB   @profile
     5                             def test():
     6   65.508 MiB    0.000 MiB       src_ds = {'lat': np.arange(29.5,70.5,0.05,dtype=np.float32), 'lon': np.arange(-23.5,45.0,0.05,dtype=np.float32)} 
     7  160.383 MiB   94.875 MiB       dst_ds = xesmf.util.grid_2d(np.float32(29), np.float32(70), np.float32(0.03), np.float32(-23), np.float32(45), np.float32(0.03))
     8 1246.383 MiB 1086.000 MiB       regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear', filename="out/weights")
     9 1246.383 MiB    0.000 MiB       regridder = None
    10 1246.387 MiB    0.004 MiB       gc.collect()
    11 1270.051 MiB   23.664 MiB       regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear', reuse_weights=True, filename="out/weights")
    12 1222.730 MiB    0.000 MiB       dst_ds = None
    13 1222.730 MiB    0.000 MiB       src_ds = None
    14 1222.730 MiB    0.000 MiB       gc.collect()
    15 1222.730 MiB    0.000 MiB       print("done")
JiaweiZhuang commented 5 years ago

@Plantain Remove the first xesmf.Regridder() call in your test script.

bolliger32 commented 4 years ago

@JiaweiZhuang @Plantain curious whether any more work has been done on this. We just encountered this issue when running repeated tasks that use different regridders with reuse_weights=True. Even if we never call xesmf.Regridder without reuse_weights=True, our memory use builds with each call that builds a new regridder from a saved file (even if we remove the previous regridder from the namespace, e.g. by assigning each regridder to the same variable name or calling del regridder).

JiaweiZhuang commented 4 years ago

our memory use builds with each call to build a new regridder from a saved file

The memory use increases by how much?

With reuse_weights=True, there is no call to ESMF.Regrid(), so the huge 400 MB allocation won't occur. (see https://github.com/JiaweiZhuang/xESMF/issues/53#issuecomment-511157349). 0.2.0 can still have a ~10 MB memory leak due to ESMF grid objects, but it should be fixed in 0.2.1 (https://github.com/JiaweiZhuang/xESMF/commit/9963d9566ce7138c67ee6d84ee13454e36a3ebe7)

#75 should completely solve this problem. The new load_regridder() call won't involve any call into the ESMF module at all.

bolliger32 commented 4 years ago

Here's an example where I load a series of regridder files and then go back to the first regridder file. The memory use keeps growing (for the most part). Does this seem unexpected to you?

Line #    Mem usage    Increment   Line Contents
================================================
     4   2085.2 MiB   2085.2 MiB   def test(srtm_tile_ds, ds_out_grid, regridder_files):
     5   2085.2 MiB      0.0 MiB       gc.collect()
     6   2224.0 MiB    138.8 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][0],'bilinear', filename=str(regridder_files[0]), reuse_weights=True)
     7   2224.0 MiB      0.0 MiB       gc.collect()
     8   2321.3 MiB     97.3 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][1],'bilinear', filename=str(regridder_files[1]), reuse_weights=True)
     9   2321.3 MiB      0.0 MiB       gc.collect()
    10   2377.0 MiB     55.7 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][2],'bilinear', filename=str(regridder_files[2]), reuse_weights=True)
    11   2377.0 MiB      0.0 MiB       gc.collect()
    12   2432.6 MiB     55.6 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][3],'bilinear', filename=str(regridder_files[3]), reuse_weights=True)
    13   2432.6 MiB      0.0 MiB       gc.collect()
    14   2377.1 MiB      0.0 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][0],'bilinear', filename=str(regridder_files[0]), reuse_weights=True)
    15   2377.1 MiB      0.0 MiB       gc.collect()
    16   2488.1 MiB    111.1 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][1],'bilinear', filename=str(regridder_files[1]), reuse_weights=True)
    17   2488.1 MiB      0.0 MiB       gc.collect()
    18   2488.1 MiB      0.0 MiB       return None
JiaweiZhuang commented 4 years ago

load a series of regridder files and then go back to the first regridder file.

Interesting that line 14 has no memory increment. If it is an ESMF memory leak, there should be a steady increment.

The problem might be related to uncleaned ESMF objects, or xarray.open_dataset when reading the weight file (e.g. pydata/xarray#2186), or Python's own garbage collection with numpy/scipy objects.

Garbage collection of NumPy arrays is a tricky issue in itself, and gc.collect() doesn't necessarily work as naively expected: https://stackoverflow.com/questions/23977904/how-to-implement-garbage-collection-in-numpy https://stackoverflow.com/questions/16261240/releasing-memory-of-huge-numpy-array-in-ipython

If there is still a problem after #75 is implemented, then it will be a NumPy/SciPy/xarray issue that is out of my control.
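
One way to narrow down which of these is responsible is to compare Python-level allocations with the process total. The standard-library tracemalloc module only sees memory requested through Python's own allocators. If its figure returns to baseline after an object is deleted while the process total reported by memory_profiler stays high, the remainder is either being held by the system allocator or was allocated by a compiled library outside Python. The sketch below uses a bytearray as a stand-in for a large Python-owned buffer; the helper traced_mib is hypothetical.

```python
import gc
import tracemalloc


def traced_mib() -> float:
    """Return the memory currently traced by tracemalloc, in MiB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / 2**20


tracemalloc.start()
baseline = traced_mib()

payload = bytearray(50 * 2**20)  # stand-in for a large Python-owned buffer
held = traced_mib()

del payload
gc.collect()
released = traced_mib()

tracemalloc.stop()
print(f"baseline={baseline:.1f} MiB, held={held:.1f} MiB, released={released:.1f} MiB")
```

Running the same three measurements around a regridder construction and deletion, next to the memory_profiler figures, would indicate whether the growth is visible to Python at all.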

mohoch1 commented 3 years ago

Hi.

Is there any news regarding this issue? We are experiencing similar problems.

Our application needs to perform regridding many times, and we have found that each use of the Regridder causes a massive increase in memory usage, which is not released.

Has this issue come to any resolution?

Thanks

rokuingh commented 3 years ago

I just became aware of this issue, and thought I would chime in from the ESMPy perspective (ESMPy is the engine behind xESMF). The ESMF 8.1.0 release, expected at the end of March '21, will include a fix for a memory leak in the search algorithm of the regridding code. This may resolve the memory issues discussed in this thread. There should be a new conda package version of ESMPy 8.1.0 by the first of April.

eugene-tam-dpie commented 4 months ago

If people are still having memory issues with xesmf running over a large number of datasets (and likely parallelized, which means the regridder needs to be loaded in each process as it can't be pickled)...

I was able to solve this issue in my case by adding the following after the regridder object was no longer needed within the context...

        regridder.grid_in.destroy()
        regridder.grid_out.destroy()
        del regridder

where regridder is an xesmf.Regridder object.

Not sure if the last del command is required (it certainly didn't work in isolation), but it doesn't hurt to keep it...
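
The same cleanup can be wrapped in a context manager so that it also runs when the regridding code raises. This is only a sketch: the attribute names grid_in and grid_out are taken from the snippet above and are not checked against any particular xESMF version, destroying_grids is a hypothetical helper, and the stand-in classes exist only so the example runs without ESMPy installed.

```python
from contextlib import contextmanager


@contextmanager
def destroying_grids(regridder):
    """Yield `regridder`, then call destroy() on its grids on exit."""
    try:
        yield regridder
    finally:
        for name in ("grid_in", "grid_out"):
            grid = getattr(regridder, name, None)
            if grid is not None and hasattr(grid, "destroy"):
                grid.destroy()


class StandInGrid:
    """Minimal stand-in that records whether destroy() was called."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


class StandInRegridder:
    """Minimal stand-in exposing the two grid attributes used above."""

    def __init__(self):
        self.grid_in = StandInGrid()
        self.grid_out = StandInGrid()


fake = StandInRegridder()
with destroying_grids(fake) as regridder:
    pass  # regrid data here

print(fake.grid_in.destroyed, fake.grid_out.destroyed)
```

With a real regridder, the body of the with block would hold the regridding calls, and the object should not be used again after the block exits.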