CDAT / cdms


Checking regridding memory usage? #113

Open jypeter opened 7 years ago

jypeter commented 7 years ago

I have pasted below part of an old mail (December 2012) I sent to @doutriaux1 , Dave Kinding and Alexander Pletzler about memory issues I got during regridding.

It would be nice if you could use the provided script, or something similar, to check the memory usage behavior, both on a local desktop/laptop and on a multi-user server. With some luck, maybe there is no memory usage problem any more!

I have put my simple test script and data file link at the end of this issue

The following extra questions come after a study I made when one of our users told me that an existing script was not working any more. The script was doing some simple regridding for preparing boundary conditions files before running a model (a very typical use of cdat for us) and seemed to be stuck: ctrl-C was not working and you had to kill the process (you could suspend it with ctrl-Z). I started his script again, did a 'top' and saw that his python was using 6+ GB, which seemed pretty weird. I eventually came up with the attached script and data file to study the problem.

The user used the default and easy regridding of cdms2
    regridded_var = original_var.regrid(target_grid)

Now, when I do that to go from a regular grid to a regular grid (nothing fancy), I get the following warnings
    We chose regridTool = esmf for you
and
    We chose regridMethod = linear

3) The esmf/linear combination uses A LOT of memory and is QUITE SLOW. Isn't it overkill to have this by default to regrid regular->regular, especially if it leads the unsuspecting user to believe that CDAT is broken?

4) I got the regridding we wanted (I think) by using regrid2
    regrid_func = regrid2.Horizontal(original_grid, target_grid)
    regridded_var = regrid_func(original_var)

Could this behavior be the default (rather than esmf/linear)?
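
Until the default changes, it may be possible to name the tool explicitly in the one-line call. The sketch below assumes that the variable's regrid() method accepts a regridTool keyword, as the "We chose regridTool = esmf for you" warning suggests; regrid_with_tool is a hypothetical helper name, not part of cdms2.

```python
def regrid_with_tool(variable, target_grid, tool="regrid2"):
    """Regrid `variable` onto `target_grid`, naming the regridding tool.

    Hypothetical helper: assumes `variable` is a cdms2 variable whose
    regrid() method accepts a `regridTool` keyword ('regrid2', 'esmf', ...).
    """
    return variable.regrid(target_grid, regridTool=tool)
```

With the names used above, this would be called as regridded_var = regrid_with_tool(original_var, target_grid).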

Would it be possible to have a better docstring for regrid2? Because help(regrid2) was not very helpful and I had to experiment a bit before finding I needed "regrid2.Horizontal". I have just checked that it is explained in the 2007 version of cdms5.pdf, but I'm not sure many people have the cdms bible around :)

5) Would it be possible to keep the memory usage of esmf/linear under a tighter control and clean up things after use, or is there a memory leak? The memory usage is really ALARMING!

I used the attached script and the 'top' command to get the following figures (I used the ram value in the VIRT column), when regridding a variable from a 128x128 grid to a 1 degree grid and then to finer grids. For all target grids, regrid2.Horizontal was fast and clean. esmf/linear was slow, memory HUNGRY and dirty (python process was using TOO MUCH memory after usage). I had to kill the last regrid (to the 0.125 degree grid) because it was using way too much memory

# Grid  Base    regrid2 DIFF    esmf (peak)     DIFF    FINAL
#
# 1     424m    424m    0       674m             250m    600m
# 0.5   424m    424m    0       1212m            788m    988m
# 0.25  424m    431m    7m      3886m           3455m   2548m
#                               4055m           1507m   2828m
#                               3920m           1092m   2982m
#                               3924m            942m   2987m
#                               3908m            921m   3055m
# 0.125 424m    439m    15m     >14.5g   <-- killed the process before completion

* Base = python process ram usage before the regrid2 regridding

* regrid2 and DIFF = python process ram usage after regrid2 regridding and difference. We can see that for the 0.125 degree grid, the extra ram (15m) only comes from the newly created matrix
    1440*2880 points * 4 bytes/1024./1024 = 15.82 MB
This is surprisingly (amazingly!) efficient. It's a bit different for the 0.25 degree grid, where we could expect a memory increase of only 4 MB (instead of the 7m observed)
    720*1440*4/1024./1024 = 3.95 MB

* esmf (peak) and diff = peak ram usage during the esmf/linear regridding and ram increase during the regridding. This is A LOT (for regular->regular)...

* FINAL = final ram usage after the regridding is finished. It is slightly lower than the peak usage during the regridding, but much higher than the size of the regridded variable that was created. Some temporary variables should be destroyed before returning the final result...
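
The expected sizes quoted for the regrid2 column can be recomputed for every target grid with a few lines of Python. This is only the arithmetic from the list above, assuming 4-byte (float32) values; grid_size_mib is a name made up for this sketch.

```python
def grid_size_mib(nlat, nlon, bytes_per_value=4):
    """Size in MiB of a 2D nlat x nlon array (4 bytes per value by default)."""
    return nlat * nlon * bytes_per_value / 1024.0 / 1024.0


# Target grids used in the test script: (resolution in degrees, nlat, nlon)
grids = [(1, 180, 360), (0.5, 360, 720), (0.25, 720, 1440), (0.125, 1440, 2880)]

for step, nlat, nlon in grids:
    print("%-6s %4d x %4d  %6.2f" % (step, nlat, nlon, grid_size_mib(nlat, nlon)))
```

Anything much larger than these figures in the DIFF or FINAL columns is memory that is not accounted for by the regridded variable itself.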

In the 0.25 degree grid case, I stayed in the script and executed the v_source.regrid(target_grid) line several times. The peak memory usage seemed stable, but the final memory usage kept increasing, though not regularly (see the extra rows for the 0.25 degree grid in the table above).
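
Instead of watching 'top' by hand, the same figures could be recorded from inside the process. The following is a minimal sketch for Linux only: it reads /proc/self/status, where VmSize should correspond to the VIRT column of top and VmPeak to its highest value so far. The function names are made up for this example.

```python
def parse_vm_fields(status_text, fields=("VmPeak", "VmSize", "VmRSS")):
    """Extract memory fields (in kB) from the text of /proc/<pid>/status."""
    values = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            values[name] = int(rest.split()[0])  # e.g. "  434176 kB"
    return values


def process_memory_kb():
    """Current memory figures of this python process (Linux only)."""
    with open("/proc/self/status") as status:
        return parse_vm_fields(status.read())


def memory_after_runs(func, n_runs=5):
    """Call func() n_runs times and record the memory figures after each call."""
    history = []
    for _ in range(n_runs):
        func()  # the returned value is discarded right away
        history.append(process_memory_kb())
    return history
```

For the repeated test described above, this would be used as memory_after_runs(lambda: v_source.regrid(target_grid)). If VmSize keeps growing from one run to the next while VmPeak stays put, that matches the behavior reported in the table.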

Test script

The data file used in this script can be downloaded from https://files.lsce.ipsl.fr/public.php?service=files&t=832fec4913b47804b0c9f77adddd3f9d

#!/usr/bin/env python

# CDAT regridding test

# Memory usage results
# Grid  Base    regrid2 DIFF    esmf (peak)     DIFF    FINAL
# 
# 1     424m    424m    0       674m             250m    600m
# 0.5   424m    424m    0       1212m            788m    988m
# 0.25  424m    431m    7m      3886m           3455m   2548m
#                               4055m           1507m   2828m
#                               3920m           1092m   2982m
#                               3924m            942m   2987m
#                               3908m            921m   3055m
# 0.125 424m    439m    15m     >14.5g <-- killed the process before completion
import cdms2, genutil
import regrid2

in_file = 'RugosFOAM.nc'
in_var = 'RUGOS'

f = cdms2.open(in_file)
v_source = f(in_var)
f.close()

print 'Original variable range =', genutil.minmax(v_source)

# Regrid from the source grid to a 1, 0.5, 0.25 or 0.125 degree grid
# (uncomment the target grid you want to test)
target_grid = cdms2.createUniformGrid(-89.5, 180, 1, -179.5, 360, 1)
#target_grid = cdms2.createUniformGrid(-89.75, 360, 0.5, -179.75, 720, 0.5)
#target_grid = cdms2.createUniformGrid(-89.875, 720, 0.25, -179.875, 1440, 0.25)
#target_grid = cdms2.createUniformGrid(-89.9375, 1440, 0.125, -179.9375, 2880, 0.125)

#print target_grid.getLatitude()
#print target_grid.getLongitude()

# NEW style regridding
foam_grid = v_source.getGrid()
foam_to_newgrid = regrid2.Horizontal(foam_grid, target_grid)
raw_input("Press RETURN to start 'regrid2 style' regridding")
v_regrid_news = foam_to_newgrid(v_source)

print 'Regridded variable range =', genutil.minmax(v_regrid_news)

# OLD style regridding that defaults to 'esmf'+'linear' regridding
raw_input("Press RETURN to start 'old style' regridding")
v_regrid_olds = v_source.regrid(target_grid)

print 'Regridded variable range =', genutil.minmax(v_regrid_olds)

regrid_diff = v_regrid_olds - v_regrid_news
regrid_change = genutil.minmax(regrid_diff)

print 'Regridded vars difference', regrid_change

# The end

doutriaux1 commented 7 years ago

thanks for this @jypeter