CDAT / cdat

Community Data Analysis Tools
Other
174 stars 68 forks source link

Selection on curvilinear grids: memory issue and algo #146

Open stefraynaud opened 11 years ago

stefraynaud commented 11 years ago

As already reported in the devel branch, when one try to select a subregion of a variable on a curvilinear grid (2D axes) whose spatial resolution is high, the _bindex module creates a huge matrix of typical size (8_40_10e6/dxy)^2, where dxy is the spatial resolution if the grid, 40*10e6 is the perimeter of earth. The algorithm helps selection on small grid, but the bindex matrix convers the whole earth, whereas it should only overlap the current grid. A small intervention should fix it I guess.

In addition, the current algo does not allow selection using bounds, like (lon1, lon2, 'ccb'). It has been designed to be fast, but I'm not sure that these operation takes so much time. It wrote a function that convert coordinates selection to slices and mask for 1D or 2D axes, or grid (curvilinear or rectangular) : http://relay.actimar.fr/~raynaud/vacumm/library/misc.grid.misc.html#vacumm.misc.grid.misc.coord2slice The code is there : https://forge.ifremer.fr/scm/viewvc.php/lib/python/vacumm/misc/grid/misc.py?view=markup&root=vacumm

Maybe an optimal solution would be to merge the two approaches.

doutriaux1 commented 11 years ago

Thanks Stephane,

Probably too late for 1.3 scheduled next week, but I'll take a look, otherwise I hope it makes in 1.4

C.

doutriaux1 commented 11 years ago

Stephane,

That's what I get for being so slow to take a real look at it, can you point/send me the file?

lib/python/vacumm/misc/grid/misc.py: unknown location HTTP Response Status 404 Not Found Python Traceback Traceback (most recent call last): File "/usr/share/gforge/www/scm/viewvc/lib/viewvc.py", line 3612, in main request.run_viewvc() File "/usr/share/gforge/www/scm/viewvc/lib/viewvc.py", line 314, in run_viewvc % self.where, '404 Not Found') ViewVCException: 404 Not Found: lib/python/vacumm/misc/grid/misc.py: unknown location

Powered By FusionForge

doutriaux1 commented 11 years ago

ok just found the file. will take a look

doutriaux1 commented 9 years ago

@stefraynaud the following seems to work, holding on for 2.2 until I can understandexactly what you mean:

import cdms2
import os
import sys

f=cdms2.open(os.path.join(sys.prefix,"sample_data","sampleCurveGrid4.nc"))

s=f("sample",latitude=(-20,20,"ccb"),longitude=(20,40,"ccb"))

print s.shape
stefraynaud commented 9 years ago

@doutriaux1, in your example, the grid has a resolution close of several degrees. The problem arises when the grid has a higher resolution (<<1 degree).

Try this example:

import cdms2
import os
import sys

f=cdms2.open(os.path.join(sys.prefix,"sample_data","swan.four.nc"))
f('HS',lon=(-5.1,-4),lat=(48.42,49))
f.close()
doutriaux1 commented 9 years ago

@stefraynaud ok so that works for me, i get the following, are you saying it select too mch (lots of missing) ? steph

stefraynaud commented 9 years ago

so you simply have more RAM than me:

>>> f('HS',lon=(-5.1,-4),lat=(48.42,49))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/site/data/yoch/raynaud/soft/uvcdat-latest/lib/python2.7/site-packages/cdms2/cudsinterface.py", line 45, in __call__
    return v(*args, **kwargs)
  File "/usr/site/data/yoch/raynaud/soft/uvcdat-latest/lib/python2.7/site-packages/cdms2/avariable.py", line 155, in __call__
    grid=grid)
  File "/usr/site/data/yoch/raynaud/soft/uvcdat-latest/lib/python2.7/site-packages/cdms2/selectors.py", line 151, in unmodified_select
    mask, indexspecs = vargrid.intersect(specs)
  File "/usr/site/data/yoch/raynaud/soft/uvcdat-latest/lib/python2.7/site-packages/cdms2/hgrid.py", line 612, in intersect
    index = self.getIndex()
  File "/usr/site/data/yoch/raynaud/soft/uvcdat-latest/lib/python2.7/site-packages/cdms2/hgrid.py", line 596, in getIndex
    self._index_ = bindex.bindexHorizontalGrid(latlin, lonlin)
  File "/usr/site/data/yoch/raynaud/soft/uvcdat-latest/lib/python2.7/site-packages/cdms2/bindex.py", line 17, in bindexHorizontalGrid
    head = numpy.zeros(NI*NJ,dtype='l')       # This should match NBINI, NBINJ in bindex.c
MemoryError
>>> 

Check the memory you are using, and compare with your example.

durack1 commented 9 years ago

@doutriaux1 @stefraynaud there was a pretty nasty bug in numpy <1.9.0 related to sorting which I found used an incredible amount of memory and CPU cycles on fairly large arrays (https://github.com/numpy/numpy/issues/4814) it would seem to me that the testing above is on a recent (2.0.0?) version which now includes numpy 1.9.0.. So you might find things have changed a little if the numpy issue was the problem..

stefraynaud commented 9 years ago

@durack1 I think this bug is not related to any numpy issue, but to the (NI,NJ) you can see in the error message above. In fact, I'm indirectly responsible for this problem because it has been introduced by @doutriaux1 when I told him that geographical selections on curvilinear grid were not working due to the low resolution binning : @doutriaux1 increased the resolution to match the grid resolution, but the binning is still applied to the whole earth domain, and not a smaller domain just embedding the grid domain.

In addition to the memory issue, the selection operation is not as exact as in the rectangular case. This is why I implemented my own solution, which is sufficiently fast so it doesn't need a system with a cache like with bindex in cdms2.

doutriaux1 commented 9 years ago

@stefraynaud ok I get the issue now. Will try to track your file again and apply it. Thanks.

doutriaux1 commented 9 years ago

sadly moving to 2.3....

dnadeau4 commented 7 years ago

I tried @stefraynaud example and it worked with numpy 1.11 PR #2125 . Is this still an issue?

stefraynaud commented 7 years ago

The problem highlighted in this ticket is a memory issue due to the algo to extract slices on a curvilinear grid. It has nothing to see with numpy.