JiaweiZhuang opened this issue 5 years ago
Also, what's the scientific use case for this @sdeastham? I thought that global ~1 km data were so rare right now that I didn't pay much attention to such a case (https://github.com/JiaweiZhuang/xESMF/pull/26#issuecomment-406458273). I know @bekozi is pretty interested in regridding ultra-high resolution data. Maybe a chance to collaborate.
Thanks for raising this! The use case here is to take advantage of very fine global population density data which has become available, specifically when calculating regional and global exposure to atmospheric pollutants. The latest Gridded Population of the World dataset (GPWv4) is provided at 30 arc-second resolution, so the grid is order 1e5 cells in each dimension. I have workarounds - coarser-resolution versions of the data are available, and when considering high resolution nested domains I can work with smaller subgrids to avoid working directly with one giant grid - but I was hoping to be able to regrid the finest possible grid directly in xesmf to minimize regridding error.
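As a quick back-of-the-envelope check (my own arithmetic, not taken from the GPWv4 documentation), a 30 arc-second global grid works out to roughly a billion cells:

```python
# 30 arc-seconds per cell -> 3600 / 30 = 120 cells per degree
cells_per_deg = 3600 // 30
nlon = 360 * cells_per_deg        # 43200 columns spanning 360 degrees of longitude
nlat = 180 * cells_per_deg        # 21600 rows spanning 180 degrees of latitude
print(nlon, nlat, nlon * nlat)    # 43200 21600 933120000
```

So each float64 coordinate array alone is on the order of 7.5 GB, which is why allocating the full global grid as a single ESMF object is painful.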
@sdeastham This grid is a good test case for ultra-high resolution conservative regridding. Right now, the workflow for regridding data at this resolution works with netCDF (I see the highest resolution grid is not in netCDF format - assuming this (http://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev10) is the dataset you are interested in). I am migrating the workflow to use xarray-dask/xESMF so GeoTiff will be supported. It's good to hear you have workarounds, as it will take some time to get this working. If your timeline for the high resolution is more pressing, let me know...
Also thanks to @JiaweiZhuang for getting this on the tracker!
Note: It often makes sense to store high resolution sparse data like these in an unstructured format if netCDF is required.
That is indeed the dataset in question – thanks! Looking forward to seeing how this progresses, and please let me know if there’s anything useful I can do on this front.
Also hitting this limit working with some NASA IceBridge data. +1 for making this a priority. (Wish I knew more about the ESMF backend to help...)
@NicWayand thanks for the data link. I'm approaching something workable as example/dev code that can handle `xarray` datasets from netCDF or `rasterio`. Would you be interested in helping with the `xESMF` code in an example notebook? Basically, it would be the piece of code that takes source and destination `xarray` spatial chunks and regrids them. Hopefully it will be pretty straightforward. EDIT: It may be a little complex, as I'm still unsure how best to use dask.
Hi @bekozi, I am interested in helping but not clear exactly what you're asking for. Just an example using xesmf where this issue is hit?
@NicWayand thanks, and sorry for not being clearer. I've realized I won't have time to learn and use `xESMF` properly and was fishing for some expertise. I had some example code as part of https://github.com/JiaweiZhuang/xESMF/pull/26, but that did not use `xarray` as the IO backend. The next version will use `xarray` exclusively for IO (allows GeoTiff, etc.).

I'm looking for help with the bit of code that regrids source/destination spatial chunks within a `dask` task. This is the location in the old code where the regridding would go. Again, this example will be evolving, but it is a similar approach at heart. However, I am not sure this is the right approach for using `dask`, so am looking for ideas. With big gridded data, we want to avoid loading everything on the root task. Lazy loading may be the ticket, but I still think it would be best to isolate processing within tasks, as opposed to having a bunch of computation on the root with communication.

Anyway, no pressure of course. I'll post another PR here, and we can go from there!
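For what it's worth, the "isolate processing within tasks" idea can be sketched with `dask.array.map_blocks`; the `regrid_chunk` function below is a toy stand-in (plain subsampling), not a real xESMF call:

```python
import dask.array as da

def regrid_chunk(block):
    # Stand-in for a per-chunk regridding call; a real version would build
    # (or reuse) a regridder for this chunk's source/destination subgrids.
    return block[::2, ::2]  # toy 2x coarsening

src = da.ones((4000, 4000), chunks=(1000, 1000))  # lazy: nothing loaded on the root task
out = src.map_blocks(regrid_chunk, chunks=(500, 500), dtype=src.dtype)
print(out.shape)  # (2000, 2000); each chunk is processed inside its own dask task
```

The appeal of this shape is exactly what's described above: all heavy computation stays inside the tasks, and the root only assembles the task graph.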
Is there any progress on this issue about the "TypeError: buffer is too small for requested array"? I want to do a regridding job from a global ~1 km resolution grid to a regional domain (maybe 9, 3, or 1 km resolution; the domain is only a small part of the globe), but it failed.
> I want to do a regridding job from a global ~1 km resolution grid to a regional domain (maybe 9, 3, or 1 km resolution; the domain is only a small part of the globe), but it failed.
Try cropping the global data (with `ds.sel()`/`ds.isel()`) before building the regridder? This will avoid creating large ESMF grid objects.

As for `TypeError: buffer is too small for requested array`, I am still not sure what's happening.
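A minimal sketch of the cropping idea (the coarse 0.5 degree stand-in grid and the variable name `density` are made up for illustration; the same `.sel()` call applies at ~1 km):

```python
import numpy as np
import xarray as xr

# Hypothetical global dataset with 1D lat/lon coordinates
lat = np.arange(-89.75, 90, 0.5)
lon = np.arange(-179.75, 180, 0.5)
ds = xr.Dataset(
    {"density": (("lat", "lon"), np.ones((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

# Crop to the regional domain BEFORE building the regridder, so the
# ESMF grid objects are only regional-sized.
ds_region = ds.sel(lat=slice(30, 45), lon=slice(-95, -80))
print(ds_region["density"].shape)  # (30, 30)
```

The regridder is then built from `ds_region` instead of the full global `ds`.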
From the source code `addon/ESMPy/src/ESMF/util/esmpyarray.py`, the function `ndarray_from_esmf()` is trying to create a numpy array from the ESMF Fortran array:
```python
import ESMF.api.constants as constants
import numpy as np
import numpy.ma as ma
import ctypes as ct
import sys


def ndarray_from_esmf(data, dtype, shape):
    '''
    :param data: buffer of fortran allocated ESMF array
    :type data: ctypes void_p
    :param dtype: the type of the esmf buffer
    :type dtype: ESMF.TypeKind
    :param shape: N-D Python shape corresponding to 1D ESMF allocation
    :type shape: list or tuple
    :return: numpy array representing the data with dtype and shape
    '''
    # find the size of the local coordinates
    size = np.prod(shape[:]) * \
           np.dtype(constants._ESMF2PythonType[dtype]).itemsize
    # create a numpy array to point to the ESMF data allocation
    if sys.version_info[0] >= 3:
        buffer = ct.pythonapi.PyMemoryView_FromMemory
        buffer.restype = ct.py_object
        buffer = buffer(data, ct.c_int(size), 0x200)
    else:
        buffer = np.core.multiarray.int_asbuffer(
            ct.addressof(data.contents), size)

    esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
                           buffer, order="F")

    return esmfarray
```
The error happens in the call:

```python
esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
                       buffer, order="F")
```
The minimal code to get the same error is:

```python
np.ndarray(shape=3, buffer=np.array([0.0, 1.0]), dtype=np.float64)
# buffer size is smaller than the array size,
# leads to "buffer is too small for requested array"
```
It is probably because the earlier call `buffer = buffer(data, ct.c_int(size), 0x200)` doesn't allocate enough memory. `buffer` is a `PyMemoryView_FromMemory()` function, which takes three arguments:

- `data` is the memory location allocated upstream and passed to `ndarray_from_esmf()`. It was created by a `data = ESMP_GridGetCoordPtr(...)` call from `addon/ESMPy/src/ESMF/api/grid.py`. Then it goes into deep Fortran code which I haven't had time digging into.
- `ct.c_int(size)` is the requested memory in bytes. For the large grid shape `(20000, 15000)` used here, `size == 20000*15000*8 == 2400000000`.
- `0x200` (512) is simply the constant `PyBUF_WRITE`.

Maybe @bekozi can elaborate more on this. Probably some internal memory constraints in ESMF?
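One thing worth checking here (my own observation, not verified against the ESMF internals): 2400000000 does not fit in a 32-bit signed int, so `ct.c_int(size)` silently wraps to a negative value before it ever reaches `PyMemoryView_FromMemory()`:

```python
import ctypes as ct

size = 20000 * 15000 * 8       # 2,400,000,000 bytes requested
print(size > 2**31 - 1)        # True: exceeds INT_MAX (2,147,483,647)
print(ct.c_int(size).value)    # -1894967296: wrapped to a negative 32-bit value
```

If that is the failure mode, any coordinate array larger than ~2 GiB would trigger it, which matches the grid sizes reported in this thread. Since the C-level `PyMemoryView_FromMemory` takes a `Py_ssize_t` length, passing `ct.c_ssize_t(size)` instead of `ct.c_int(size)` would be the obvious candidate fix.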
By cropping it to a regional domain, it finally works.
By the way, it seems that for the 'conservative' method, 'lat_b' and 'lon_b' are needed. But are there methods to create these two variables from the original grid?
For example, the grid information of the input data after cropping is:

```
Grid coordinates :
  1 : lonlat : points=3240000 (1800x1800)
      lon : -94.9958 to -80.0042 by 0.00833333 degrees_east
      lat : 30.0042 to 44.9958 by 0.00833333 degrees_north
```
The resolution is about 1 km, and the unit of the variable of interest is ton/cell. (Here, does "cell" mean about 1 km x 1 km?)
While for the output grid, it is:

```
# gridID 1
gridtype  = curvilinear
gridsize  = 9801
xname     = XLONG
xlongname = longitude
xunits    = degree_east
yname     = XLAT
ylongname = latitude
yunits    = degree_north
xsize     = 99
ysize     = 99
xvalues   = ...
yvalues   = ...
```

The resolution may be 9 km.
Thanks!
> By the way, it seems that for the 'conservative' method, 'lat_b' and 'lon_b' are needed. But are there methods to create these two variables from the original grid?
Right, currently the grid cell bounds (`lon_b` & `lat_b`) are separated/decoupled from `lon` & `lat`. Slicing doesn't automatically happen for "size N+1" coordinate variables. See https://github.com/JiaweiZhuang/xESMF/issues/13#issuecomment-359043859.
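For a uniformly spaced grid like the cropped one above, the bounds can be constructed by hand. This helper is my own sketch (it assumes evenly spaced 1D centers), not an xESMF API:

```python
import numpy as np

def centers_to_bounds(centers):
    # Midpoints between neighboring centers, extended by half a step at each
    # end, give the N+1 cell-edge coordinates for N evenly spaced centers.
    step = centers[1] - centers[0]
    return np.concatenate([[centers[0] - step / 2],
                           (centers[:-1] + centers[1:]) / 2,
                           [centers[-1] + step / 2]])

# 1800 longitude centers, mimicking the cropped grid discussed above
lon = -94.9958 + 0.00833333 * np.arange(1800)
lon_b = centers_to_bounds(lon)
print(lon.size, lon_b.size)  # 1800 1801
```

The same function applies to `lat` to get `lat_b`; for curvilinear output grids (like the WRF-style grid above), the bounds are 2D and need a different construction.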
For what it's worth, this is the use case for which I installed xesmf: producing 1km global grids. These are slow with other methods.
Same issue when trying to downscale 1000x1000 1km data to 30000x30000 30m. Any updates?
Just ran into this today when trying to regrid high resolution land use and vegetation density data.
Same issue with a Field on a 1193x843x8736 grid (X Y T) with a total size of 70286291712 bytes. Looks like the constraint is on the Fortran side.
I am also running into this issue when attempting to resample 0.25 deg data to 30 second grids (for use with a global ML model). With the wide use now of ML for producing global maps, I think that the need for handling high-resolution global grids is no longer a fringe case.
I think this repo is unfortunately dead and/or dormant. The sparselt (https://github.com/LiamBindle/sparselt) package by @LiamBindle may be a viable alternative for regridding.
@sdeastham failed to use xESMF to regrid global ~1 km resolution data. The issue is that ESMPy cannot create a very large grid object. A minimal example is:
The last command leads to:

```
TypeError: buffer is too small for requested array
```