xyz2grd crashes python with large grids

MarkWieczorek commented 8 months ago

Description of the problem

I am trying to create a grd file using pygmt's xyz2grd with a numpy array. The numpy array is large (64 pixels per degree corresponding to 11520 x 23040). Running the xyz2grd command kills the python kernel and gives the following result

python3.12(11590,0x1e10f1c40) malloc: *** error for object 0x3ff142fe20000000: pointer being freed was not allocated
python3.12(11590,0x1e10f1c40) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    11590 abort      ipython

I have the same problem on macOS and on Fedora linux (though on fedora, I don't think a error is reported, it just crashes).

Minimal Complete Verifiable Example

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.empty((nlat, nlon))
spacing = 1/64

grd = pygmt.xyz2grd(array, spacing=spacing, region=[0, 360, -90, 90], registration='p')

Full error message

python3.12(11590,0x1e10f1c40) malloc: *** error for object 0x3ff142fe20000000: pointer being freed was not allocated
python3.12(11590,0x1e10f1c40) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    11590 abort      ipython

System information

PyGMT information:
  version: v0.11.0
System information:
  python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]
  executable: /Users/lunokhod/miniconda3/envs/py312/bin/python
  machine: macOS-14.3-arm64-arm-64bit
Dependency information:
  numpy: 1.26.3
  pandas: 2.1.4
  xarray: 2023.12.0
  netCDF4: 1.6.5
  packaging: 23.2
  contextily: None
  geopandas: None
  ipython: None
  rioxarray: None
  ghostscript: 10.02.1
GMT library information:
  binary version: 6.5.0
  cores: 10
  grid layout: rows
  image layout:
  library path: /Users/lunokhod/miniconda3/envs/py312/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/lunokhod/miniconda3/envs/py312/lib/gmt/plugins
  share dir: /Users/lunokhod/miniconda3/envs/py312/share/gmt
  version: 6.5.0

seisman commented 8 months ago

I can confirm the issue. It's likely an upstream GMT bug in memory management.

Here is a working version:

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.empty((nlat, nlon))
spacing = 1/64

x = np.arange(0, 360, spacing)
y = np.arange(-90, 90, spacing)
xx, yy = np.meshgrid(x, y)

grid = pygmt.xyz2grd(x=xx.flatten(), y=yy.flatten(), z=array.flatten(), spacing=spacing, region=[0, 360, -90, 90], registration='p')
print(grid)

MarkWieczorek commented 8 months ago

For info, I was able to read the original binary .img fine with GMT. I'm thus not sure if it's possible to reproduce this problem with GMT given that the input I used with the pygmt version was a numpy array.

MarkWieczorek commented 8 months ago

Actually, the above example doesn't work. If you replace np.empty() with np.ones(), you will see that there are lots of nans in the grid that is created:

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.ones((nlat, nlon))
spacing = 1/64

x = np.arange(0, 360, spacing)
y = np.arange(-90, 90, spacing)
xx, yy = np.meshgrid(x, y)

grid = pygmt.xyz2grd(x=xx.flatten(), y=yy.flatten(), z=array.flatten(), spacing=spacing, region=[0, 360, -90, 90], registration='p')
print(grid)

which gives as output

<xarray.DataArray 'z' (y: 11520, x: 23040)>
array([[ 1., nan,  1., ..., nan,  1., nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [ 1., nan,  1., ..., nan,  1., nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [ 1., nan,  1., ..., nan,  1., nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * x        (x) float64 0.007812 0.02344 0.03906 0.05469 ... 360.0 360.0 360.0
  * y        (y) float64 -89.99 -89.98 -89.96 -89.95 ... 89.95 89.96 89.98 89.99
Attributes:
    long_name:     z
    actual_range:  [1. 1.]

The same thing occurs with np.zeros() and real data. In fact, the nans also occur with np.empty()

seisman commented 8 months ago

I think it's most likely because the grid should be gridline-registration instead of pixel-registration.

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.ones((nlat, nlon))
spacing = 1/64

x = np.arange(0, 360, spacing)
y = np.arange(-90, 90, spacing)
xx, yy = np.meshgrid(x, y)

grid = pygmt.xyz2grd(x=xx.flatten(), y=yy.flatten(), z=array.flatten(), spacing=spacing, region=[0, 360, -90, 90], registration='g', coltypes="g")
print(grid)

the output is:

<xarray.DataArray 'z' (lat: 11521, lon: 23041)>
array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       ...,
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * lon      (lon) float64 0.0 0.01562 0.03125 0.04688 ... 360.0 360.0 360.0
  * lat      (lat) float64 -90.0 -89.98 -89.97 -89.95 ... 89.95 89.97 89.98 90.0
Attributes:
    long_name:     z
    actual_range:  [1. 1.]

the last row is still NaN, because lat=90 or -90 is missing data.

MarkWieczorek commented 8 months ago

The grid is supposed to be pixel registered. What I am trying to do is read a large pixel registered image into pygmt, and then convert it to gridline using grdconvert. This works fine using the GMT command line tools, but doesn't work using pygmt.

yvonnefroehlich commented 8 months ago

I can confirm the issue. It's likely an upstream GMT bug in memory management.

Hm. I am not sure, but sofar I understood xyz2grd in the way that your input data have to be tabular data, i.e., three columns longitude, latitude, and a z-value. And they should be converted to a GMT-ready grid. But an array in the form array = np.empty((nlat, nlon)) passed to data is not in that format. So you have to use .flatten, as done in the later examples.

seisman commented 8 months ago

I can confirm the issue. It's likely an upstream GMT bug in memory management.

Hm. I am not sure, but sofar I understood xyz2grd in the way that your input data have to be tabular data, i.e., three columns longitude, latitude, and a z-value. And they should be converted to a GMT-ready grid. But an array in the form array = np.empty((nlat, nlon)) passed to data is not in that format. So you have to use .flatten, as done in the later examples.

I think you're right. xyz2grd should not accept a 2-d matrix like the one in the original post.

seisman commented 8 months ago

Also see a related feature request at https://github.com/GenericMappingTools/pygmt/issues/2852.

MarkWieczorek commented 8 months ago

The documentation states that you can use 2d arrays. If this is incorrect, could we update the documentation?

seisman commented 8 months ago

it's a little misleading. The 2-d array should has three columns for x, y and z.

weiji14 commented 7 months ago

Maybe we can add a check, so that an error or warning is raised when a 2D array with >3 columns is passed in to the data parameter? Code will need to be inserted somewhere here:

https://github.com/GenericMappingTools/pygmt/blob/dd8e0cd87a7a0629aafb752ba8934cec4149f011/pygmt/src/xyz2grd.py#L149-L157

GenericMappingTools / pygmt