GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
747 stars 216 forks source link

grdimage does not correctly plot xarray DataArrays when they don't contain the redundant longitude at 360 E #375

Closed MarkWieczorek closed 4 years ago

MarkWieczorek commented 4 years ago

grdimage of pygmt does not correctly plot xarray DataArrays when they do not contain the redundant column of data at 360 E.

Using pygmt directly with a netcdf file provides the correct output, such as with a command like:

fig.grdimage('test.grd', region='g', projection='A0/0/6i')

test

However, here is the same output generated by pygmt when using an xarray in memory:

fig.grdimage(grid, region='g', projection='A0/0/6i')

longitude

MarkWieczorek commented 4 years ago

And here is a simple script that demonstrates this problem:

# create a grid that spans 0 359 E, and -89 to 90 N.
longitude = np.arange(0, 360, 1)
latitude = np.arange(-89, 91, 1)
x = np.sin(np.deg2rad(longitude))
y = np.linspace(start=0, stop=1, num=180)
data = y[:, np.newaxis] * x

# create a DataArray, and then export this as a netcdf file
dataarray = xr.DataArray(data, coords=[('latitude', latitude,
                                       {'units': 'degrees_north'}),
                                       ('longitude', longitude,
                                       {'units': 'degrees_east'})], 
                         attrs = {'actual_range': [-1, 1]})
dataset = dataarray.to_dataset(name='dataarray')
dataset.to_netcdf('test.grd')

fig = pygmt.Figure()

# create a projected image using the DataArray in memory and the netcdf file
fig.grdimage(dataset.dataarray, region='g', projection='A0/0/6i')
fig.grdimage('test.grd', region='g', projection='A0/0/6i', X='6.2i')

test

leouieda commented 4 years ago

This might be related to pixel registered vs grid node registered grids. GMT makes a lot of internal assumptions when reading the netCDF file that we don't when converting the xarray.DataArray.

MarkWieczorek commented 4 years ago

If I use the -V option with the file test.grd, gmt warns me that it assumes it is gridline registered. If I define the longitude attribute actual_range = [0, 359], then I don't get this warning (in this case, the gmt code confirms that it is gridline and doesn't need to make any guesses). In either case, defining actual_range or not produces the same output.

As for pygmt, grdimage creates a virtual file using

file_context = lib.virtualfile_from_grid(grid)

And here is the code for this function

        matrix, region, inc = dataarray_to_matrix(grid)
        family = "GMT_IS_GRID|GMT_VIA_MATRIX"
        geometry = "GMT_IS_SURFACE"
        gmt_grid = self.create_data(
            family, geometry, mode="GMT_CONTAINER_ONLY", ranges=region, inc=inc
        )
        self.put_matrix(gmt_grid, matrix)
        args = (family, geometry, "GMT_IN|GMT_IS_REFERENCE", gmt_grid)
        with self.open_virtual_file(*args) as vfile:
            yield vfile

If you look at create_data(), you will see that there is an optional keyword registration. However, this keyword is NOT used in the routine. There is a variable registration_int, but this appears to be hardcoded to be GMT_GRID_NODE_REG

So, to the best of my knowledge, pygmt is assuming that all xarrays are gridline registered. If this is the case, this is a bug.

There are probably one of three ways to resolve this problem:

Edit: by the way, I think that pygmt routines should have a keyword verbose=True which corresponds to -V

MarkWieczorek commented 4 years ago

Could you let me know how gmt reads the pygmt virtual files? I looked through the gmt code a little, and I am going to guess that the file format is id=bf: GMT native, C-binary format (32-bit float). If I had to guess, this file format probably gets parsed by the function gmtio_bin_input that is in the file gmt_io.c. After confirmation, I will open an issue at gmt to see if this is something that they should fix.

Also, it appears that this doesn't affect all map projections. Here the same image as above, but in a mollweide projection: moll

weiji14 commented 4 years ago

Ah, the ol' gridline vs pixel based registration problem. I think you hit the nail on the head there @MarkWieczorek. I'm pretty sure xarray uses centre-based coordinates (i.e. pixel-based registration) and if pygmt defaults to gridline registration, then this is definitely a serious issue. I've hit this time and time again before (e.g. at https://github.com/weiji14/deepbedmap/pull/150) and it just keeps coming back to haunt me...

Not sure if setting the default registration to pixel-based would solve the issue (since we're using xarray, it might be necessary?), but I'll try and look into it tomorrow if I have some mental bandwidth to do so (running a "FOSS4G Community Day" for PyGMT) Edit: Putting the Windows installation issue on a higher priority. If you've found a fix though, feel free to submit a Pull Request and I can review it.

MarkWieczorek commented 4 years ago

I don't think that this is a pixel related issue. pygmt uses the routine dataarray_to_matrix to extract the data maxtrix from an xarray DataArray and then returns this unmodified as a numpy array. As far as I can tell, pygmt assumes that all DataArrays are gridline registered, and the example I provide above is for gridline registered data.

As you say, xarray used center-based coordinates, but this corresponds to gridline registration in gmt. (The coordinates are at the grid line intersections that occur at the center of the pixel, not the edges).

It is possible that gmt is misinterpreting the virtual file as pixel registered, and if this is the case, this is a bug with gmt. It is conceivable that gmt uses the ration of nlat and nlon to guess the registration, and since my grid doesn't include the redundant longitude at 360E, I could perhaps see this being a problem if gmt has to guess the registration.

weiji14 commented 4 years ago

Hi @MarkWieczorek, sorry for not following up on this soon enough. I think we've found the issue, and it's probably because the virtualfile mechanism assumes all xarray grids as Cartesian instead of Geographic (see https://github.com/GenericMappingTools/pygmt/pull/476#issuecomment-647755091).

We're working on allowing users to tell PyGMT whether their grid is Cartesian/Geographic, and Pixel/Gridline registered. This might take some time to implement as there's a lot of issues to iron out, and the team will be busy with GMT 6.1.0. But you can provide some feedback on #487 which decides what the recommended default will be going forward.