GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
769 stars 222 forks source link

Integrate with rioxarray and implement GMT_IMAGE to read RGB GeoTIFFs #1555

Open weiji14 opened 3 years ago

weiji14 commented 3 years ago

Description of the desired feature

This is a parallel issue to #578 and #1442. While researching how to read GeoTIFFs into an xarray.DataArray, specifically GeoTIFFs with RGB bands stored in a single band but with a color lookup table, I discovered that there wasn't really a straightforward solution :sweat_smile: So this issue is mainly to jot down some notes.

1) Reading GeoTiff into xarray.DataArray

Method A: rioxarray.open_rasterio

Pros: Reads in data with proper dtype (e.g. uint8) and proper spatial_ref Cons: rioxarray is a bit of a heavy dependency as it requires scipy (edit: may be resolved with https://github.com/corteva/rioxarray/issues/413, edit2: rioxarray=0.8.0 has made scipy optional!)

import pygmt
import rioxarray

filename_or_obj = pygmt.which(fname="@earth_day_01d", download="a")

with rioxarray.open_rasterio(filename_or_obj) as dataarray:
    print(dataarray)

gives

<xarray.DataArray (band: 1, y: 180, x: 360)>
array([[[251, 251, ..., 251, 251],
        [251, 252, ..., 251, 252],
        ...,
        [ 68,  16, ...,  68, 217],
        [149, 149, ..., 149,  16]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * y            (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
    spatial_ref  int64 0
Attributes:
    scale_factor:  1.0
    add_offset:    0.0

Method B: xr.open_dataarray(..., engine="rasterio")

Pros: Requires only rasterio (a Pythonic GDAL wrapper) to be installed, which keeps things lightweight Easier to put into pygmt.load_dataarray as it only requires the 'engine' argument Cons: Seems to read in data as float32 by default

Note: There was an xr.open_rasterio function in xarray v0.19.0, but it is pending deprecation, see https://github.com/pydata/xarray/issues/4697, and removal PR at https://github.com/pydata/xarray/pull/5808.

with xr.open_dataarray(filename_or_obj) as dataarray:
    print(dataarray)

produces

<xarray.DataArray 'band_data' (band: 1, y: 180, x: 360)>
array([[[251., 251., ..., 251., 251.],
        [251., 252., ..., 251., 252.],
        ...,
        [ 68.,  16., ...,  68., 217.],
        [149., 149., ..., 149.,  16.]]], dtype=float32)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * y            (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
    spatial_ref  int64 ...

TLDR My preference is to use (A) rioxarray.open_rasterio because it reads in data properly as uint8, with a fallback to (B) xr.open_dataarray(..., engine=rasterio) if only rasterio is installed.

2) Reading RGB colors from a GeoTIFF

Here's where things get a bit tricky. Neither rioxarray.open_rasterio nor xarray.open_dataarray reads in GeoTIFFs with color interpretation, i.e. RGB colors stored in a GeoTIFF will be read as a single band rather than 3 separate bands! This includes the @earth_day_01d and @earth_night_01d imagery. To be clear, non-color GeoTIFFs (e.g. DEMs, single band Landsat/Sentinel-2 images) are fine.

Good news is that the color interpretation mechanism is present in rasterio see https://github.com/mapbox/rasterio/blob/1.2.8/docs/topics/color.rst and https://github.com/mapbox/rasterio/issues/100. This is wrapping around the GDAL Color Table stuff (https://gdal.org/user/raster_data_model.html#color-table)

Bad news is that we'll need to have that mechanism in rioxarray or xarray in order to use it (unless we want to go with a pure rasterio route). There is an issue open though at https://github.com/corteva/rioxarray/issues/191, so it might be a matter of helping to open a Pull Request in those libraries to implement the feature.

3) Implementing GMT_IMAGE in PyGMT

So once we've solved (1) and (2) above, the next step would be, how to pass that 3-band RGB color xarray.DataArray into GMT? Max mentioned at https://github.com/GenericMappingTools/pygmt/issues/578#issuecomment-890355308 that we might need to use the GMT_IMAGE resource, but that hasn't been added into pygmt.clib as of PyGMT v0.4.1.

Probably need something like a virtualfile_from_image function, similar to virtualfile_from_grid (done in #159) at https://github.com/GenericMappingTools/pygmt/blame/v0.4.1/pygmt/clib/session.py#L1278

Are you willing to help implement and maintain this feature? Something to do for PyGMT v0.6.0 I think.

The steps I think would be to:

References:

snowman2 commented 3 years ago

Side note: Both A & B require rioxarray ref.

weiji14 commented 3 years ago

Thanks for chiming in @snowman2! I see you've opened an issue with making scipy optional in rioxarray at https://github.com/corteva/rioxarray/issues/413, that will really help slim things down since SciPy is a ~30MB dependency by itself.

Side note: Both A & B require rioxarray ref.

I stand corrected, it's only the xr.open_rasterio function which requires rasterio, but that looks to be deprecated soon following https://github.com/pydata/xarray/pull/5808 so we won't use that.

maxrjones commented 2 years ago

@joa-quim how do you link the data to the GMT_IS_IMAGE family in GMT.jl? Is this done via GMT_Put_Matrix in the same way as for GMT_IS_GRID?

joa-quim commented 2 years ago

I don't use GMT_Put_Matrix for grids (well maybe sometimes under certain circumstances but don't remember now). What I do for all types is to wrap the GMT type interface for each of them. So for images I do this, which wraps an image type. Images and grids differ for very little but aren't equal either. Again, interfacing with GMT native types: grids, images, datasets, cpts, ps, etc... is how we do the IO business in MEX and Julia, which has the further advantage of not requiring any writing/reading of temporary files.

maxrjones commented 2 years ago

That makes sense, thanks for the explanation and sharing the link to the code.