GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
731 stars 213 forks source link

Wrapper for grdmix #578

Open weiji14 opened 3 years ago

weiji14 commented 3 years ago

Description of the desired feature

Implement grdmix which does "Blending and transforming grids and images".

Inspired by https://github.com/GenericMappingTools/gmt/issues/3992#issuecomment-686312431 which would help resolve the longstanding PR at https://github.com/GenericMappingTools/pygmt/pull/370#issuecomment-558314644

Implementation wise, should we output the rgb raster into an xarray.Dataset? Or a list of xarray.DataArrays? See also related discussion at #370.

Are you willing to help implement and maintain this feature? Yes, but help is welcome

maxrjones commented 2 years ago

This would be a great feature. I have a few comments:

  1. For the output, I would prefer the default to be a (M,N,3) xarray.DataArray rather than a DataSet or list of DataArrays. Then for the fastest plots people could still use xarray's wrapping of matplotlib.pyplot.imshow().
  2. Grdmix can do several things - I use the grdmix -C -N option to produce an image that can be used for grdimage the most often, but there's also blending of grids and adding of transparencies. My question here is whether it's best to wrap all of grdmix's capabilities as one function or break it up a bit? The feature that I would most like to see in PyGMT is a function that takes three xarray.DataArray's, applies some clipping of the tails, normalizes to 0-1, and creates an image that can be passed to pygmt.grdimage (kinda similar to GMT.jl's truecolor function). This currently takes quite a few steps in GMT (e.g., a forum post), but I think it could be relatively simple to support as one step in PyGMT and expect this is a common task for anyone dealing with multi-band data (e.g., the preprocessing for example data in https://github.com/GenericMappingTools/pygmt/pull/370).
  3. The feature request in point 2 still requires passing an image into grdimage (related to https://github.com/GenericMappingTools/pygmt/pull/370, which passes three grids to represent an image). I expect this would require using the GMT_IMAGE resource, but this isn't currently included in the FAMILIES in pygmt.clib.sesssion (https://github.com/GenericMappingTools/pygmt/blob/master/pygmt/clib/session.py#L30-L36). Could this be used to pass (M,N,3) xarray.DataArray's to modules that accept images (e.g., grdimage and psimage)?
weiji14 commented 2 years ago

Those are some great comments! I've been putting off this feature as I wanted to see how the rest of the PyData ecosystem is storing multi-band images, notably with packages like rioxarray and xarray-spatial. Best to have PyGMT's grdmix work well with those libraries.

  1. For the output, I would prefer the default to be a (M,N,3) xarray.DataArray rather than a DataSet or list of DataArrays. Then for the fastest plots people could still use xarray's wrapping of matplotlib.pyplot.imshow().

:+1:. Yes I think producting an (M,N,3) xarray.DataArray output sounds like the way to go. We should also decide on whether to use rioxarray.open_rasterio (see https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#reading-files).

  1. Grdmix can do several things - I use the grdmix -C -N option to produce an image that can be used for grdimage the most often, but there's also blending of grids and adding of transparencies. My question here is whether it's best to wrap all of grdmix's capabilities as one function or break it up a bit? The feature that I would most like to see in PyGMT is a function that takes three xarray.DataArray's, applies some clipping of the tails, normalizes to 0-1, and creates an image that can be passed to pygmt.grdimage (kinda similar to GMT.jl's truecolor function). This currently takes quite a few steps in GMT (e.g., a forum post), but I think it could be relatively simple to support as one step in PyGMT and expect this is a common task for anyone dealing with multi-band data (e.g., the preprocessing for example data in Allow passing in Red, Green, Blue grids into grdimage #370).

In the short term (PyGMT v0.5.0), I would suggest doing a minimal thin wrapper around GMT grdmix. There are some inherent limitations with grdmix I think, in that GMT allows only 0-255 (8-bit) colors which is suitable for plotting, but not for actual analysis work on satellite images like Sentinel 2 which has a 12-bit radiometric resolution.

For those analysis tasks, which will be a more long term goal (PyGMT v0.6.0?), we should either:

  1. defer users to using rasterio/rioxarray/xarray-spatial (which are pretty much all GDAL wrappers with a Pythonic interface) and have PyGMT grdmix be used solely for the final visualization plot; and/or
  2. wrap more modules with satellite preprocessing capabilities (e.g. grdhisteq as in #1399) and write a full tutorial on how PyGMT can be used to perform the entire raw data -> processing -> visualization pipeline.

The option depends on what radiometric resolution people want to work on. Those ok with 8-bit radiometric resolution can use Option 2 (though I don't see many optical satellite products that have such a low bit resolution anymore...) while those wanting higher (e.g. 16-bit) radiometric resolution should go with Option 1. Another thing we may want to consider is whether the GMT C library should support 16-bit (0-65536) precision for grdmix. Something worth raising upstream perhaps :slightly_smiling_face:

  1. The feature request in point 2 still requires passing an image into grdimage (related to Allow passing in Red, Green, Blue grids into grdimage #370, which passes three grids to represent an image). I expect this would require using the GMT_IMAGE resource, but this isn't currently included in the FAMILIES in pygmt.clib.sesssion (https://github.com/GenericMappingTools/pygmt/blob/master/pygmt/clib/session.py#L30-L36). Could this be used to pass (M,N,3) xarray.DataArray's to modules that accept images (e.g., grdimage and psimage)?

My impression was that passing RGB (M, N, 3) grids into grdimage is not supported/recommended anymore (see https://github.com/GenericMappingTools/pygmt/pull/370#issuecomment-687171814). If that is the case, we probably don't need to add GMT_IMAGE to pygmt/clib/session.py? But best to double check with the GMT core team.

maxrjones commented 2 years ago

Those are some great comments! I've been putting off this feature as I wanted to see how the rest of the PyData ecosystem is storing multi-band images, notably with packages like rioxarray and xarray-spatial. Best to have PyGMT's grdmix work well with those libraries.

  1. For the output, I would prefer the default to be a (M,N,3) xarray.DataArray rather than a DataSet or list of DataArrays. Then for the fastest plots people could still use xarray's wrapping of matplotlib.pyplot.imshow().

👍. Yes I think producting an (M,N,3) xarray.DataArray output sounds like the way to go. We should also decide on whether to use rioxarray.open_rasterio (see https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#reading-files).

Good point, I think this will take some experimentation to figure out whether xarray or rioxarray will work better for opening the resultant data and metadata.

  1. Grdmix can do several things - I use the grdmix -C -N option to produce an image that can be used for grdimage the most often, but there's also blending of grids and adding of transparencies. My question here is whether it's best to wrap all of grdmix's capabilities as one function or break it up a bit? The feature that I would most like to see in PyGMT is a function that takes three xarray.DataArray's, applies some clipping of the tails, normalizes to 0-1, and creates an image that can be passed to pygmt.grdimage (kinda similar to GMT.jl's truecolor function). This currently takes quite a few steps in GMT (e.g., a forum post), but I think it could be relatively simple to support as one step in PyGMT and expect this is a common task for anyone dealing with multi-band data (e.g., the preprocessing for example data in Allow passing in Red, Green, Blue grids into grdimage #370).

In the short term (PyGMT v0.5.0), I would suggest doing a minimal thin wrapper around GMT grdmix.

Agreed!

There are some inherent limitations with grdmix I think, in that GMT allows only 0-255 (8-bit) colors which is suitable for plotting, but not for actual analysis work on satellite images like Sentinel 2 which has a 12-bit radiometric resolution. For those analysis tasks, which will be a more long term goal (PyGMT v0.6.0?), we should either:

1. defer users to using `rasterio`/`rioxarray`/`xarray-spatial` (which are pretty much all GDAL wrappers with a Pythonic interface) and have PyGMT `grdmix` be used solely for the final visualization plot; and/or

2. wrap more modules with satellite preprocessing capabilities (e.g. `grdhisteq` as in [Add a function wrapping GMT's grdhisteq module #1399](https://github.com/GenericMappingTools/pygmt/issues/1399)) and write a full tutorial on how PyGMT can be used to perform the entire raw data -> processing -> visualization pipeline.

The option depends on what radiometric resolution people want to work on. Those ok with 8-bit radiometric resolution can use Option 2 (though I don't see many optical satellite products that have such a low bit resolution anymore...) while those wanting higher (e.g. 16-bit) radiometric resolution should go with Option 1. Another thing we may want to consider is whether the GMT C library should support 16-bit (0-65536) precision for grdmix. Something worth raising upstream perhaps 🙂

As you said, the 8-bit resolution is a problem for analysis but not really for visualization since most monitors/printers are 8-10 bit depth anyways. I have only used grdmix for visualization, what would grdmix be used for that would benefit from 16-bit precision?

  1. The feature request in point 2 still requires passing an image into grdimage (related to Allow passing in Red, Green, Blue grids into grdimage #370, which passes three grids to represent an image). I expect this would require using the GMT_IMAGE resource, but this isn't currently included in the FAMILIES in pygmt.clib.sesssion (https://github.com/GenericMappingTools/pygmt/blob/master/pygmt/clib/session.py#L30-L36). Could this be used to pass (M,N,3) xarray.DataArray's to modules that accept images (e.g., grdimage and psimage)?

My impression was that passing RGB (M, N, 3) grids into grdimage is not supported/recommended anymore (see #370 (comment)). If that is the case, we probably don't need to add GMT_IMAGE to pygmt/clib/session.py? But best to double check with the GMT core team.

My understanding is that passing three (M, N) grids to grdimage is not recommended and that GMT_IMAGE is the recommended option. It seems there are two tasks here: 1) how to pass data representing rgb bands (imagery) to grdimage and 2) how to return an image from a module like grdmix. Perhaps we should open a separate issue for task 1 and then check whether GMT_IMAGE is the recommended option.