GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org
Other
864 stars 359 forks source link

Transparent handling of 3-D grids in GMT #2912

Closed PaulWessel closed 3 years ago

PaulWessel commented 4 years ago

Description of the desired feature

While GMT deals with situations where a 3-D grid is created, read, or written, it does so very awkwardly (via arrays of 2-D grids and external layer arrays) as we have no 3-D grid representation. This issue discusses the most transparent and non-intrusive way for GMT as a whole to support the notion of a 3-D grid without lots of rewriting. My key assumption is that all layers share the same number of nodes and horizontal coordinates. Here is my list of modifications that would not change the API (but instead add three new API functions).

  1. The GMT_GRID_HEADER structure has two parts: One dictated by legacy for what is written to binary native grids, and one that is accessible internally. The first cannot change, but in the hidden area we can add new parameters as needed. Here, we will add uint32_t n_layers, double *layer, unsigned int l_type, l_units[GMT_GRID_UNIT_LEN80]. These parameters can hold the number of layers, the layer values (we must allow for non-equidistant layers), the layer value type (float, time, etc.), and the name/unit of that dimension. These could be added now with no effect on GMT whatsoever.

Grids are either created via _GMT_CreateData, read via _GMT_ReadData, or written via _GMT_WriteData. Here are changes required under the hood without changing these API prototypes.

  1. _GMT_CreateData: Nothing special. The range, inc, dim parameters are used to pass the dimension ranges, increments or lengths, so if family is GMT_IS_GRID and the earlier parameters imply a cube then we set the header accordingly and allocate space (if requested). If layer spacing is equidistant (implied by a positive inc[GMT_Z], then the layer array can be computed. We do something very similar for a 3-D GMT_MATRIX. However, if inc[GMT_Z] == 0, range[ZHI] > range[ZLO], and dim[GMT_Z] > 0 then we have a non-equidistant layering scheme. For this we need a new API function int GMT_Put_Layers (void *API, struct GMT_GRID *G, unsigned int n_layers, double *layers) to place this information in the grid header.
  2. _GMT_ReadData: If the gmt_nc.c functions detect this is a data cube and no single layer has been specified (currently, the only way to read these grids is to specify a layer), then it calls _GMT_CreateData as above and reads in the header and/or data. The Grid->data array is simply of size _nlayers times the size of one slice (mx * my). Usually, n_layers = 1.
  3. _GMT_WriteData: If we pass a 3-D grid and the output file name contains a C format specifier then we write a stack of 2-D grids using the layer array to create individual names, else we write a 3-D cube.

We also need a few more functions and macros:

  1. We need a few new (internal, i.e., in gmt_dev.h) macros that relate (row, col, layer) to a 1-D grid index. Based on what we already have these might be called _gmt_Mijkp (honor padding) and _gmt_Mijk0 (ignore padding). 6 For external users we need to add the third API function uint64_t GMT_Get_Index3 (void *API, struct GMT_GRID_HEADER *header, int row, int col, int layer); for the same reason.

When, as in almost all cases in GMT, we are dealing with a single layer (2-D) grid then there are no changes to the code other than refusing to read a 3-D grid when the module cannot handle it (i.e., when it expects a single layer): Accessing Grid->data requires no special handling as data is a single layer (as currently is always true). For modules that must deal with a data cube we can easily navigate the 3-D grid via loops over layers and _gmt_Mijkp, and the reading and writing will be transparent as well with existing calls. It is possible we may find further functions to write that extracts data along lines or slices through the cube.

In summary:

I hope @GenericMappingTools/core can comment on this, point out omissions or errors, and guide this to a final proposal.

seisman commented 4 years ago

My key assumption is that all layers share the same number of nodes and horizontal coordinates.

I don't know much about the details, but the assumption covers most use cases, and looks reasonable to me.

PaulWessel commented 4 years ago

I should add that the lack of the above means modules get quite complicated as I introduce arrays for 2-D grids to handle the 3-D data internally. I.e., loops over reading, loops over writing, etc. Hi @joa-quim, any concerns you see? @remkos?

joa-quim commented 4 years ago

I do quite a bit of 3D netCDF in Mirone (using matlab+MEX functions). For 3D cube one can write a file in unlimited mode (means we can add more layers to file), or fixed size. For the time axis I use either the layer counting or a float vector with the z-values for each layer. I thinks that's basically what you are proposing for the 3rd variable.

joa-quim commented 4 years ago

I don't think we need this uint32_t n_layers because we already have unsigned int n_bands that counts the same thing (though it was thought for bands in a multi-layered image).

PaulWessel commented 4 years ago

True that we could repurpose _nbands but for the cost of 2 bytes I think it is worth having a separate and better-named variable for this.

PaulWessel commented 4 years ago

I think I will make a branch implementing the basic changes and see if anything breaks.

PaulWessel commented 3 years ago

Closed as implemented by #4314