GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org
Other
843 stars 352 forks source link

GMT_Write_Data for netCDF grids changes the grid #3800

Open PaulWessel opened 4 years ago

PaulWessel commented 4 years ago

Description of the problem

The netCDF grid writing library in GMT deals with chunking and optional or auto scaling/offsetting the grid to take up as little space as possible. This works well for most cases and all modules since the modules were written as stand-alone command-line tools and when one finished the process died. However, when one is using the API to write a grid and then continue to use the grid after writing, the values may have changed due to the fact that the data may have been scaled and offset to fit some specific storage unit like short int. Thus, a module calling another module passing a virtual file may find its grid no longer making sense.

Solutions:

The only way to completely avoid this problem is to undo the translation after the writing has completed. This will add more CPU cycles but will treat the grid as "read-only". Otherwise, we can never guarantee that a grid can continue to be used after writing. This is especially bad if the input grid is a reference from the outside. So I think we simply have to do this.

The only relief would be if one masses in a bit-mode flag like GMT_GRID_FINISHED or something, so that someone who is writing a specific set of steps and know that the process is done after teh writing done spend time undoing a scaling, since it won't be used again. We can never be sure of that in the API since anyone can write a tool that calls surface with -Gresult.nc=ns+s0.1 and then call grdimage, and be surprised at the result.

So @joa-quim and @seisman, I will undo any translation done in gmt_nc.c unless a flag (I may think of another name) is passed, and GMT will not pass that flag unless it is a grid that cannot be returned anywhere.

PaulWessel commented 4 years ago

A related issue is the fact that _gmt_nc_writegrd changes the default chunk setting based on the grid at hand, but never reset it. Thus, when I tried to do two calls to _GMT_WriteData I got an error since the chunk size for the first grid was inappropriate for the second. This is being addressed in another PR, but it just points out that there were two assumptions built into this code:

  1. The grid would not be used after the writing
  2. There would only be one call to write a grid during the module lifetime

Neither of those make sense in the API in general.

PaulWessel commented 3 years ago

Any comments on this @GenericMappingTools/core ? Seems we should correct this in one way or another, perhaps when external use is detected via API->external and GMT_IS_REFERENCE is passed?

maxrjones commented 3 years ago

Seems necessary to me. Did the solution to the related issue get solved already?

PaulWessel commented 3 years ago

I think so, but I would be in a better position to answer if the idiot who posted the comment above had cared to add the link to that PR he referred to...

PaulWessel commented 3 years ago

I agree it seems necessary - provided the grid is passed in is read-only. But let's think about this a bit more:

So I wonder if this is specifically limited to situations where

  1. The grid was passed in from PyGMT
  2. The grid was flagged as a read-only matrix
  3. The grid was used to write a netCDF output grid file

Please comment on this to see the extent of fixes needed.

joa-quim commented 3 years ago

Yes, GMT.jl is like GMTMEX. Column-wise so a copy is needed. Sometime in future I may try to send in a Julia allocated memory ... but then I would have to deal with the PAD in Julia (chills ...).