GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
758 stars 220 forks source link

GMTDataArrayAccessor doesn't work for temporary files that have been sliced #524

Open seisman opened 4 years ago

seisman commented 4 years ago

Description of the problem

It may be a GMTDataArrayAccessor bug, or maybe we should update grdcut() to work with GMTDataArrayAccessor. The problem is that the temporary netCDF file is already deleted when GMTDataArrayAccessor tries to access it.

Full code that generated the error

import pygmt

grid = pygmt.grdcut("@earth_relief_01d", region=[30, 50, 60, 70])
print(grid)
print(grid.gmt.registration)

Full error message

<xarray.DataArray 'z' (lat: 10, lon: 20)>
array([[  46. ,   -2.5,    0. ,   42. ,  150. ,  220. ,  172. ,  147. ,
         153.5,  159. ,  177. ,  172. ,  165. ,  137. ,  144.5,  150. ,
         119. ,  142. ,  131.5,  149.5],
       [  18.5,   19. ,   80.5,  139. ,  120.5,   33. ,   54. ,  202.5,
         153. ,  126. ,  175. ,  165.5,   69. ,  180. ,  147. ,  147. ,
          61. ,  106. ,  112.5,  147. ],
       [ 125.5,  173. ,  185. ,  136.5,   70.5,   51.5,  158.5,  140. ,
         142. ,   97. ,  139. ,  133. ,   58.5,   53.5,  109. ,  171. ,
         158. ,  153. ,  152. ,  156. ],
       [ 173.5,  223.5,  199.5,  130.5,   95. ,  130.5,  170.5,  114. ,
          51. ,   80. ,   78.5,   30.5,   82. ,  129. ,  137. ,  126. ,
         139. ,  118. ,  138.5,  148. ],
       [ 216.5,  163. ,  133. ,  121. ,   51.5,   -7. ,  -18. ,   65. ,
          96. ,   34.5,   19.5,   58.5,   78. ,   42. ,   69.5,   96.5,
          82. ,  155. ,  132.5,  192.5],
       [ 187. ,  163.5,  123.5,   74.5,   12. ,  -50. , -177.5, -126. ,
        -106.5,  -37. ,   94. ,  101.5,  121.5,   28. ,   38. ,   64.5,
          67. ,   89. ,  118.5,  154. ],
       [ 169. ,  137. ,   63.5,  -23. , -100. ,   33. ,   85. ,  151. ,
         179.5,  184.5,   60. ,  -34. ,    1. ,   -8.5,   10. ,   40.5,
          31. ,   33. ,   76.5,  136.5],
       [ 288. ,  252. ,  190. ,  273. ,  208.5,  231. ,  191.5,  207.5,
         233.5,  243. ,  180.5,  -38. ,  -27. ,  -10.5,   16.5,  -18. ,
         -28. ,  -31. ,   89. ,   59.5],
       [ 157.5,  138.5,  234. ,  218.5,  260.5,  214. ,  235. ,  171. ,
         -49.5,  -83. ,  -74. ,  -68. ,  -68. ,  -25. ,   46.5,   16.5,
         -43.5,  -51. ,  -31. ,  -26.5],
       [ 204.5,  145.5,   77.5, -152. , -122. , -178. , -187.5, -160.5,
        -125.5, -144.5, -154. , -138. ,  -90.5,  -65.5,  -81. ,  -76.5,
         -64.5,  -46.5,  -12. ,   -0.5]], dtype=float32)
Coordinates:
  * lon      (lon) float64 30.5 31.5 32.5 33.5 34.5 ... 45.5 46.5 47.5 48.5 49.5
  * lat      (lat) float64 60.5 61.5 62.5 63.5 64.5 65.5 66.5 67.5 68.5 69.5
Attributes:
    long_name:     elevation (m)
    actual_range:  [-187.5  288. ]
grdinfo [ERROR]: File /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc was not found
grdinfo [ERROR]: Cannot find file /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc
grdinfo [ERROR]: Must specify one or more input files
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(grid.gmt.registration)
  File "/Users/seisman/.anaconda/lib/python3.7/site-packages/xarray/core/extensions.py", line 36, in __get__
    accessor_obj = self._accessor(obj)
  File "/Users/seisman/Gits/gmt/pygmt/pygmt/modules.py", line 247, in __init__
    int, grdinfo(self._source, C="n", o="10,11").split()
  File "/Users/seisman/Gits/gmt/pygmt/pygmt/modules.py", line 51, in grdinfo
    lib.call_module("grdinfo", arg_str)
  File "/Users/seisman/Gits/gmt/pygmt/pygmt/clib/session.py", line 502, in call_module
    module, status, self._error_message
pygmt.exceptions.GMTCLibError: Module 'grdinfo' failed with status code 71:
grdinfo [ERROR]: File /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc was not found
grdinfo [ERROR]: Cannot find file /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc
grdinfo [ERROR]: Must specify one or more input files

System information

Please paste the output of python -c "import pygmt; pygmt.show_versions()":

PyGMT information:
  version: v0.1.2+11.g168c77c
System information:
  python: 3.7.6 (default, Jan  8 2020, 13:42:34)  [Clang 4.0.1 (tags/RELEASE_401/final)]
  executable: /Users/seisman/.anaconda/bin/python
  machine: Darwin-19.5.0-x86_64-i386-64bit
Dependency information:
  numpy: 1.18.1
  pandas: 1.0.1
  xarray: 0.15.1
  netCDF4: 1.5.3
  packaging: 20.1
  ghostscript: 9.52
  gmt: 6.2.0_308386e_2020.07.11
GMT library information:
  binary dir: /Users/seisman/.anaconda/bin
  cores: 4
  grid layout: rows
  library path: /Users/seisman/local/GMT/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/seisman/local/GMT/lib/gmt/plugins
  share dir: /Users/seisman/local/GMT/share
  version: 6.2.0
weiji14 commented 4 years ago

It may be a GMTDataArrayAccessor bug, or maybe we should update grdcut() to work with GMTDataArrayAccessor. The problem is that the temporary netCDF file is already deleted when GMTDataArrayAccessor tries to access it.

Good spotting, this won't be specific to just grdcut, but the other grd* modulles as well. It will likely require some modification to these lines:

https://github.com/GenericMappingTools/pygmt/blob/f401f8527cfee4dc8f48d78ad93d23b678ecfbc8/pygmt/gridops.py#L114-L115

There's 2 solutions I can see:

  1. Trigger the GMTDataArrayAccessor inside the with block, so that the registration and gtype information is stored before the file disappears.
  2. Don't delete the tempfile, maybe keep it somewhere while the session is still running.

Which would you prefer? Or is there a third way to resolve this?

seisman commented 4 years ago

Don't delete the tempfile, maybe keep it somewhere while the session is still running.

We probably don't want to maintain a list of temporary files.

Trigger the GMTDataArrayAccessor inside the with block, so that the registration and gtype information is stored before the file disappears.

It sounds a good way, but how to trigger it?

BTW, at least we need to check if the netCDF source exists in GMTDataArrayAccessor.

weiji14 commented 4 years ago

It sounds a good way, but how to trigger it?

Simply by calling grid.gmt.registration inside the with block. That would then store the registration information in the class.

BTW, at least we need to check if the netCDF source exists in GMTDataArrayAccessor.

Yes, the error message could be improved, will see how best to do this in the PR.

seisman commented 4 years ago

Simply by calling grid.gmt.registration inside the with block. That would then store the registration information in the class.

Do you mean the following code?

 with xr.open_dataarray(outgrid) as dataarray: 
     result = dataarray.load() 
     result.gmt.registration
     result.gmt.gtype

It's a little weird.

weiji14 commented 4 years ago

I think just result.gmt will do, but yes it's weird (and I'm sure pylint will complain about some useless-statement error). We'll also need to apply this to every grd* function that loads a temporary netcdf file. Maybe there's a better solution?

seisman commented 4 years ago

I think just result.gmt will do, but yes it's weird (and I'm sure pylint will complain about some useless-statement error). We'll also need to apply this to every grd* function that loads a temporary netcdf file. Maybe there's a better solution?

Perhaps we can add result.gmt as a temporary workaround and see how it can be improved in the future.

Issues #496 and #489 can't be fixed without the workaround.

weiji14 commented 4 years ago

Ok, I'll try and do that tonight after dinner.

seisman commented 4 years ago

Keep the issue open, since we don't really like the workaround.

seisman commented 4 years ago

It still doesn't work for a slice of a grid. For example:

In [1]: import pygmt

In [2]: grid = pygmt.grdcut("@earth_relief_01d", region="0/90/0/90")

In [3]: grid.gmt.registration
Out[3]: 1

In [4]: grid2 = grid[10:41,30:101]

In [5]: grid2
Out[5]:
<xarray.DataArray 'z' (lat: 31, lon: 60)>
array([[  482. ,   435. ,   391.5, ..., -3402.5, -3340.5, -3281. ],
       [  731. ,   571. ,   393. , ..., -3303.5, -3228.5, -3176. ],
       [  543. ,   490. ,   395.5, ..., -3166.5, -3132. , -3066.5],
       ...,
       [ 1312.5,  1136.5,  1058. , ...,  1514.5,  3121. ,  3689.5],
       [ 1104.5,   993.5,  1084.5, ...,   875.5,   815.5,   894.5],
       [  623. ,  1156. ,  1329. , ...,   861.5,   836. ,   822. ]],
      dtype=float32)
Coordinates:
  * lon      (lon) float64 30.5 31.5 32.5 33.5 34.5 ... 85.5 86.5 87.5 88.5 89.5
  * lat      (lat) float64 10.5 11.5 12.5 13.5 14.5 ... 36.5 37.5 38.5 39.5 40.5
Attributes:
    long_name:     elevation (m)
    actual_range:  [-5122.   5651.5]

In [6]: grid2.gmt.registration
grdinfo [ERROR]: File /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-15alnpr7.nc was not found
grdinfo [ERROR]: Cannot find file /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-15alnpr7.nc
grdinfo [ERROR]: Must specify one or more input files
weiji14 commented 4 years ago

Can't say I didn't see this one coming. I'm tempted to go with Solution 2 "Don't delete the tempfile, maybe keep it somewhere while the session is still running." now, since there's no easy way for to transfer the registration/gtype info from the original xarray grid to the sliced xarray grid.

Another option is to check explicitly if the *.nc file exists as you mentions. But if it doesn't, and we fallback on the default gridline registration/Cartesian setting, that's not ideal either. We'll need to come up with a better solution somehow.

seisman commented 4 years ago

The ultimate goal is to avoid temporary files. It's only possible if PyGMT supports GMT data types

weiji14 commented 4 years ago

That's definitely a long term thing (for PyGMT > v0.3.0). Would need someone familiar with both Python and C I think, the new postdoc might the one to do it :laughing:

weiji14 commented 3 years ago

I'll need to do more investigation, but there's some discussion in xarray about moving towards keeping attrs (attributes) by default for slice operations. See https://github.com/pydata/xarray/issues/1614 and https://github.com/pydata/xarray/issues/3891. Right now, it might be possible for users to use xr.set_options(keep_attrs=True)(not tested, might not work, see http://xarray.pydata.org/en/v0.16.2/generated/xarray.set_options.html), and get the gmt accessor attributes preserved on slicing?

Anyways, I think approving PR #873 and solving #870 is fine for now. But we'll see if users come up with problems down the line.

weiji14 commented 3 years ago

Cross-referencing a related bug reported at https://forum.generic-mapping-tools.org/t/how-to-get-metadata-and-plot-grid-from-xarray-dataset/2010. In that case, an xarray.Dataset was sliced into an xarray.DataArray by selecting one coordinate, but passing the sliced xarray.DataArray throws up grdinfo [WARNING]: Detected a data cube because it is still referencing the original 3D netCDF data cube file, instead of the 2D xarray.DataArray grid slice.

maxrjones commented 2 years ago

I ran into the following today when trying to use some data via OPeNDAP. The try/except strategy in the accessor leads to a successful figure, but gmt's error messages are confusing. If we are expecting the call to fail every once in a while and have a fall back, we could add verbose="q" to the grdinfo call as a short term solution. https://github.com/GenericMappingTools/pygmt/blob/951d30aa9578b4108b768983c09208c021ec1d1f/pygmt/accessors.py#L30-L39

time = '2011-01-20T18:00:00'
param = 'vwnd'
level = 250

ds = xr.open_dataset(f'https://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/pressure/{param}.{time[:4]}.nc').sel({'level': level, 'time': time})

fig = pygmt.Figure()
fig.grdimage(ds['vwnd'], frame=True, projection="H15c")
fig.show()

Error messages:

grdinfo [ERROR]: Libcurl Error: HTTP response code said error
grdinfo [WARNING]: You can turn remote file download off by setting GMT_DATA_UPDATE_INTERVAL to "off"
grdinfo [ERROR]: Cannot find file vwnd.2011.nc
grdinfo [ERROR]: Cannot find file vwnd.2011.nc
grdinfo [ERROR]: File vwnd.2011.nc not found
[Session pygmt-session (51)]: Error returned from GMT API: GMT_FILE_NOT_FOUND (16)
seisman commented 2 years ago

For xarray.DataArray objects, currently PyGMT relies on the output of the grdinfo call to determine the registration and gtype. If the grdinfo fails, PyGMT fallbacks to the registration=0 and gtype=0.

See what GMT does at https://github.com/GenericMappingTools/gmt/blob/e63b2590b68969908efc393d008a9d024617eed2/src/gmt_nc.c#L656.

It seems GMT has its own logic to detect a netCDF file's grid registration. I think PyGMT should follow the same logic. It would be easier if GMT can provide an API function for this.