barronh / pseudonetcdf

PseudoNetCDF like NetCDF except for many scientific format backends
GNU Lesser General Public License v3.0

Reopening of pseudonetcdf output raises error #111

Closed · nishadhka closed this 4 years ago

nishadhka commented 4 years ago

There is a WRF output netCDF file, preprocessed for use with the CAMx model: https://drive.google.com/file/d/1KmFxnKi0FT5h9NRsBJo3mQS61EOcRP9x/view?usp=sharing. I want to convert that file into CF form, so I used the following code:

import PseudoNetCDF as pnc
import xarray as xr

inpath = 'camx_test4.nc'
outpath = 'cf_camx_test4.nc'

infile = pnc.pncopen(inpath, format='ioapi').copy()
pnc.conventions.ioapi.add_cf_from_ioapi(infile)
infile.save(outpath, verbose=0)

# the output file opens in xarray without any issue
db = xr.open_dataset(outpath)

However, after that Python session is closed (Ctrl+X), reopening outpath raises the following error:

$ python
Python 3.7.6 | packaged by conda-forge | (default, Mar  5 2020, 15:27:18) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> outpath = 'cf_camx_test4.nc'
>>> db=xr.open_dataset(outpath)
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140408649123648:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1615 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #004: H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 198, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/lru_cache.py", line 53, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('cf_camx_test4.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py", line 502, in open_dataset
    filename_or_obj, group=group, lock=lock, **backend_kwargs
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 358, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 314, in __init__
    self.format = self.ds.data_model
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 367, in ds
    return self._acquire()
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 361, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2291, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1855, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'cf_camx_test4.nc'

Apart from xarray, this file cannot be opened in cdo either; cdo raises a similar error:

cdo -info cf_camx_test4.nc
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140607716386624:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1615 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #004: H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed

cdo info: Open failed on >cf_camx_test4.nc<
Unknown Error

Interestingly, after a system reboot (or, in a Docker environment, after logging off/exiting), opening the file in xarray or cdo raises no error. Is this a memory leak?
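For what it's worth, the "unable to lock the file" lines in the trace point at HDF5's file locking (introduced in HDF5 1.10) rather than a memory leak: the first session still holds the file open, so a second process cannot acquire the lock until the holder exits, which would explain why a reboot or logoff clears it. A hedged workaround sketch using the `HDF5_USE_FILE_LOCKING` environment variable; this only sidesteps the symptom and does not fix the unclosed handle:

```shell
# HDF5 >= 1.10 takes POSIX file locks when opening a file, so a file
# still held open by another (possibly suspended) process cannot be
# reopened. Disabling locking lets readers bypass that check.
export HDF5_USE_FILE_LOCKING=FALSE
# then retry the read, e.g.:
#   cdo -info cf_camx_test4.nc
```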

barronh commented 4 years ago

I'm not totally sure, but I have a recommendation.

# Replication without error

First, I want to describe how I tested your example on Google Colab without an issue.

Steps:

  1. Follow link: https://colab.research.google.com/github/barronh/pseudonetcdf_examples/blob/main/prepcolab/prepcolab.ipynb
  2. Delete the last two cells (code and markdown).
  3. Add a cell with the code "pip install xarray".
  4. Add a cell with your code and the missing "import xarray as xr".
  5. Add another cell with:

     import xarray as xr
     db = xr.open_dataset('cf_camx_test4.nc')

  6. Download camx_test4.nc from your link.
  7. Upload it to Colab.
  8. Run all cells except for the last one. Runs without an error.
  9. Restart the "kernel."
  10. Run the last cell.

I get no errors or warnings... Not sure why this would be the case.

# Recommendation

My guess is that buffering is occurring somewhere, leaving the file in an odd state. The `save` method returns the netcdf4.Dataset that is "on disk" and still in an open state. Perhaps an explicit `close` would help flush the write:

< infile.save(outpath, verbose=0)
> infile.save(outpath, verbose=0).close()

Since I cannot repeat the errors, I cannot be sure. Please let me know if that helps.
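Since `save` returns the still-open dataset, another option is to wrap it in `contextlib.closing` so the handle is closed even if a later step raises. A minimal sketch of that pattern; the `_OpenDataset` class and `save` function below are hypothetical stand-ins so the snippet runs without PseudoNetCDF installed:

```python
from contextlib import closing


class _OpenDataset:
    """Hypothetical stand-in for the open netCDF4.Dataset that save() returns."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real netCDF4.Dataset.close() flushes buffered writes and
        # releases the HDF5 file lock.
        self.closed = True


def save(outpath, verbose=0):
    # Stand-in for infile.save(): writes the file and returns a
    # still-open handle, as PseudoNetCDF's save does.
    return _OpenDataset()


# closing() guarantees close() runs, even if the body raises
with closing(save('cf_camx_test4.nc')) as ds:
    pass  # work with the open dataset here
```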

nishadhka commented 4 years ago

Thanks, the explicit close did the magic. Now the file opens without any error after the Python session ends. The code is as follows:

import PseudoNetCDF as pnc
inpath = 'camx_test4.nc'
outpath = 'cf_camx_test4.nc'

infile = pnc.pncopen(inpath, format='ioapi').copy()
pnc.conventions.ioapi.add_cf_from_ioapi(infile)
infile.save(outpath, verbose=0).close()

Closing the Python session (Ctrl+X) and running the following code raises no error!

import xarray as xr
outpath = 'cf_camx_test4.nc'
db=xr.open_dataset(outpath)