NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

NumPy MaskError with read-in of header-info CDL #197

Open sadielbartholomew opened 3 years ago

sadielbartholomew commented 3 years ago

With the current cf and cfdm master branches installed, and also with the current release set, reading certain CDL files fails with a MaskError, which numpy raises when a masked element cannot be converted. Examples are given below.

Update: we think this error arises for CDL that contains header information only, i.e. CDL generated with ncdump -h rather than a plain ncdump (and perhaps also with ncdump -c in certain cases). Notably, for the cases below, generating the CDL with ncdump without the -h option produces files that are read in fine, which supports that theory.
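
A rough numpy emulation of why header-only CDL matters. It assumes that when ncgen builds a netCDF file from CDL with no data: section, the variables are left unwritten and so read back as the fill value, which netCDF4-python then masks automatically. The fill value below is netCDF's default for int (NC_FILL_INT). The function and variable names are hypothetical.

```python
import numpy as np

# netCDF default fill value for 32-bit ints (NC_FILL_INT).
NC_FILL_INT = -2147483647


def read_unwritten_variable(size, fill_value=NC_FILL_INT):
    """Emulate reading a variable that was declared but never written.

    Every element holds the fill value, so auto-masking (assumed to
    behave like netCDF4-python's default) masks all of them.
    """
    raw = np.full(size, fill_value, dtype=np.int32)
    return np.ma.masked_equal(raw, fill_value)


# Hypothetical count variable from a header-only (ncdump -h) CDL file.
row_size = read_unwritten_variable(4)
print(row_size)             # [-- -- -- --]
print(row_size.mask.all())  # True
```

Under this assumption there is no real value left to convert, so any later int() on an element fails.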

Details

After redirecting the output of ncdump -h on the contiguous.nc example tutorial dataset to a new file, reading that file gives:

>>> import cf
>>> f = cf.read('contiguous.nc.cdl', fmt='CDL')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cfdm/cfdm/decorators.py", line 181, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/sadie/cf-python/cf/read_write/read.py", line 667, in read
    fields = _read_a_file(
  File "/home/sadie/cfdm/cfdm/decorators.py", line 181, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/sadie/cf-python/cf/read_write/read.py", line 941, in _read_a_file
    fields = netcdf.read(
  File "/home/sadie/cfdm/cfdm/decorators.py", line 181, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 1270, in read
    self._parse_ragged_contiguous_compression(
  File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 1856, in _parse_ragged_contiguous_compression
    element_dimension = self._set_ragged_contiguous_parameters(
  File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 2351, in _set_ragged_contiguous_parameters
    element_dimension_size = int(
  File "/home/sadie/cf-python/cf/data/data.py", line 1467, in __int__
    return int(self.datum())
  File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/numpy/ma/core.py", line 4383, in __int__
    raise MaskError('Cannot convert masked element to a Python int.')
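
The last two frames can be reproduced with numpy alone. The sketch below uses a hypothetical MiniData class that keeps only what the traceback shows of cf.Data: __int__ returns int(self.datum()). Its datum() is a simplified assumption, not cf's implementation.

```python
import numpy as np


class MiniData:
    """Hypothetical stand-in for cf.Data: __int__ delegates to datum()."""

    def __init__(self, array):
        self._array = np.ma.asanyarray(array)

    def datum(self):
        # Assumption: return the single element of a size-1 array.
        return self._array.reshape(-1)[0]

    def __int__(self):
        return int(self.datum())


ok = MiniData(np.ma.masked_array([7], mask=[False]))
print(int(ok))  # 7

bad = MiniData(np.ma.masked_array([7], mask=[True]))
try:
    int(bad)
except np.ma.MaskError as error:
    print(error)  # Cannot convert masked element to a Python int.
```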

As a test I also converted all of the other sample netCDF datasets from our tutorial to CDL:

$ for filename in *.nc; do ncdump -h "$filename" > "${filename}.cdl"; done

and tried to read each of them in as above, to see whether the problem occurs in other cases. The error also occurs for geometry.nc.cdl (but for no other dataset) and is ultimately raised from the same line, although via a different code path (_parse_geometry rather than _parse_ragged_contiguous_compression):

>>> f = cf.read('geometry.nc.cdl', fmt='CDL')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cfdm/cfdm/decorators.py", line 181, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/sadie/cf-python/cf/read_write/read.py", line 667, in read
    fields = _read_a_file(
  File "/home/sadie/cfdm/cfdm/decorators.py", line 181, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/sadie/cf-python/cf/read_write/read.py", line 941, in _read_a_file
    fields = netcdf.read(
  File "/home/sadie/cfdm/cfdm/decorators.py", line 181, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 1333, in read
    geometry_ncvar = self._parse_geometry(
  File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 2220, in _parse_geometry
    n_nodes_in_this_cell = int(nodes_per_geometry_data[cell_no])
  File "/home/sadie/cf-python/cf/data/data.py", line 1467, in __int__
    return int(self.datum())
  File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/numpy/ma/core.py", line 4383, in __int__
    raise MaskError('Cannot convert masked element to a Python int.')
numpy.ma.core.MaskError: Cannot convert masked element to a Python int.
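
A simplified sketch of the failing geometry loop, reusing the line shown in the traceback (n_nodes_in_this_cell = int(nodes_per_geometry_data[cell_no])). The surrounding function and the node counts are hypothetical; the real _parse_geometry does much more.

```python
import numpy as np


def nodes_per_cell(nodes_per_geometry_data):
    """Convert each cell's node count to a Python int, as in the traceback."""
    counts = []
    for cell_no in range(nodes_per_geometry_data.size):
        n_nodes_in_this_cell = int(nodes_per_geometry_data[cell_no])
        counts.append(n_nodes_in_this_cell)
    return counts


# From a plain ncdump the counts are real values (hypothetical numbers).
print(nodes_per_cell(np.ma.masked_array([3, 4, 5])))  # [3, 4, 5]

# From ncdump -h there is no data, so every count is assumed to be masked.
header_only = np.ma.masked_all((3,), dtype=np.int32)
try:
    nodes_per_cell(header_only)
except np.ma.MaskError as error:
    print(error)  # Cannot convert masked element to a Python int.
```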

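Clearly something about the nature of these two original netCDF datasets causes the error to emerge where it otherwise does not. Both are read via code paths (ragged contiguous arrays and geometries) that convert data values to Python ints, so the likely cause is that those values end up masked.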

Environment

Note that I tested this with the following environment, and also with cf 3.8.0 and cfdm 1.8.8.0 with the same dependencies:

>>> cf.environment()
Platform: Linux-4.15.0-54-generic-x86_64-with-glibc2.10 
HDF5 library: 1.10.6 
netcdf library: 4.7.4 
udunits2 library: /home/sadie/anaconda3/envs/cf-env/lib/libudunits2.so.0 
Python: 3.8.5 /home/sadie/anaconda3/envs/cf-env/bin/python
netCDF4: 1.5.4 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/netCDF4/__init__.py
cftime: 1.4.1 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/cftime/__init__.py
numpy: 1.19.4 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/numpy/__init__.py
psutil: 5.7.3 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/psutil/__init__.py
scipy: 1.5.3 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/scipy/__init__.py
matplotlib: 3.3.3 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/matplotlib/__init__.py
ESMF: 8.0.1 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/ESMF/__init__.py
cfdm: 1.8.9.0 /home/sadie/cfdm/cfdm/__init__.py
cfunits: 3.3.1 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/cfunits/__init__.py
cfplot: 3.0.38 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/cfplot/__init__.py
cf: 3.9.0 /home/sadie/cf-python/cf/__init__.py

sadielbartholomew commented 3 years ago

This is probably easy to fix, but I won't be able to look into it until tomorrow.

sadielbartholomew commented 3 years ago

Cross-referencing cfdm, since the error ultimately arises when a masked datum (displayed by numpy as --) is not handled properly by netcdfread in cfdm and gets passed to __int__.
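
For reference, a masked element taken from a numpy masked array is the np.ma.masked singleton, which prints as --. The guard below is a hypothetical sketch of one way a caller could detect it before calling int(); it is not cfdm's actual fix.

```python
import numpy as np

values = np.ma.masked_all((2,), dtype=np.int32)
element = values[0]

# A masked element is the np.ma.masked singleton, printed as "--".
print(element is np.ma.masked)  # True
print(element)                  # --


def to_int_or_none(element):
    """Hypothetical guard: return None for a masked datum instead of
    letting int() raise MaskError."""
    if element is np.ma.masked:
        return None
    return int(element)


print(to_int_or_none(values[1]))                   # None
print(to_int_or_none(np.ma.masked_array([5])[0]))  # 5
```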

sadielbartholomew commented 3 years ago

Renaming the issue to be more precise, now that further investigation has pinned the problem down.

sadielbartholomew commented 3 years ago

Note that a related issue is to provide more informative error messages when an operation that requires data is attempted on metadata-only CDL; an example is given in https://github.com/NCAS-CMS/cf-python/issues/196#issuecomment-811867896.
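
One possible shape for such a message, as a hypothetical sketch: require_data and the variable name node_count are illustrative only, and the check assumes that metadata-only CDL yields fully masked data, as discussed above.

```python
import numpy as np


def require_data(values, ncvar):
    """Hypothetical check before an operation that needs real data values.

    Raises a descriptive ValueError if every element is masked, which is
    what we would expect when the CDL came from 'ncdump -h'.
    """
    values = np.ma.asanyarray(values)
    if values.size and np.ma.getmaskarray(values).all():
        raise ValueError(
            f"Variable {ncvar!r} has no data values (all elements are "
            "masked). The CDL may contain header information only, e.g. "
            "from 'ncdump -h'; regenerate it with a plain 'ncdump'."
        )
    return values


try:
    require_data(np.ma.masked_all((3,), dtype=np.int32), "node_count")
except ValueError as error:
    print(error)
```

Raising a ValueError with this context, rather than letting MaskError escape from deep inside the reader, would tell the user what to change about their input.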