Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

support for HDF5 dimension scales with null dataspace #1226

Open itcarroll opened 1 year ago

itcarroll commented 1 year ago

I would like to use netCDF4-python (as backend to Xarray) to read some HDF5 files, and am unable to do so. Attempting to read the files actually crashes Python. I've traced the problem to a dimension scale with null dataspace in the HDF5 files. I understand that not all HDF5 files are netCDF4 files, but I don't think they should crash Python.

And in this particular case, the HDF5 file seems perfectly interpretable. As an enhancement to netCDF4-python, you could interpret a dimension scale with null dataspace for what it is equivalent to in netCDF4, which is "a netCDF dimension but not a netCDF variable."

Here is a reproducible example of code that crashes Python. I'm not totally sure the problem isn't just a mismatch between the HDF5 libraries used, since both netCDF4-python and h5py package their own libraries. My installs built nothing from source.

% cat danger.py
from h5py import File
from netCDF4 import Dataset

with File('danger.h5', 'w') as group:
    dataset = group.create_dataset('y', shape=(3,), dtype=float)
    dimension = group.create_dataset('x', shape=None, dtype=int)   # will crash python when read below
    # dimension = group.create_dataset('x', shape=(3,), dtype=int) # creates misleading dataset
    dimension.make_scale('x')
    dataset.dims[0].attach_scale(dimension)

with Dataset('danger.h5') as group:
    print(group)
% python danger.py
Assertion failed: (ndims), function get_scale_info, file hdf5open.c, line 1396.
zsh: abort      python danger.py
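
As a possible stopgap on the Python side, a file could be scanned with h5py before it is handed to netCDF4, so that the abort is avoided. The sketch below is only an illustration: the helper name `null_dimension_scales` is hypothetical, and it assumes that h5py reports `shape` as `None` for a dataset with a null dataspace and exposes `is_scale` on datasets (h5py 2.10 or later). It does not make netCDF4 read the file; it only flags the datasets that would trigger the assertion.

```python
def null_dimension_scales(group):
    """Return names of dimension scales under `group` that have a null dataspace.

    `group` is expected to be an open h5py.File or h5py.Group (assumption:
    h5py >= 2.10, where datasets expose `is_scale`).
    """
    found = []

    def visit(name, obj):
        # Groups have no `is_scale` attribute, so they are skipped here.
        # h5py reports `shape` as None for a dataset with a null dataspace.
        if getattr(obj, "is_scale", False) and obj.shape is None:
            found.append(name)
        # Returning None tells visititems to keep walking the file.

    group.visititems(visit)
    return found


# Usage sketch (requires h5py; not run here):
#   import h5py
#   with h5py.File("danger.h5", "r") as f:
#       offenders = null_dimension_scales(f)
#   if offenders:
#       raise ValueError(f"null-dataspace dimension scales: {offenders}")
```

Raising an ordinary Python exception this way at least replaces the process abort with something a caller can catch.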

Here is the complete h5dump of danger.h5 as created by h5py. Although it is not a netCDF4 file, I can't think of any reason netCDF4-python shouldn't interpret it correctly, as it does when the commented-out line in the code above is used instead. The dataset "x" is a dimension that has no coordinates, which is valid in the netCDF4 model.

HDF5 "danger.h5" {
GROUP "/" {
   DATASET "x" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  NULL
      DATA {
      }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 2;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "x"
         }
      }
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_U32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 0 "/y",
               0
            }
         }
      }
   }
   DATASET "y" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): 0, 0, 0
      }
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 0 "/x")
         }
      }
   }
}
}

Thank you for considering! Here are my versions ...

% pip list
Package    Version
---------- -------
cftime     1.6.2
h5py       3.7.0
netCDF4    1.6.2
numpy      1.23.5
pip        22.1.2
setuptools 62.3.3
wheel      0.37.1

[notice] A new release of pip available: 22.1.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
% python --version
Python 3.10.8
% sw_vers
ProductName:    macOS
ProductVersion: 12.6.1
BuildVersion:   21G217

jswhit commented 1 year ago

If there is a workaround for this, it has to happen in the netcdf-c library. Can you file this as an issue at https://github.com/Unidata/netcdf-c?

itcarroll commented 1 year ago

Thanks, @jswhit. Filed as above. Or do I need to repeat or update the description there? I hesitate to do so without knowing C.

itcarroll commented 1 year ago

@jswhit Any idea why there has been no comment from the Unidata team on Unidata/netcdf-c#2571?