shoyer opened this issue 9 years ago
Seems like when `nc_put_att_text` is used, the result is stored as a scalar in the HDF5 file. If `nc_put_att_string` is used (when the string is unicode), a simple dataspace is created. Here's the relevant code snippet in `_netCDF4.pyx`:
```python
if value_arr.dtype.char == 'U' and not is_netcdf3:
    # a unicode string, use put_att_string (if NETCDF4 file).
    ierr = nc_put_att_string(grp._grpid, varid, attname, 1, &datstring)
else:
    ierr = nc_put_att_text(grp._grpid, varid, attname, lenarr, datstring)
```
I think you are right that this is due to how `nc_put_att_string` is implemented in the C library. It seems to be designed to write arrays of variable-length strings.
Should I open a bug report for the C library, then?
Sure, wouldn't hurt. At the very least maybe we'll find out why they chose to do it that way.
This code writes a single string attribute to an HDF5 file using netCDF4:
Here's code to do the same thing with h5py:
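As with the previous snippet, the h5py block was lost in extraction; a minimal equivalent might be (filename is a placeholder):

```python
# Sketch: write the same string attribute directly with h5py, which
# stores it with a scalar dataspace. Filename is a placeholder.
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), 'h5py_attr.h5')
with h5py.File(path, 'w') as f:
    f.attrs['units'] = 'days since 1900-01-01'
```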
As you can see from the results of `h5dump`, netCDF4-python is writing the attribute as a "simple dataspace", which corresponds to a one-element array: https://www.hdfgroup.org/HDF5/doc/UG/UG_frame12Dataspaces.html

In fact, this is exactly what you get if you view the file created with netCDF4-python using h5py (to netCDF4-python and ncdump, the two files appear identical):
I believe netCDF4-python should be writing the attribute as a scalar, similar to what it does if you write bytes (or a str on Python 2):
Given that netCDF4-python is simply using the netCDF-C library's `nc_put_att_string` function, this may very well be a bug upstream in netCDF-C.