NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

round-trip failure cf.read to cf.write #749

Closed JonathanGregory closed 8 months ago

JonathanGregory commented 8 months ago

I get an error from cf.write for a field I have got by cf.read from durack_prime.nc. This file can be found in /storage/basic/baobab/jonathan/general/FAFMIP on the Reading system, or you can make it from durack_prime.cdl with ncgen. I'm aware that the field and coordinate data are all missing. In my program, I read in the field as a template, fill in the data, and write it out, but it fails in the same way, indicating the problem isn't the missing data. Here's the error. Is it my mistake? If not, is there a work-round or can it be fixed? Thanks.


>>> cf.environment()
Platform: Linux-3.10.0-1160.108.1.el7.x86_64-x86_64-with-glibc2.17 
HDF5 library: 1.12.2 
netcdf library: 4.9.3-development 
udunits2 library: libudunits2.so.0 
esmpy/ESMF: not available 
Python: 3.9.13 /share/apps/python/anaconda/2022.10/bin/python
dask: 2023.7.0 /home/users/sws02jmg/.local/lib/python3.9/site-packages/dask/__init__.py
netCDF4: 1.6.4 /home/users/sws02jmg/.local/lib/python3.9/site-packages/netCDF4/__init__.py
psutil: 5.9.0 /share/apps/python/anaconda/2022.10/lib/python3.9/site-packages/psutil/__init__.py
packaging: 21.3 /share/apps/python/anaconda/2022.10/lib/python3.9/site-packages/packaging/__init__.py
numpy: 1.25.1 /home/users/sws02jmg/.local/lib/python3.9/site-packages/numpy/__init__.py
scipy: 1.10.0 /home/users/sws02jmg/.local/lib/python3.9/site-packages/scipy/__init__.py
matplotlib: 3.5.2 /share/apps/python/anaconda/2022.10/lib/python3.9/site-packages/matplotlib/__init__.py
cftime: 1.6.2 /home/users/sws02jmg/.local/lib/python3.9/site-packages/cftime/__init__.py
cfunits: 3.3.6 /home/users/sws02jmg/.local/lib/python3.9/site-packages/cfunits/__init__.py
cfplot: not available 
cfdm: 1.11.1.0 /home/users/sws02jmg/.local/lib/python3.9/site-packages/cfdm/__init__.py
cf: 3.16.1 /home/users/sws02jmg/.local/lib/python3.9/site-packages/cf/__init__.py
>>> p=cf.read('durack_prime.nc')[0]
>>> cf.write(p,'input4MIPs.CMIP6.faf-heat-NA50pct.nc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/sws02jmg/.local/lib/python3.9/site-packages/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/users/sws02jmg/.local/lib/python3.9/site-packages/cf/read_write/write.py", line 808, in write
    netcdf.write(
  File "/home/users/sws02jmg/.local/lib/python3.9/site-packages/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/users/sws02jmg/.local/lib/python3.9/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 4911, in write
    self._file_io_iteration(
  File "/home/users/sws02jmg/.local/lib/python3.9/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 5153, in _file_io_iteration
    self._write_global_attributes(fields)
  File "/home/users/sws02jmg/.local/lib/python3.9/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 4390, in _write_global_attributes
    g["netcdf"].setncattr(
  File "src/netCDF4/_netCDF4.pyx", line 3042, in netCDF4._netCDF4.Dataset.setncattr
  File "src/netCDF4/_netCDF4.pyx", line 1757, in netCDF4._netCDF4._set_att
  File "src/netCDF4/_netCDF4.pyx", line 2029, in netCDF4._netCDF4._ensure_nc_success
AttributeError: NetCDF: String match to name in use

davidhassell commented 8 months ago

Thanks, Jonathan. This issue (https://github.com/Unidata/netcdf4-python/issues/1020) suggests that the error is caused by the user setting an "illegal" attribute. I can replicate your result when I access your durack_prime.nc, but when I read the CDL directly it all works:

>>> import cf
>>> f = cf.read('durack_prime.cdl')[0]   # cf-python reads CDL!
>>> cf.write(f, 'tmp.nc')
>>> g = cf.read('tmp.nc')[0]
>>> g.equals(f)
True

How did you make your netCDF file? cf-python is using ncgen -knc4 -o, but I also tested with ncgen -o, and that worked too ...

JonathanGregory commented 8 months ago

Thanks for testing it. I made durack_prime.nc from durack_prime.cdl on the RACC with ncgen -o. I don't get the error if I make it on RACC with ncgen -knc4 -o (file durack_prime.racc-knc4.nc), nor if I make it on my desktop machine with ncgen -o (file durack_prime.mreydhon5.nc). Apparently the problem is coming from the default behaviour of /usr/bin/ncgen on RACC. It's an old version of ncgen (netcdf library version 4.3.3.1 of Dec 10 2015 16:44:18), whereas on my desktop machine I have a modern one (netcdf library version 4.8.1 of Jan 25 2023 06:27:17). durack_prime.nc is a netCDF-3 file, I presume.

racc-login-0$ file durack_prime.nc
durack_prime.nc: NetCDF Data Format data
racc-login-0$ file durack_prime.racc-knc4.nc
durack_prime.racc-knc4.nc: Hierarchical Data Format (version 5) data

Is that a correct interpretation? If so, I suppose it means there is something which isn't being set correctly when a netCDF-3 file is read. In any case, I know how to work round it now, thanks.
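
For anyone wanting to make the same check from Python instead of the `file` command, here is a minimal sketch that classifies a file by its leading bytes. The function name `netcdf_kind` is hypothetical and not part of cf-python or netCDF4. It assumes the HDF5 signature sits at offset 0 (HDF5 also permits a user block in front of it, which this sketch ignores).

```python
# Leading bytes that identify each on-disk format.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files
CDF_VERSIONS = {
    1: "netCDF-3 classic",
    2: "netCDF-3 64-bit offset",
    5: "CDF-5 (64-bit data)",
}


def netcdf_kind(path):
    """Describe the file format, judged only by the first few bytes.

    Hypothetical helper: assumes any HDF5 signature is at offset 0.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "netCDF-4 (HDF5)"
    if len(head) > 3 and head[:3] == b"CDF":
        # The fourth byte of a classic-family file is its version number.
        return CDF_VERSIONS.get(head[3], "unknown CDF version")
    return "not recognised as netCDF"
```

Calling `netcdf_kind('durack_prime.nc')` and `netcdf_kind('durack_prime.racc-knc4.nc')` should distinguish the two files in the same way as the `file` output above.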

davidhassell commented 8 months ago

Thanks, Jonathan.

Your interpretation makes sense to me. I'm guessing that the presence of the _NCProperties global attribute is handled differently with netCDF-3 output. An ncgen that is nearly 10 years old - that's impressive(ly bad).
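
The workaround found in this thread (regenerating the file from the CDL with an explicit netCDF-4 format, so the result does not depend on the default of the installed ncgen) can be sketched as below. The helper names `ncgen_command` and `cdl_to_netcdf` are hypothetical; the only external assumption is that an `ncgen` executable accepting `-k nc4` is on the PATH, as both the old and new versions discussed above appear to.

```python
import shutil
import subprocess


def ncgen_command(cdl_path, nc_path, kind="nc4"):
    """Build the ncgen argument list with an explicit output format."""
    return ["ncgen", "-k", kind, "-o", str(nc_path), str(cdl_path)]


def cdl_to_netcdf(cdl_path, nc_path, kind="nc4"):
    """Run ncgen to convert a CDL file to netCDF (hypothetical helper)."""
    if shutil.which("ncgen") is None:
        raise RuntimeError("ncgen was not found on PATH")
    # check=True raises CalledProcessError if ncgen exits with an error.
    subprocess.run(ncgen_command(cdl_path, nc_path, kind), check=True)
    return nc_path
```

A file made this way, e.g. `cdl_to_netcdf('durack_prime.cdl', 'durack_prime.nc4.nc')`, can then be passed to cf.read and cf.write as in the sessions above.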