Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Error in renameDimension #1357

Open veenstrajelmer opened 1 month ago

veenstrajelmer commented 1 month ago

Based on https://github.com/Unidata/netcdf4-python/issues/817, but with the rename ordering switched just to be safe. There seems to be a bug in renameDimension:

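import netCDF4 as nc

# Create a file with dimension 'x' and a coordinate variable of the same name
with nc.Dataset('test.nc', 'w') as fp:
    fp.createDimension('x', 3)
    ncvar = fp.createVariable('x', float, ('x',))
    ncvar[:] = [1.1, 2.2, 3.3]

# Reopen for editing and rename both the variable and its dimension to 'lon'
with nc.Dataset('test.nc', 'r+') as fp:
    print(fp.variables['x'][:])
    fp.renameVariable('x', 'lon')
    fp.renameDimension('x', 'lon')
    print(fp.variables['lon'][:])  # the data still looks fine in this session

# Reopen read-only (using fp rather than shadowing the nc module name)
with nc.Dataset('test.nc', 'r') as fp:
    print(fp.variables['lon'][:])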

Prints:

[1.1 2.2 3.3]
[1.1 2.2 3.3]
[-- -- --]

It seems that the data is corrupted when the file is saved. I would expect to be able to rename a dimension without losing the data. My use case can be found here: https://forum.ecmwf.int/t/new-time-format-in-era5-netcdf-files/3796/5?u=jelmer_veenstra

This only happens when the variable has the same name as the dimension (i.e. it is a coordinate variable) and both are renamed. If either of the rename calls is commented out, the data is preserved.

jswhit commented 1 month ago

Possibly related to https://github.com/Unidata/netcdf-c/issues/597

veenstrajelmer commented 1 month ago

You are linking to an unresolved issue from 2017 that describes a fundamental problem. That is a bit unexpected to me for such a well-known and widely used package. It seems that the issue does not appear if the dimension and the variable/coordinate are renamed at once. Is that possible via the Python API, or is there another workaround?

jswhit commented 1 month ago

Maybe save the data from the variable before renaming, then write it back to the renamed variable?
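A minimal sketch of that workaround in Python, assuming the file is opened with `netCDF4.Dataset` in `'r+'` mode. The helper name `rename_coordinate_keep_data` is hypothetical, and whether the rewritten values survive reopening has not been verified here against the netcdf-c bug, so it is worth checking the result by reopening the file read-only.

```python
def rename_coordinate_keep_data(ds, old, new):
    """Hypothetical helper: rename a coordinate variable and its same-named
    dimension, then write the original values back.

    Assumes `ds` is a netCDF4.Dataset opened in 'r+' mode. This is a sketch of
    the suggested workaround, not a fix for the underlying netcdf-c bug.
    """
    data = ds.variables[old][:]   # copy the values into memory before renaming
    ds.renameVariable(old, new)   # rename the variable first, as in the report
    ds.renameDimension(old, new)  # then rename the dimension it shares a name with
    ds.variables[new][:] = data   # write the saved values back under the new name
```

Mirroring the report, this would be called as `rename_coordinate_keep_data(fp, 'x', 'lon')` inside `with nc.Dataset('test.nc', 'r+') as fp:`, followed by reopening with mode `'r'` and printing `fp.variables['lon'][:]` to confirm the values persisted.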

jswhit commented 1 month ago

To answer your question, there is nothing in the C API that allows you to rename a dimension and a variable at the same time.

veenstrajelmer commented 1 month ago

OK, for me it does not have to happen at the same time, as long as the dataset is not messed up. Your suggestion would be a bit cumbersome. An alternative would be to do the renaming with xarray, but that requires saving to a separate file, since xarray does not modify netCDF files in place. Either way, I think it would still be valuable to fix this bug. Is a fix possible, or should I not expect one?

jswhit commented 1 month ago

The other workaround mentioned in that issue is to convert the file to netCDF-3, do the renaming, and convert it back. Very cumbersome, for sure. Unfortunately, this is not something we can fix here, since the problem is not in the Python API but in the underlying C library. I would suggest adding your example to the netcdf-c issue and asking the developers for a progress update.
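A rough sketch of the netCDF-3 round trip, assuming the `nccopy` utility that ships with netCDF-C is on `PATH` and accepts the `-k nc3` / `-k nc4` format names. The helper names are hypothetical, and the `rename` callback is where the `renameVariable`/`renameDimension` calls would go. This can only work for files that fit the classic data model (no groups, user-defined or unsigned types), and compression settings from the original file are not carried through the classic intermediate file.

```python
import subprocess
from typing import Callable, List


def nccopy_command(src: str, dst: str, kind: str) -> List[str]:
    """Build an nccopy call that rewrites `src` as `dst` in format `kind`
    ('nc3' = classic netCDF-3, 'nc4' = netCDF-4)."""
    return ["nccopy", "-k", kind, src, dst]


def rename_via_netcdf3(
    src: str,
    dst: str,
    tmp: str,
    rename: Callable[[str], None],
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Hypothetical helper: netCDF-4 -> classic, rename there, classic -> netCDF-4.

    `rename` receives the path of the temporary classic file and is expected to
    perform the renameVariable/renameDimension calls on it.
    """
    run(nccopy_command(src, tmp, "nc3"), check=True)  # convert to classic netCDF-3
    rename(tmp)                                        # rename in the classic file
    run(nccopy_command(tmp, dst, "nc4"), check=True)  # convert back to netCDF-4
```

For example, `rename` could open the temporary file with `nc.Dataset(path, 'r+')` and call `fp.renameVariable('x', 'lon')` followed by `fp.renameDimension('x', 'lon')`. Taking `run` as a parameter only serves to make the command sequence testable without nccopy installed.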

veenstrajelmer commented 1 month ago

Thanks, I have posted a reply in the issue you linked earlier. I had not been paying attention and did not notice that the linked issue was about netcdf-c. Thanks for looking it up; it is understandable that if it does not work in netcdf-c, it cannot work in netcdf4-python either.