Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Editing the content of a dataset variable changes the length of the unlimited dimension #1166

Closed: barbarapirscher closed this issue 2 years ago

barbarapirscher commented 2 years ago

Version: netCDF4-python 1.5.7 and 1.5.8. The code works correctly with versions 1.5.4 to 1.5.6.

Environment: Python 3.9, numpy 1.21.6

Description: Editing the content of a dataset variable changes the length of the unlimited dimension. I attached the (tarred) netCDF file in which I observed the problem.

Code:

import numpy as np
from netCDF4 import Dataset

filename = 'wrfbdy_d01__sel'
var_name = 'DUST_1_BXS'
data = Dataset(filename, 'r+')
print(data.dimensions['Time'])
# --> <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'Time', size = 2 for all netCDF-4 python versions

modif_var = data.variables[var_name]
increment = np.ones(modif_var[0, ...].shape) * 0.2
data.variables[var_name][0, ...] = modif_var[0, ...] + increment

print(data.dimensions['Time'])
# --> <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'Time', size = 1 for netCDF4 versions 1.5.7 and 1.5.8
# --> <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'Time', size = 2 for netCDF4 versions 1.5.4 to 1.5.6

Attachment: wrfbdy_d01__sel.tar.gz

jswhit commented 2 years ago

Thanks for the report. I can confirm the problem with your script as well as with this simple test:

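import netCDF4
import numpy as np

# Create a file with two records along the unlimited dimension 't'.
nc = netCDF4.Dataset('test_issue1166.nc', 'w')
t = nc.createDimension('t', None)
x = nc.createDimension('x', 100)
v = nc.createVariable('v', np.float64, ('t', 'x'))
v[0, :] = np.ones(100)
v[1, :] = 2 * np.ones(100)
nc.close()

# Reopen for writing and overwrite only the first record.
nc = netCDF4.Dataset('test_issue1166.nc', 'r+')
t = nc.dimensions['t']
v = nc.variables['v']  # re-fetch: the old handle belongs to the closed dataset
print(len(t))
v[0, :] = np.zeros(100)
print(len(t))  # should still be 2
nc.close()

# Reopen read-only and check the length stored in the file.
nc = netCDF4.Dataset('test_issue1166.nc')
t = nc.dimensions['t']
print(len(t))
nc.close()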

2
1
2

It looks as if the data in the file is correct, since closing and reopening the file after modifying the variable gives the correct dimension length. However, the dimension length is reported incorrectly immediately after the variable is modified.
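For reference, the rule the reported length should follow can be written in a couple of lines. This is a sketch of documented netCDF semantics (an unlimited dimension grows to cover the highest record written and is never shortened by a write), not library code; the function name `expected_unlimited_len` is hypothetical. The wrong value 1 above happens to equal start + count for the overwrite of record 0, which is consistent with the length being reset instead of maximized, but that is only an inference and not a confirmed root cause.

```python
def expected_unlimited_len(current_len, start, count):
    """Length of an unlimited dimension after writing `count` records at `start`.

    Hypothetical helper: writes may extend the dimension, but overwriting
    existing records never shrinks it.
    """
    return max(current_len, start + count)


# Replay the test script: two initial writes, then an overwrite of record 0.
length = 0
for start in (0, 1):
    length = expected_unlimited_len(length, start, 1)
length = expected_unlimited_len(length, 0, 1)
print(length)  # 2
```

The affected library versions report 1 at the last step, where this rule gives 2.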

jswhit commented 2 years ago

This appears to be an issue with the underlying C lib. Here's a simple C test program to illustrate the bug:

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the netCDF error message and exit if a library call fails. */
#define CHECK(call)                                      \
    do {                                                 \
        int iret_ = (call);                              \
        if (iret_ != NC_NOERR) {                         \
            fprintf(stderr, "%s\n", nc_strerror(iret_)); \
            exit(EXIT_FAILURE);                          \
        }                                                \
    } while (0)

int main(void) {
    int i, dimidx, dimidt, varid, ncid;
    int dimids[2];
    size_t start[2], count[2], dimlen;
    int data[10];

    /* Create v(t, x) with t unlimited and x of length 10. */
    CHECK(nc_create("test_issue1166.nc", NC_NETCDF4, &ncid));
    CHECK(nc_def_dim(ncid, "x", 10, &dimidx));
    CHECK(nc_def_dim(ncid, "t", NC_UNLIMITED, &dimidt));
    dimids[0] = dimidt;
    dimids[1] = dimidx;
    CHECK(nc_def_var(ncid, "v", NC_INT, 2, dimids, &varid));

    /* Write records t=0 (all 1s) and t=1 (all 2s), so t has length 2. */
    start[0] = 0;
    start[1] = 0;
    count[0] = 1;
    count[1] = 10;
    for (i = 0; i < 10; i++)
        data[i] = 1;
    CHECK(nc_put_vara_int(ncid, varid, start, count, data));
    start[0] = 1;
    for (i = 0; i < 10; i++)
        data[i] = 2;
    CHECK(nc_put_vara_int(ncid, varid, start, count, data));
    CHECK(nc_close(ncid));

    /* Reopen for writing and overwrite record t=0 with 0s. */
    CHECK(nc_open("test_issue1166.nc", NC_WRITE, &ncid));
    CHECK(nc_inq_varid(ncid, "v", &varid));
    CHECK(nc_inq_dimid(ncid, "t", &dimidt));
    start[0] = 0;
    for (i = 0; i < 10; i++)
        data[i] = 0;
    CHECK(nc_put_vara_int(ncid, varid, start, count, data));

    /* The length of t should still be 2 after overwriting an existing record. */
    CHECK(nc_inq_dimlen(ncid, dimidt, &dimlen));
    printf("dim length after write=%zu\n", dimlen);
    CHECK(nc_close(ncid));
    return 0;
}

With the latest version of netcdf-c (4.8.1), running the program yields:

dim length after write=1

while ncdump on the file shows that the dimension has length 2. Running the test program with netcdf-c 4.7.4 produces the correct answer (2).

I suspect you are getting different answers with different versions of the Python interface because they are linked against different versions of the C library.
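To see which C library a given install actually uses, the netCDF4 module exposes its own version and the linked netcdf-c version as module attributes (`__version__` and `__netcdf4libversion__`). A minimal sketch, assuming a reasonably recent netCDF4-python; the wrapper name `linked_library_versions` is hypothetical and it returns None when netCDF4 is not installed.

```python
def linked_library_versions():
    """Return (netCDF4-python version, linked netcdf-c version), or None.

    Hypothetical helper; None means the netCDF4 package is not importable.
    """
    try:
        import netCDF4
    except ImportError:
        return None
    return netCDF4.__version__, netCDF4.__netcdf4libversion__


if __name__ == "__main__":
    versions = linked_library_versions()
    if versions is None:
        print("netCDF4 is not installed")
    else:
        print("netCDF4-python %s linked against netcdf-c %s" % versions)
```

Comparing this output across the 1.5.4 to 1.5.8 installs would show whether the change in behaviour lines up with the C library version rather than the Python package version.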

jswhit commented 2 years ago

This should now be fixed in netcdf-c 4.9.0, which the netcdf4-python 1.6.0 wheels use.
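If you need a guard in scripts that rewrite records in place, a version comparison against 4.9.0 is one option. This is a conservative sketch, not part of netCDF4-python: `parse_version` and `has_issue1166_fix` are hypothetical names, and a False result only means the linked netcdf-c predates the release with the fix (4.7.4, for example, was reported above to behave correctly).

```python
import re

# Per this thread, the fix first shipped in netcdf-c 4.9.0.
FIXED_IN = (4, 9, 0)


def parse_version(text):
    """Return (major, minor, patch) from the start of a version string.

    Accepts strings such as "4.8.1" or "4.9.2-development"; a missing
    patch number is treated as 0.
    """
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def has_issue1166_fix(libversion):
    """True if the netcdf-c version is at least the release containing the fix."""
    return parse_version(libversion) >= FIXED_IN


if __name__ == "__main__":
    try:
        import netCDF4
    except ImportError:
        print("netCDF4 is not installed")
    else:
        print(has_issue1166_fix(netCDF4.__netcdf4libversion__))
```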