Writing an int variable with start after a double variable with start jumbles the data written

abhibaruah commented 1 year ago

NetCDF version: v4.9.1 HDF5 version: v1.10.8 OS: Linux

Hello, I found a possible bug with writing an NC_INT variable after writing an NC_DOUBLE variable (with start values other than the default). The NC_INT variable written is in an odd ordering, different from how I intended to write it. If I do not write the double variable before writing the int variable, the int data read back is different. Here is what I am doing.

Create an nc file.
Create two dimensions: dim1 is called 'y_dim' and length is 7, dim2 is called 'x_dim' and length is NC_UNLIMITED.
Create double data of size 4x4. Write it to a double variable with dimensions as above but with start as [0,2]. The data I write here is as follows (7x6): x x x x x x x x x x x x x x d d d d x x x d d d d x x x d d d d x x x d d d d x x x

(where ‘x’ represents the fill values and ‘d’ represents the data I want to write)

Create int32 data of size 4x4. Write it to a double variable with dimensions as above but with start as [1,1]. The int32 data I write here is as follows (7x5): x x x x x x x x d d d d x x x d d d d x x
x d d d d x x x d d d d x x
After this, I close the file and reopen it to read the int32 variable I created in Step 4 above. However, the variable I read is of dimensions (7x6) instead of (7x5) and the data is also weirdly organized. x d d d d x x x d d d x x x x d d x x x x x d x d x x x x x d d x x x x d d d x x x
This behavior is seen only when I write the double variable (in step 3) before the int32 variable. If I do not create the double variable, the int variable I read is the same as the one I write.

We initially found this bug in MATLAB, and I could reproduce it using a C standalone which links to netCDF v4.9.1. I understand that the reproduction steps are a bit long. Let me know if you need any additional information from us or if you ant us to try anything else. Also, let me know if I am doing anything wrong with my code.

#include <iostream>
#include <vector>
#include <string>
#include "netcdf.h"

#define FILENAME "g2991833_repro.nc"
#define NDIM 2
#define INPDIM_X 4
#define INPDIM_Y 4

using namespace std;

void checkErrorCode(int status, const char* message){
    if (status != NC_NOERR){
        std::cout << "Error code: " << status << " from " << message << std::endl;
        std::cout << nc_strerror(status) << std::endl << std::endl;
    }
}

//-------------------------------------------------------
// Create double data of size 4x4. Create an NC_DOUBLE variable called 'doubleVar'.
// Write the 4x4 double data to 'doubleVar', with start as [0,2]. 
void writeDoubleVar(int ncid, int* dimids)
{
    int var_id, retval;

    // Vectors for start, count and stride
    vector <size_t> nc_start;
    vector <size_t> nc_count;
    size_t* p_nc_start;
    size_t* p_nc_count;

    // Create the double data
    double dbldata[INPDIM_X][INPDIM_Y];
    for (int i=0; i< INPDIM_X; i++)
    {
        for (int j=0; j<INPDIM_Y; j++)
        {
            dbldata[i][j] = i+j;
        }
    }

    //for (int i=0; i< INPDIM_X; i++)
    //{
    //  for (int j=0; j<INPDIM_Y; j++)
    //  {
    //      std::cout << dbldata[i][j] << " ";
    //  }
    //  std::cout << std::endl;
    //}

    retval = nc_def_var(ncid, "doubleVar", NC_DOUBLE, NDIM, dimids, &var_id);
    checkErrorCode(retval, "nc_def_var");

    // Fill the start, count and stride
    nc_start.push_back(0);
    nc_start.push_back(2);

    nc_count.push_back(INPDIM_Y);
    nc_count.push_back(INPDIM_X);

    p_nc_start = &nc_start[0];
    p_nc_count = &nc_count[0];

    std::cout << "Start " << "Count " << "Stride " << std::endl;
    for (int ii = 0; ii < static_cast<int>(nc_start.size()); ii++)
    {
        std::cout << nc_start[ii] << nc_count[ii] << std::endl;
    }

    retval = nc_put_vars_double(ncid, var_id, p_nc_start, p_nc_count, NULL, &dbldata[0][0]);
    checkErrorCode(retval, "nc_put_vars_double");

}

//-------------------------------------------------------
// Create int32 data of size 4x4. Create an NC_INT variable called 'int32Var'.
// Write the 4x4 double data to 'int32Var', with start as [1,1]. 
void writeIntVar(int ncid, int* dimids)
{
    int var_id2, retval;

    // Vectors for start, count and stride
    vector <size_t> nc_start;
    vector <size_t> nc_count;
    size_t* p_nc_start;
    size_t* p_nc_count;

    // Create the 4x4 int32 data to be written
    int32_t int32data[INPDIM_X][INPDIM_Y];
    for (int i=0; i< INPDIM_X; i++)
    {
        for (int j=0; j<INPDIM_Y; j++)
        {
            int32data[i][j] = i+j;
        }
    }

    retval = nc_def_var(ncid, "int32Var", NC_INT, NDIM, dimids, &var_id2);
    checkErrorCode(retval, "nc_def_var");

    // Fill the start, count and stride
    nc_start.push_back(1);
    nc_start.push_back(1);

    nc_count.push_back(INPDIM_Y);
    nc_count.push_back(INPDIM_X);

    p_nc_start = &nc_start[0];
    p_nc_count = &nc_count[0];

    //std::cout << "Start " << "Count " << "Stride " << std::endl;
    //for (int ii = 0; ii < static_cast<int>(nc_start.size()); ii++)
    //{
    //    std::cout << nc_start[ii] << nc_count[ii] << std::endl;
    //}

    retval = nc_put_vars_int(ncid, var_id2, p_nc_start, p_nc_count, NULL, &int32data[0][0]);
    checkErrorCode(retval, "nc_put_vars_int");

    //--------Print the int32 variable contents -------//
    for (int i=0; i< INPDIM_X; i++)
    {
        for (int j=0; j<INPDIM_Y; j++)
        {
            std::cout << int32data[i][j] << " ";
        }
        std::cout << std::endl;
    }
    //--------End Printing----------//
}

int main()
{

    // Open file
    int ncid, var_id2;
    int retval;
    int xdim_id, ydim_id;
    int dimids[NDIM];

    retval = nc_create(FILENAME, NC_NETCDF4, &ncid);
    checkErrorCode(retval, "nc_create");

    // Create two dimensions: dim1 is called 'y_dim' and length is 7.
    // dim2 is called 'x_dim' and length is NC_UNLIMITED.
    retval = nc_def_dim(ncid, "y_dim", 7, &ydim_id);
    checkErrorCode(retval, "nc_def_dim");

    retval = nc_def_dim(ncid, "x_dim", NC_UNLIMITED, &xdim_id);
    checkErrorCode(retval, "nc_def_dim");

    dimids[0] = ydim_id;
    dimids[1] = xdim_id;

    //------- Write Double Variable ---------//
    // If this is commented out, the int variable read later is different
    writeDoubleVar(ncid, dimids);

    //------- Write NC_INT Variable ---------//
    writeIntVar(ncid, dimids);

    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close");

    // ------------- Reopen the file and read the int32 variable -------//
    // Reopen the file and read the 'int32Var' variable where 4x4 data 
    // is written with start as [1,1]. However, on printing, you will see that the data 
    // has a weird start, and a weird gap between the columns.

    int ncid_read, varid_read;
    int xtype, ndims, dimids_read[NDIM];

    retval = nc_open(FILENAME, NC_NOWRITE, &ncid_read);
    checkErrorCode(retval, "nc_open");

    retval = nc_inq_varid(ncid_read, "int32Var", &varid_read);
    checkErrorCode(retval, "nc_inq_varid");

    retval = nc_inq_var(ncid_read,varid_read,NULL,&xtype,&ndims,dimids_read,NULL);
    checkErrorCode(retval, "nc_inq_var");

    std::vector<size_t>  nc_count_coord(ndims);
    // Printing the dim lengths, which should be 7x5 but instead print as 7x6
    for ( int j = 0; j < ndims; ++j ) {
        retval = nc_inq_dimlen ( ncid_read, dimids_read[j], &(nc_count_coord[j]) );
        checkErrorCode(retval, "nc_inq_dimlen");        
    }
    std::cout << nc_count_coord[0] << " " << nc_count_coord[1] << std::endl;

    // Get the int variable
    int outData[nc_count_coord[0]][nc_count_coord[1]];
    retval = nc_get_var_int ( ncid_read, varid_read, &outData[0][0] );
    checkErrorCode(retval, "nc_get_var_int");

    retval = nc_close(ncid_read);
    checkErrorCode(retval, "nc_close");

    for (int i=0; i< nc_count_coord[1]; i++)
    {
        for (int j=0; j<nc_count_coord[0]; j++)
        {
            std::cout << outData[i][j] << " ";
        }
        std::cout << std::endl;
    }

    // -----------End reading ----------//

    return 0;

}

DennisHeimbigner commented 1 year ago

You need to understand that the length of a named UNLIMITED (x_dim in this case) is the max over all variables that use that particular dimension. So:

initially |x_dim| == 0
after double write |x_dim| == 6
before int write, |x_dim| == 6, so writing less will not shorten |x_dim| to 5. If you do not write the double variable, then you see this:
initially |x_dim| == 0
after int write |x_dim| == 5

So two things to remember:

the size of an unlimited is the max over all writes
the size of an unlimited never gets smaller.

You can get the effect you want by using two different named dimensions:

x_dim_dbl: UNLIMITED
x_dim_int: UNLIMITED
use x_dim_dbl when defining the dbl var
use x_dim_int when defining the int var

abhibaruah commented 1 year ago

Thanks Dennis. This makes sense. But I am still baffled by the int variable that is read back (Step 5 in my original problem statement), where the start value I provided is not being honored. x d d d d x x x d d d x x x x d d x x x x x d x d x x x x x d d x x x x d d d x x x

7x6 is understandable, but the ordering of the data elements is jumbled. This is probably because the [1,1] element of the 7x5 array which I WANT to write (step 4 above) has the same linear index as the [1,0] index of the 7x6 array above. Essentially the 7x5 array is being written to a 7x6 array because |x_dim| =6 Is that correct?

I feel like this behavior can mislead users as the actual data written is different from what the user expected to write. Can anything be done to improve the behavior? (for example, resizing the named UNLIMITED to the length of the data being written, or informing the user somehow that the start value/dimensions they provided will not be honored)

abhibaruah commented 1 year ago

Hello @DennisHeimbigner and @WardF , I was wondering if you got a chance to look at my follow up question. If you confirm this as the expected behavior and there wont be any behavior changes to this, we will probably go ahead and record the same somewhere in our docs.

DennisHeimbigner commented 1 year ago

Sorry, I apparently missed that follow up question.

Can anything be done to improve the behavior? (for example, resizing the named UNLIMITED to the length of the data being written, or informing the user somehow that the start value/dimensions they provided will not be honored)

I think it is being honored, but in the context of 7x6 instead of 7x5.

abhibaruah commented 1 year ago

Hello @DennisHeimbigner, I had a follow up questions regarding this. I understand that the UNLIMITED dimension is the max over all the dimensions which use it, which is why in my example above, it is set to 6.

However, there is some discrepancy while computing the linear indices of the second variable. For the second variable, xdim (the unlimited dim) is 6, the ydim is 7 and the start is [1,1]. So, this start should translate to a linear index of 7 [1*xdim + 1] and should write the data as this x x x x x x x x d d d d x x x d d d d x x x d d d d x x x d d d d x x x x x x x x x

Instead, the linear index of the start is calculated based on xdim as 5. x d d d d x x x d d d x x x x d d x x x x x d x d x x x x x d d x x x x d d d x x x Linear index = 1*5+1

So, while the dimension of the second variable considers xdim as 6, which is the max of the UNLIMITED dimensions, while writing the data, the linear dimension is calculated based on xdim as 5. If the xdim for the second var is 6, the linear indexing should also use 6 as the value of xdim.

Let me know if my understanding is correct or if I am missing something here.

DennisHeimbigner commented 1 year ago

I think you are confusing the x and y dims. In your code, ydim is first. so your output should show 7 rows of 6 columns each instead of (as you have above) 6 rows of 7 columns.

abhibaruah commented 1 year ago

Yes, you are correct. I apologize for this confusion between column major and row major. Nevertheless, the bug I am talking about still stands. For the second variable, xdim (the unlimited dim) is 6, the ydim is 7 and the start is [1,1].

x x x x x x x d d d d x x d d d d x x d d d d x x x x x x x x x x x x x x x x x x x

Instead the data written is: x x x x x x d d d d x d d d d x d d d d x d d d d x x x x x x x x x x x x x x x x x

If it makes clearer, the 4x4 matrix I am trying to write is this 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6

And it gets written as (F being the fill value): F F F F F F 0 1 2 3 F 1 2 3 4 F 2 3 4 5 F 3 4 5 6 F F F F F F F F F F F
F F F F F F

instead of being written as:

F F F F F F F 0 1 2 3 F F 1 2 3 4 F F 2 3 4 5 F F 3 4 5 6 F F F F F F F F F F F F F

cwannaz commented 1 year ago

In other words (oversimplifying a little), if I understand well, when writing data the computation of linear indices for the insertion should be based on max(unlimited dim size, start+data block size) along all dimensions, and it looks like it is currently based on start + data block size.

DennisHeimbigner commented 1 year ago

I did some mods to your program to clarify things for myself. I did one thing. If you print out the created file using ncdump, it definitely works correctly.

abhibaruah commented 1 year ago

Correct. I just checked and see that ncdump prints the data correctly. Is the issue with 'nc_get_var_int' or the other 'nc_getvar*' functions?

DennisHeimbigner commented 1 year ago

Not sure. I think I need to investigate the semantics of unlimited in more detail.

abhibaruah commented 1 year ago

Hello Dennis, I was wondering if you had any update regarding this bug. Are there any plans to address this in upcoming NetCDF releases?

abhibaruah commented 1 month ago

Hello Dennis, Wanted to check with you again to see if there are any plans to address this or if this is to be closed as a no-op.

DennisHeimbigner commented 1 month ago

Sorry, we can never seem to find the time to fix this. But it is still in our to-do list.

Unidata / netcdf-c

Writing an int variable with start after a double variable with start jumbles the data written #2694