Open abhibaruah opened 1 year ago
You need to understand that the length of a named UNLIMITED (x_dim in this case) is the max over all variables that use that particular dimension. So:
So two things to remember:
You can get the effect you want by using two different named dimensions:
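The rule described above can be sketched with a small plain-C model. This is not netCDF API code: `model_dim` and `model_write` are hypothetical names, and the double variable's start of 2 is an assumption chosen only so that the shared length comes out as 6, as in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a named unlimited dimension: it only remembers
   the largest extent that any variable has written along it. */
typedef struct {
    const char *name;
    size_t len;
} model_dim;

/* Record a write of `count` elements starting at `start` along `dim`. */
void model_write(model_dim *dim, size_t start, size_t count)
{
    assert(dim != NULL);
    size_t end = start + count;
    if (end > dim->len)
        dim->len = end;   /* the dimension grows, it never shrinks */
}
```

If both variables share one `model_dim`, a write of (start 2, count 4) followed by a write of (start 1, count 4) leaves its length at 6 for both variables. If each variable gets its own `model_dim`, the lengths are 6 and 5 respectively, which is the effect of using two different named dimensions.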
Thanks Dennis. This makes sense. But I am still baffled by the int variable that is read back (Step 5 in my original problem statement), where the start value I provided is not being honored:
x d d d d x x
x d d d x x x
x d d x x x x
x d x d x x x
x x d d x x x
x d d d x x x
7x6 is understandable, but the ordering of the data elements is jumbled. This is probably because the [1,1] element of the 7x5 array that I want to write (step 4 above) has the same linear index as the [1,0] element of the 7x6 array above. Essentially, the 7x5 array is being written into a 7x6 array because |x_dim| = 6. Is that correct?
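The index coincidence mentioned above can be checked with a row-major helper. This is a minimal sketch; `linear_index` and `unravel_index` are hypothetical helper names, not netCDF functions.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major linear index of element [row, col] in an array whose rows
   hold `ncols` elements. */
size_t linear_index(size_t row, size_t col, size_t ncols)
{
    assert(col < ncols);
    return row * ncols + col;
}

/* Inverse mapping: recover [row, col] from a linear index. */
void unravel_index(size_t idx, size_t ncols, size_t *row, size_t *col)
{
    assert(ncols > 0 && row != NULL && col != NULL);
    *row = idx / ncols;
    *col = idx % ncols;
}
```

With these, `linear_index(1, 1, 5)` and `linear_index(1, 0, 6)` both evaluate to 6, while `linear_index(1, 1, 6)` evaluates to 7.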
I feel like this behavior can mislead users as the actual data written is different from what the user expected to write. Can anything be done to improve the behavior? (for example, resizing the named UNLIMITED to the length of the data being written, or informing the user somehow that the start value/dimensions they provided will not be honored)
Hello @DennisHeimbigner and @WardF, I was wondering if you got a chance to look at my follow-up question. If you confirm that this is the expected behavior and that it will not change, we will probably go ahead and record it somewhere in our docs.
Sorry, I apparently missed that follow up question.
Can anything be done to improve the behavior? (for example, resizing the named UNLIMITED to the length of the data being written, or informing the user somehow that the start value/dimensions they provided will not be honored)
I think it is being honored, but in the context of 7x6 instead of 7x5.
Hello @DennisHeimbigner, I have a follow-up question regarding this. I understand that the length of the UNLIMITED dimension is the max over all the variables which use it, which is why in my example above it is set to 6.
However, there is a discrepancy in how the linear indices of the second variable are computed. For the second variable, xdim (the unlimited dim) is 6, ydim is 7, and the start is [1,1]. So, this start should translate to a linear index of 7 [1*xdim + 1], and the data should be written like this:
x x x x x x x
x d d d d x x
x d d d d x x
x d d d d x x
x d d d d x x
x x x x x x x
Instead, the linear index of the start is calculated based on xdim as 5:
x d d d d x x
x d d d x x x
x d d x x x x
x d x d x x x
x x d d x x x
x d d d x x x
Linear index = 1*5+1
So, the dimensions of the second variable treat xdim as 6, which is the max over the UNLIMITED dimension, but when the data is written, the linear index is calculated with xdim as 5. If xdim for the second variable is 6, the linear indexing should also use 6 as the value of xdim.
Let me know if my understanding is correct or if I am missing something here.
I think you are confusing the x and y dims. In your code, ydim is first, so your output should show 7 rows of 6 columns each instead of (as you have above) 6 rows of 7 columns.
Yes, you are correct. I apologize for this confusion between column major and row major. Nevertheless, the bug I am talking about still stands. For the second variable, xdim (the unlimited dim) is 6, the ydim is 7 and the start is [1,1].
x x x x x x
x d d d d x
x d d d d x
x d d d d x
x x x x x x
x x x x x x
x x x x x x
Instead, the data written is:
x x x x x x
d d d d x d
d d d x d d
d d x d d d
d x x x x x
x x x x x x
x x x x x x
If it makes it clearer, the 4x4 matrix I am trying to write is this:
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
And it gets written as (F being the fill value):
F F F F F F
0 1 2 3 F 1
2 3 4 F 2 3
4 5 F 3 4 5
6 F F F F F
F F F F F F
F F F F F F
instead of being written as:
F F F F F F
F 0 1 2 3 F
F 1 2 3 4 F
F 2 3 4 5 F
F 3 4 5 6 F
F F F F F F
F F F F F F
The linear index for the data to be written is calculated by treating the UNLIMITED |xdim| as 5, when it is actually 6.
|ydim| == 7 for both variables, so that is okay.
|xdim| == 6 for the double variable, and it writes the data as expected.
|xdim| == 6 for the int variable, based on what you said, and the size of the int variable reflects |xdim| == 6, but the data is written to the file as if |xdim| == 5.
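The symptom can be reproduced with a small plain-C model that has nothing to do with netCDF internals: offsets are computed with one row length and the buffer is then viewed with another. `write_block` and `print_grid` are hypothetical helpers, and `FILL` is a stand-in for the fill value F; the model does not say where in the library such a mismatch would occur.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NROWS 7      /* |ydim| in the thread */
#define BLOCK 4      /* the data block is 4x4 */
#define FILL  (-1)   /* stand-in for the fill value F */

/* Fill a flat buffer with FILL, then store the 4x4 block (value i + j)
   at start [1,1], computing offsets with row length `write_ncols`. */
void write_block(int *buf, size_t buf_len, size_t write_ncols)
{
    assert(buf != NULL);
    for (size_t k = 0; k < buf_len; k++)
        buf[k] = FILL;
    for (size_t i = 0; i < BLOCK; i++) {
        for (size_t j = 0; j < BLOCK; j++) {
            size_t idx = (1 + i) * write_ncols + (1 + j);
            assert(idx < buf_len);
            buf[idx] = (int)(i + j);
        }
    }
}

/* Print the buffer as NROWS rows of `read_ncols` columns. */
void print_grid(const int *buf, size_t read_ncols)
{
    for (size_t r = 0; r < NROWS; r++) {
        for (size_t c = 0; c < read_ncols; c++) {
            int v = buf[r * read_ncols + c];
            if (v == FILL)
                printf("F ");
            else
                printf("%d ", v);
        }
        printf("\n");
    }
}
```

Calling `write_block(buf, 42, 5)` on an `int buf[42]` and then `print_grid(buf, 6)` gives the jumbled layout, with row 1 reading `0 1 2 3 F 1`. Calling `write_block(buf, 42, 6)` instead places the block in rows 1 to 4, columns 1 to 4, which is the expected layout.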
In other words (oversimplifying a little), if I understand correctly, when writing data, the computation of linear indices for the insertion should be based on max(unlimited dim size, start + data block size) along all dimensions, and it looks like it is currently based on start + data block size.
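The proposed rule can be written down as a short sketch. It assumes row-major order with ydim first, as discussed earlier in the thread; `extent_for_write` and `start_offset` are hypothetical names, and this only illustrates the suggestion, not how the library computes offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Extent used for one dimension: the larger of its current length and
   the end of the block being written (start + count). */
static size_t extent_for_write(size_t dim_len, size_t start, size_t count)
{
    size_t end = start + count;
    return end > dim_len ? end : dim_len;
}

/* Row-major linear offset of `start`, using the extent above for every
   dimension rather than start + count alone. */
size_t start_offset(size_t ndims, const size_t *dim_len,
                    const size_t *start, const size_t *count)
{
    assert(dim_len != NULL && start != NULL && count != NULL);
    size_t offset = 0;
    for (size_t d = 0; d < ndims; d++) {
        size_t extent = extent_for_write(dim_len[d], start[d], count[d]);
        offset = offset * extent + start[d];
    }
    return offset;
}
```

For start [1,1] and a 4x4 block, current lengths of {7, 6} give an offset of 7, while passing 0 for the second length (so that only start + count is used) gives 6, the value behind the jumbled layout.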
I did some mods to your program to clarify things for myself. I did one thing. If you print out the created file using ncdump, it definitely works correctly.
Correct. I just checked and see that ncdump prints the data correctly. Is the issue with 'nc_get_var_int' or the other 'nc_get_var_*' functions?
Not sure. I think I need to investigate the semantics of unlimited in more detail.
Hello Dennis, I was wondering if you had any update regarding this bug. Are there any plans to address this in upcoming NetCDF releases?
Hello Dennis, Wanted to check with you again to see if there are any plans to address this or if this is to be closed as a no-op.
Sorry, we can never seem to find the time to fix this. But it is still in our to-do list.
NetCDF version: v4.9.1
HDF5 version: v1.10.8
OS: Linux
Hello, I found a possible bug with writing an NC_INT variable after writing an NC_DOUBLE variable (with start values other than the default). The NC_INT variable written is in an odd ordering, different from how I intended to write it. If I do not write the double variable before writing the int variable, the int data read back is different. Here is what I am doing.
(where ‘x’ represents the fill values and ‘d’ represents the data I want to write)
Create int32 data of size 4x4. Write it to a double variable with dimensions as above but with start as [1,1]. The int32 data I write here is as follows (7x5):
x x x x x x x
x d d d d x x
x d d d d x x
x d d d d x x
x d d d d x x
After this, I close the file and reopen it to read the int32 variable I created in Step 4 above. However, the variable I read is of dimensions (7x6) instead of (7x5) and the data is also weirdly organized:
x d d d d x x
x d d d x x x
x d d x x x x
x d x d x x x
x x d d x x x
x d d d x x x
This behavior is seen only when I write the double variable (in step 3) before the int32 variable. If I do not create the double variable, the int variable I read is the same as the one I write.
We initially found this bug in MATLAB, and I could reproduce it using a standalone C program which links to netCDF v4.9.1. I understand that the reproduction steps are a bit long. Let me know if you need any additional information from us or if you want us to try anything else. Also, let me know if I am doing anything wrong in my code.