Unidata / netcdf-cxx4

Official GitHub repository for netCDF-C++ libraries and utilities.

Append data to existing NetCDF file #48

Closed Weiming-Hu closed 7 years ago

Weiming-Hu commented 7 years ago

Hello there,

This is less an issue than a "how-to" question, because I failed miserably when searching online for solutions. How do I append new values to an existing variable in an existing NetCDF file?

Normally when we write the data to the file, we create an object with all the data and then call put once to put all the data to the file. The problem I'm trying to solve is that, because of the RAM limit, I can't read the entire data into RAM at once, which forces me to add the data to the file multiple times. How could I do that?

Thank you very much. If I'm posting in the wrong place, please point me to the proper place to ask.

WardF commented 7 years ago

When you say 'put', what function exactly are you calling? When you write you should be able to specify the start location of the data you want to write (along the dimension(s)) associated with the variable. You should be able to open the file and write the data by specifying the appropriate 'start' index.

Weiming-Hu commented 7 years ago

Thank you! I just did some reading, and I would like to clarify my question.

The function put I was talking about is this one. And yes, with this function, I understand that I can specify the start location and the count of the array that I want to write into the NetCDF file. My question is, can I specify the location in the NetCDF file where I would like to put the values? Or does put simply append the data to the end of the existing NcVar object?

Weiming-Hu commented 7 years ago

And I have several follow-up questions.

Thank you

WardF commented 7 years ago

start is an array of indices, one per dimension, giving the position at which writing begins; if the variable has a single dimension and already holds 10 values (indices 0-9), you would specify a start of 10 to append.
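
Since the original problem is that the data does not fit in RAM at once, the same rule can be applied repeatedly, one chunk at a time. Below is a minimal sketch of the index arithmetic only, assuming a one-dimensional variable along an unlimited dimension; Slab and planAppend are hypothetical names used for illustration and are not part of netcdf-cxx4.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One write request along a single dimension: where it begins in the
// variable and how many values it covers.
struct Slab {
    std::size_t start;
    std::size_t count;
};

// Hypothetical helper: split `total` new values into chunks of at most
// `chunk` values, to be appended after `existing` values already stored.
std::vector<Slab> planAppend(std::size_t existing, std::size_t total,
                             std::size_t chunk) {
    assert(chunk > 0);
    std::vector<Slab> slabs;
    for (std::size_t done = 0; done < total; done += chunk) {
        const std::size_t n = std::min(chunk, total - done);
        // Absolute index into the variable, not an offset into the buffer.
        slabs.push_back({existing + done, n});
    }
    return slabs;
}
```

Each Slab would then be turned into one-element start and count vectors and passed, together with only that chunk's buffer, to a write call such as the NcVar::putVar overload that takes start and count vectors (assuming that overload is the one in use).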

You cannot convert a limited dimension to an unlimited dimension in place; you would need to write code that reads from one file and writes into a second, new file.

You append data to a variable in the netCDF4 file, not to the file itself. The difference is subtle, but you cannot assume that data written to the netCDF file will be placed at the end of the file. Also, I assume you mean 2 unlimited dimensions (and not files), correct? If the variable is associated with 2 unlimited dimensions, you would specify a two-element array each for the start positions and the count values; see my remark above regarding the start value and calculating an offset.

Weiming-Hu commented 7 years ago

Thank you for your reply. In my case, I'm only concerned with appending more values to an existing variable in an existing NetCDF file.

You mentioned that I cannot assume the data appended to the existing variable will be added to the end of the file. How does this operation work, then? I'm trying to understand how the extra values get appended.

For example, if I would like to add all the values in data_to_append to the existing variable ncvar_obj, I would assume the following code:

// data_to_append holds 10 * 10 * 10 * 10 values, laid out as [10][10][10][10]
std::vector<double> data_to_append(10 * 10 * 10 * 10); // filled with some data
std::vector<size_t> start = {0, 0, 0, 0};
std::vector<size_t> count = {10, 10, 10, 10};
ncvar_obj.put(start, count, data_to_append.data());

Will the code append to the ncvar_obj or simply rewrite the original data in the object?

Thank you!

WardF commented 7 years ago

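The code would overwrite any data contained within the first 10 indices of each of the 4 dimensions; it does not append. start is an absolute index into the variable. So if you performed this operation once, a second block of the same size appended along the first dimension would use start = {10, 0, 0, 0}; start = {10, 10, 10, 10} would instead offset the block by 10 along every dimension.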
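
As a rough illustration of how a start vector for an append could be computed: an offset of 10 in every dimension, as in start = {10, 10, 10, 10}, places the block diagonally past the existing data, whereas extending the variable along only one dimension keeps the other start entries at zero. The sketch below assumes the new block spans the full extent of the other dimensions; appendStart is a hypothetical helper, not a netcdf-cxx4 function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: start vector for appending along dimension `dim`
// of a variable whose current shape is `shape`.
std::vector<std::size_t> appendStart(const std::vector<std::size_t>& shape,
                                     std::size_t dim) {
    assert(dim < shape.size());
    std::vector<std::size_t> start(shape.size(), 0);
    start[dim] = shape[dim];  // first index past the existing data
    return start;
}
```

For a variable of shape {10, 10, 10, 10}, appending along the first dimension would give a start of {10, 0, 0, 0}, with count left at {10, 10, 10, 10} for a block of the same size. The dimension being extended would need to be unlimited.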

Weiming-Hu commented 7 years ago

So you are saying that the start vector indicates the index in the existing NetCDF variable where the data (in this case, data_to_append) will be put, rather than an index into data_to_append. Correct?

WardF commented 7 years ago

That is correct.
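
The distinction can be modeled in memory. The following is a minimal sketch, not the library's implementation: Var2D and putSlab are hypothetical stand-ins for a two-dimensional variable and a start/count write, assuming row-major storage. The source buffer is always read from its first element, while start shifts only the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-memory stand-in for a 2-D variable stored in row-major order.
struct Var2D {
    std::size_t rows, cols;
    std::vector<double> values;
    Var2D(std::size_t r, std::size_t c) : rows(r), cols(c), values(r * c, 0.0) {}
};

// Hypothetical model of a start/count write: `start` addresses the variable,
// while `data` is always consumed from its first element.
void putSlab(Var2D& var, const std::vector<std::size_t>& start,
             const std::vector<std::size_t>& count, const double* data) {
    assert(start.size() == 2 && count.size() == 2);
    assert(start[0] + count[0] <= var.rows && start[1] + count[1] <= var.cols);
    std::size_t k = 0;  // position in the source buffer, starts at 0
    for (std::size_t i = 0; i < count[0]; ++i)
        for (std::size_t j = 0; j < count[1]; ++j)
            var.values[(start[0] + i) * var.cols + (start[1] + j)] = data[k++];
}
```

Writing a 2 x 2 block with start = {1, 1} into a 3 x 3 variable fills the lower-right corner and leaves the first row and first column untouched.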

Weiming-Hu commented 7 years ago

Thank you very much. It's much clearer for me right now.

So just to follow up on that,

Weiming-Hu commented 7 years ago

In NetCDF4 format, the unlimited dimension doesn't have to be the last dimension, as required in the classic NetCDF format. Right?

If a NetCDF variable is associated with two unlimited dimensions, why do I need to specify a 2d array for start and count?

WardF commented 7 years ago

Consider a variable that is associated with two unlimited dimensions. The dimensions are Lat and Lon. The variable is temperature. You would index temperature by a latitude and a longitude coordinate, represented as

temp(Lat,Lon)

If this is unclear, instead of Lat,Lon let us use x,y.

temp(x,y)

So if you were going to write values to temp, you would need to specify a two-element array of start indices, one for the x coordinate and one for the y coordinate. You would also need to specify the number of values along each dimension in the same way, using a two-element array for count.

So you might have

start = {0,0}
count = {10,10}

This would mean you had 100 values (10 * 10) to be written at indices 0 through 9 along each dimension, starting at 0,0. Does this make sense?
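
The arithmetic above can be checked with a short sketch. Each count entry is a number of values along its dimension, whether or not that dimension is unlimited, so the total is the product of the entries, and a zero in any position would select no values at all. slabSize is a hypothetical helper introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of values selected by a count vector: the product of its entries.
std::size_t slabSize(const std::vector<std::size_t>& count) {
    assert(!count.empty());  // assumes a variable with at least one dimension
    std::size_t n = 1;
    for (std::size_t c : count) n *= c;
    return n;
}
```

With count = {10, 10} the buffer passed to the write call would need 100 values; with count = {10, 1} it would need 10.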

Weiming-Hu commented 7 years ago

Is it correct that if x is the only unlimited dimension, the start and count would be

start = {0, 0};
count = {10, 0};

Weiming-Hu commented 7 years ago

Thank you.