anewman89 opened 8 years ago
Sounds like we have two issues here: the 3-d/4-d output handling in `nc_add_data_standard`, and the write pattern in the `standard` mode.

Try the `big_memory` mode or the original mode and see if that helps. `big_memory` will be the fastest but, as you may glean from its name, it uses the most memory. This mode reads and writes each file only once.

Hi Joe, I've pushed to my fork. It looks like I edited both the 3-d and 4-d output for the `nc_add_data_standard` function. I can go ahead and issue the pull request.
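To make the trade-off concrete, here is a minimal sketch (not the project's actual code; the function names are invented for illustration) contrasting the two strategies: `big_memory` accumulates everything in RAM and issues a single write, while the `standard` mode writes each chunk as soon as it is read.

```python
def big_memory_pass(chunks):
    """Read every chunk into memory, then perform one write at the end."""
    buffer = []
    writes = 0
    for chunk in chunks:
        buffer.extend(chunk)   # read phase: accumulate everything in RAM
    writes += 1                # a single write covers the whole dataset
    return buffer, writes


def standard_pass(chunks):
    """Write each chunk to the output as soon as it has been read."""
    out = []
    writes = 0
    for chunk in chunks:
        out.extend(chunk)      # write only this chunk's portion
        writes += 1
    return out, writes


chunks = [[1, 2], [3, 4], [5, 6]]
data_big, w_big = big_memory_pass(chunks)    # 1 write, high memory
data_std, w_std = standard_pass(chunks)      # 3 writes, low memory
```

Both strategies produce identical output; the difference is only in when the writes happen and how much data must be held in memory at once.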
On point 2: right, the `standard` mode would write after each chunk is read in. Does it work like this: `netcdf_put_vara_*` would be used for each variable write?

I would think the total data writes would still be roughly equal to the total data read... I was getting something on the order of 10x more data being written than read.
> I would think the total data writes would still be roughly equal to the total data read... I was getting something on the order of 10x more data being written than read.

It probably depends on how you chunk your dataset up.
> Issue write commands to fill the portions of the grid as they are read in. Something like `netcdf_put_vara_*` would be used for each variable write.

Yes, but the Python API doesn't use that syntax exactly.
I used the "standard" option and ran into an issue. If I included soil moisture (a 4-d variable) in the configuration file, I got the following error when the code tried to write to the netCDF files after loading all the files in the current chunk:
I traced it back to around line 448: one array is expected to be 2-dimensional while the other is only 1-dimensional, with its length set to the number of time steps going into the current netCDF file. If I removed soil moisture from the configuration file, the output was written properly, so it was an issue specific to 4-d variables.
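A minimal sketch of the 4-d write pattern, under the assumption (mine, for illustration) that soil moisture has dimensions (time, soil_layer, lat, lon): the 1-d vector of time indices for the current chunk selects a range on the first axis only, and the same slicing pattern as the 3-d case works with one extra trailing dimension.

```python
import numpy as np

nt, nlayer, nlat, nlon = 8, 3, 4, 5

# Stand-in for a 4-d (time, soil_layer, lat, lon) netCDF variable.
out = np.zeros((nt, nlayer, nlat, nlon))

# 1-d time indices for the current chunk (length = number of time
# steps going into the current file) -- a contiguous range here.
time_ix = np.arange(2, 6)
chunk = np.ones((len(time_ix), nlayer, nlat, nlon))

# Same pattern as the 3-d write, just one more trailing dimension:
out[time_ix[0]:time_ix[-1] + 1, :, :, :] = chunk
```

The 1-d time index never needs to be reshaped into anything 2-dimensional; it only bounds the slice on the first axis.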
I then made some modifications to the code and got it to work for 4-d variables. This is really my first halfway-serious go with Python, so my syntactical understanding is limited; there is plenty of potential for me to have messed up the fix in some fashion.
I ran the code a bunch and it worked fine. It seemed a little slow, but there is lots of I/O both in and out, so I didn't think much of it. Then I got an email from our supercomputer system administrators stating that my code was performing an excessive number of disk writes to the same location. They reported that the read rates were fine, but the output was many times the input. That makes me think I fixed the code improperly, so the netCDF writes are occurring an excessive number of times...
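One plausible explanation for the reported symptom (this is my guess, not a diagnosis of the actual fix): if the modified code rewrites the entire output variable on every chunk instead of writing only the new time slice, total bytes written scale with the number of chunks times the full array size, while bytes read stay at one array's worth. Back-of-the-envelope arithmetic:

```python
nt, nlat, nlon = 12, 4, 5          # hypothetical output shape
nchunks = 4
chunk_len = nt // nchunks          # time steps per chunk
full_size = nt * nlat * nlon       # total values in the variable

# Correct pattern: each chunk writes only its own time slice.
written_sliced = nchunks * chunk_len * nlat * nlon   # == full_size

# Buggy pattern: each chunk rewrites the whole variable.
written_full = nchunks * full_size

amplification = written_full / written_sliced        # grows with nchunks
```

With many more chunks than four, this kind of bug would easily produce write volumes on the order of 10x the read volume.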
The changes are in the `nc_add_data_standard` function. What is the best way for me to post my "fixed" code?
Cheers, Andy