COSIMA / cice5

Clone of The Los Alamos sea ice model (CICE) with ACCESS drivers. See https://github.com/CICE-Consortium/CICE-svn-trunk/tree/cice-5.1.2

Diagnostic output and restarts not compressed #26

Closed nichannah closed 5 years ago

nichannah commented 5 years ago

Slack conversation:

Hi All. Just a quick query on what you all think we should do about CICE output. Right now I have a few problems with the CICE output: I dislike the single file per month, I don’t know why we need to have all CICE output stored in ice/OUTPUT/ rather than just ice/ and, finally, it is uncompressed!

A quick test with 025deg output indicates that, for monthly output, CICE output takes up about 4 times the storage of the MOM output. By individually compressing each file, we could reduce the ice storage by a factor >5 and the total storage by a factor of about 2.5. Obviously this is a no-brainer, and we should do it.

At the same time we could also consider trimming down the number of files. I would like this from a user point of view, but maybe I am just old-fashioned. It would require us to have a postprocessing script to collate monthly files into (say) annual files. My quick tests today indicated it would only save a few % of storage, and we would have to re-build the cookbook database once we change the file structure. Any thoughts on whether we should do this?

Finally, should we automate compression and/or collation of CICE output within payu for future runs?
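As a rough consistency check on the figures quoted above, the arithmetic can be sketched as follows. The function name and the unit sizes are illustrative only; the one input taken from the message is the 4:1 ratio of CICE to MOM output.

```python
def total_reduction(mom_size, ice_size, ice_compression_factor):
    """Factor by which combined MOM + CICE storage shrinks when only the
    CICE output is compressed by `ice_compression_factor`."""
    before = mom_size + ice_size
    after = mom_size + ice_size / ice_compression_factor
    return before / after


# Illustrative units: MOM output = 1, CICE output = 4 (the quoted ratio).
print(round(total_reduction(1.0, 4.0, 5.0), 2))  # 2.78
print(round(total_reduction(1.0, 4.0, 4.0), 2))  # 2.5
```

With these rounded inputs, an ice compression factor of 4 to 5 gives a total saving of roughly 2.5 to 2.8, which is in line with the estimate in the message.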

aidan [3:39 PM] Should look first to see how simple it might be to accomplish the compression part in CICE itself

andy [4:40 PM] OK, yes, let’s see what CICE can do. In the meantime, Aidan, do you have time to attempt a postprocessing script for us to trawl through existing cice files and do a straight compression? I can then test on some of our less important datasets before we set it going for real.

aidan [4:40 PM] I’ll add it to the list (and prioritise!). Can you say EXACTLY what you want done, preferably with an example directory and a description of before and after?

andy [4:55 PM] No worries. In my testing, I copied a CICE OUTPUT directory to /home/157/amh157/v45/amh157/temp. There I made a parallel directory OUTPUT_PROCESSED, and tested a few of the files with the following command:

nccopy -d 5 -7 OUTPUT/iceh.2256-01.nc OUTPUT_PROCESSED/iceh.2256-01.nc

Basically, I guess the best strategy is to nccopy every file like that and overwrite the old one??
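A minimal sketch of the "nccopy every file and overwrite the old one" strategy, not the script that was eventually written. It reuses the `nccopy -d 5 -7` options from the command above. The function names, the `iceh.*.nc` glob pattern and the `.tmp` suffix are assumptions made for illustration. Each file is written to a temporary sibling first and only swapped over the original once nccopy succeeds, so an interrupted run should not leave a truncated file in place of the original.

```python
import os
import subprocess
from pathlib import Path


def build_nccopy_command(src, dst, deflate=5):
    """Argument list matching the command tested above (nccopy -d 5 -7)."""
    return ["nccopy", "-d", str(deflate), "-7", str(src), str(dst)]


def compress_in_place(path, deflate=5, runner=subprocess.run):
    """Compress one file to a temporary sibling, then replace the original."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")  # hypothetical temp-file naming
    try:
        # check=True raises CalledProcessError if nccopy exits non-zero.
        runner(build_nccopy_command(path, tmp, deflate), check=True)
        os.replace(tmp, path)  # same directory, so the swap is a rename
    finally:
        if tmp.exists():  # only true if nccopy failed part-way
            tmp.unlink()


def compress_directory(directory, pattern="iceh.*.nc", **kwargs):
    """Compress every file in `directory` matching `pattern` (assumed naming)."""
    for path in sorted(Path(directory).glob(pattern)):
        compress_in_place(path, **kwargs)
```

Calling `compress_directory("OUTPUT")` would process the example directory. The `runner` argument exists only so the logic can be exercised without nccopy installed.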

nichannah commented 5 years ago

@aidanheerdegen do you have any thoughts on what would be good compression defaults to put into the CICE code? Things like deflate level and chunking?

aidanheerdegen commented 5 years ago

I set the default deflate level in mppnccombine to 5

https://github.com/mom-ocean/MOM5/blob/99168b44ab45f4f5b4fa2544a0c3f644f0afb666/src/postprocessing/mppnccombine/mppnccombine.c#L131

Make sure you also default to having shuffle on (gives a few % free compression for no noticeable overhead).

Chunking is problematic. If you don't do anything the library will choose something. Often it is not great, but whatcha gonna do? For nccompress I tried an algorithm that roughly mimicked the shape of the data given a total chunk size (in bytes)

https://github.com/aidanheerdegen/nccompress/blob/master/nc2nc#L87

At this point I'm not sure it is worth the hassle of doing this, as what is a good chunk size can vary so much depending on the access patterns. Ice data is mostly 2D (very thin in depth) and clustered around the top and bottom, so maybe it is sufficient to not chunk depth, and ensure the chunking in latitude isn't so large that you read in a lot of unnecessary data where there is no ice? So a less generalised algorithm where longitude isn't chunked either, and just chunk latitude so chunk sizes stay below something reasonable but not tiny (so 1-4 MB < chunk size < 20 MB?). Might need to do a little bit of testing to tune that.
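The less generalised scheme described above could be sketched like this. It is one reading of the suggestion, not code from CICE or nccompress: "not chunked" is taken to mean the chunk spans the whole dimension, each chunk holds a single time record, and latitude is split into equal bands so that a chunk stays under a byte limit. The function name, the 20 MiB default and the grid dimensions in the example are assumptions.

```python
import math


def ice_chunk_shape(shape, itemsize=4, max_bytes=20 * 1024**2):
    """Chunk shape for a variable dimensioned (time, ..., lat, lon).

    One time record per chunk; any middle dimensions (e.g. depth or
    category) and longitude are kept whole; latitude is split into
    equal bands so that each chunk is at most `max_bytes`.
    """
    if len(shape) < 3:
        raise ValueError("expected at least (time, lat, lon)")
    *leading, nlat, nlon = shape
    middle = leading[1:]  # everything between time and latitude
    row_bytes = itemsize * nlon * math.prod(middle)  # one latitude row
    max_rows = max(1, max_bytes // row_bytes)
    n_bands = math.ceil(nlat / max_rows)
    rows = math.ceil(nlat / n_bands)
    return (1, *middle, rows, nlon)


# Illustrative float32 variable: 12 months, 5 categories, 1080 x 1440 grid.
print(ice_chunk_shape((12, 5, 1080, 1440)))  # (1, 5, 540, 1440)
```

In this example each chunk is 540 x 1440 x 5 x 4 bytes, about 15.5 MB, which sits inside the range suggested above. The limit would still need tuning against real access patterns.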

aidanheerdegen commented 5 years ago

In the chunk_shape_nD routine I have a default chunkSize of 4K. I now think this is way too low.

nichannah commented 5 years ago

To do this I'll need to switch CICE to netCDF-4; it's probably long overdue anyway.

aidanheerdegen commented 5 years ago

Should be API compatible? So just a flag at file creation time .. no?