Open jypeter opened 7 years ago
@jypeter I used ncdump on every file in sample_data for cdms and could not find a packed data array. This is really missing from our testbed and needs to be added. I don't know if cdms writes packed data; maybe @doutriaux1 can help.
@jypeter Can you provide me with one of your packed data file?
@dnadeau4 @jypeter there is functionality to pack data in netcdf4, which effectively reduces the precision to a short type; take a peek here
@durack1 So it takes the min/max of the data to compute the scale/offset. That seems to be the best practice recommended for netcdf.
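For anyone landing here later, a minimal sketch of the min/max approach described above. This is not the cdms2 source. It is the commonly described recipe for packing floats into signed 16-bit integers, and the function names (`compute_scale_and_offset`, `pack`, `unpack`) are hypothetical. Unpacking follows the CF convention: unpacked = packed * scale_factor + add_offset.

```python
def compute_scale_and_offset(data_min, data_max, n_bits=16):
    """Return (scale_factor, add_offset) that map [data_min, data_max]
    onto the symmetric signed range [-(2**(n_bits-1) - 1), 2**(n_bits-1) - 1].

    The most negative integer is left unused so it can serve as a fill value.
    """
    if data_max == data_min:
        # Constant field: any non-zero scale works, the offset carries the value.
        return 1.0, float(data_min)
    scale_factor = (data_max - data_min) / (2 ** n_bits - 2)
    add_offset = (data_max + data_min) / 2.0
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Convert floats to packed integers (rounded to the nearest step)."""
    return [int(round((v - add_offset) / scale_factor)) for v in values]


def unpack(packed, scale_factor, add_offset):
    """CF-style unpacking: multiply by scale_factor, then add add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Hypothetical sample values (e.g. temperatures in K).
values = [250.0, 273.15, 310.0]
scale_factor, add_offset = compute_scale_and_offset(min(values), max(values))
packed = pack(values, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)
```

The round trip is lossy: each restored value can differ from the original by up to half of `scale_factor`, which is the precision reduction mentioned earlier in the thread.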
@jypeter you just need to pass pack=True in cdms2.write()
@dnadeau4 yes, pack=True, but if I remember correctly this does not work well for an extended dimension (like time) if you do many writes in a row, because the min/max, and therefore the scale/offset, obviously change between writes.
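To illustrate the warning about repeated writes, here is a small pure-Python sketch. It does not call cdms2. It assumes, hypothetically, that each write derives its own scale/offset from that chunk's min/max and that only one scale/offset pair is kept for the variable. All names are made up for the example.

```python
def scale_and_offset(chunk, n_bits=16):
    """Derive (scale, offset) from the min/max of one chunk of data."""
    lo, hi = min(chunk), max(chunk)
    if hi == lo:
        return 1.0, float(lo)
    return (hi - lo) / (2 ** n_bits - 2), (hi + lo) / 2.0


def pack(chunk, scale, offset):
    """Convert floats to packed integers."""
    return [int(round((v - offset) / scale)) for v in chunk]


def unpack(packed, scale, offset):
    """CF-style unpacking: packed * scale + offset."""
    return [p * scale + offset for p in packed]


# Two successive writes along time, with different value ranges.
first_write = [270.0, 275.0, 280.0]
second_write = [260.0, 290.0, 300.0]

scale_1, offset_1 = scale_and_offset(first_write)
scale_2, offset_2 = scale_and_offset(second_write)

# Integers stored by the first write, using the first write's parameters.
packed_first = pack(first_write, scale_1, offset_1)

# Decoding with the matching parameters recovers the data...
decoded_right = unpack(packed_first, scale_1, offset_1)
# ...but decoding the same integers with the second write's parameters does not.
decoded_wrong = unpack(packed_first, scale_2, offset_2)
```

One way around this, if the full range of the data is known in advance, is to choose a single scale/offset up front and reuse it for every write.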
@dnadeau4 after much looking around, I have found the following file that has packed data inside an nc4 compressed file! One of our PhD students had problems with it a few years ago... netcdf4_compressed_example.nc
@dnadeau4 and @durack1 thanks for pointing out the pack option and the matching code. The doc string for pack should probably be updated, because it says
pack :: (False/True/numpy/numpy.int8/numpy.int16/numpy.int32/numpy.int64) pack the data to save up space
It should probably mention the http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#packed-data link and @doutriaux1's warning about multiple writes. And it should probably only list the False/True options, because, unless I'm mistaken, the source code only uses
if pack:
This is probably low priority, because we don't run across packed data very often, but at least the issue is listed
I have just remembered that I have sometimes had problems with packed data. I thought I had an old issue about that somewhere, but I have not found it on github. On the other hand, I have found https://github.com/UV-CDAT/uvcdat/issues/420 and I wonder where/if this writePacked function is available
If that's not already the case, it would be nice to document somewhere how cdms2 handles packed data. And document this in a way I can easily find the information next time I need it
I have googled "cdms add_offset" (and "cdms scale_factor") and it brings you to CHAPTER 6 Climate Data Markup Language (CDML), which I'm not sure is the best answer... In a way, it's even worse if I google "cdms2 add_offset" because I don't even find the Chapter 6 above! The fact that using cdms or cdms2 in a search string does not return the same results may also be a problem...
I mostly use CMIPn data that does not use packing, so I don't know if this kind of data is common. But this is documented in netcdf4-python (search for scale_factor and add_offset)