Open ethanrd opened 9 years ago
We can correct this in the HDF-EOS code if we are sure what to do.
On Fri, Oct 30, 2015 at 5:04 PM, Ethan Davis notifications@github.com wrote:
While HDF-EOS and netCDF both use scale_factor and add_offset attributes to describe how data has been packed, they do not use the same (un)packing formula. (Actually, it is not clear that all HDF and HDF-EOS files use the same (un)packing formulas; see Note 2 below.)
The formula to unpack netCDF packed data (as described in the "NetCDF Best Practices" document https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html#Packed%20Data%20Values) is
unpacked = scale_factor*packed + add_offset
Whereas the formula to unpack HDF-EOS packed data (see Note 2 below) is
unpacked = scale_factor*(packed - add_offset)
An example dataset that illustrates this problem is here:
ftp://ladsweb.nascom.nasa.gov/allData/51/MOD08_D3/2001/002/MOD08_D3.A2001002.051.2010286150655.hdf
The problem can easily be seen by looking at the variable Cloud_Top_Temperature_Day_Maximum. Using the ToolsUI Grid Viewer one can quickly see values around -14900 degrees Kelvin.
short Cloud_Top_Temperature_Day_Maximum(YDim=180, XDim=360);
  :units = "Degrees Kelvin";
  :scale_factor = 0.01; // double
  :add_offset = -15000.0; // double
  :valid_range = 0S, 20000S; // short
  :_FillValue = -9999S; // short
Using 11225 as the packed value (from index [30,0]) and the scale/offset values above:
- the netCDF formula gives -14887.75
- the HDF-EOS formula gives 262.25
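The arithmetic above can be reproduced with a short Python sketch. The function and constant names are hypothetical and chosen only for illustration; the formulas and attribute values are the ones quoted above. The last step assumes that valid_range applies to the packed values, since it is stored as short.

```python
def unpack_netcdf(packed, scale_factor, add_offset):
    """netCDF convention: scale first, then add the offset."""
    return scale_factor * packed + add_offset


def unpack_hdfeos(packed, scale_factor, add_offset):
    """HDF-EOS convention as described above: subtract the offset, then scale."""
    return scale_factor * (packed - add_offset)


# Attribute values as stored in the example HDF file.
SCALE_FACTOR = 0.01
ADD_OFFSET = -15000.0
PACKED = 11225  # packed value at index [30, 0]

netcdf_value = unpack_netcdf(PACKED, SCALE_FACTOR, ADD_OFFSET)
hdfeos_value = unpack_hdfeos(PACKED, SCALE_FACTOR, ADD_OFFSET)

# Assumption: valid_range (0 to 20000) refers to packed values.
# Unpack both ends with the HDF-EOS formula to see the implied range.
valid_range_unpacked = tuple(
    unpack_hdfeos(v, SCALE_FACTOR, ADD_OFFSET) for v in (0, 20000)
)

print(f"netCDF formula:  {netcdf_value:.2f}")
print(f"HDF-EOS formula: {hdfeos_value:.2f}")
print(f"valid_range under HDF-EOS formula: "
      f"{valid_range_unpacked[0]:.2f} to {valid_range_unpacked[1]:.2f}")
```

Under that assumption the HDF-EOS formula maps the valid range to roughly 150 to 350, which is plausible for cloud-top temperatures in kelvins, while the netCDF formula maps the same range to large negative values.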
Interestingly, when accessed via OPeNDAP from a Hyrax server the scale/offset values are adjusted for the netCDF formula:
Cloud_Top_Temperature_Day_Maximum {
  Int16 valid_range 0, 20000;
  Int16 _FillValue -9999;
  String units "Degrees Kelvin";
  Float64 scale_factor 0.010000000000000000;
  Float64 add_offset 150.00000000000000;
With these values and the packed value used above:
- the netCDF formula gives 262.25
- the HDF-EOS formula gives 110.75
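The adjusted offset of 150 is consistent with expanding the HDF-EOS formula: scale_factor*(packed - add_offset) = scale_factor*packed - scale_factor*add_offset, so an equivalent netCDF-style offset is -scale_factor*add_offset. The sketch below only checks that algebra against the values in this issue; the function name is hypothetical, and this is not a claim about how Hyrax actually implements the adjustment.

```python
def hdfeos_offset_to_netcdf(scale_factor, add_offset):
    """Return the add_offset for which the netCDF formula reproduces
    scale_factor * (packed - add_offset)."""
    return -scale_factor * add_offset


# Values from the original HDF file.
scale_factor = 0.01
hdf_add_offset = -15000.0
packed = 11225

netcdf_add_offset = hdfeos_offset_to_netcdf(scale_factor, hdf_add_offset)

# Both expressions should unpack to the same value.
via_hdfeos = scale_factor * (packed - hdf_add_offset)
via_netcdf = scale_factor * packed + netcdf_add_offset

print(f"converted add_offset: {netcdf_add_offset:.2f}")
print(f"HDF-EOS formula, original offset:  {via_hdfeos:.2f}")
print(f"netCDF formula, converted offset:  {via_netcdf:.2f}")
```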
Note 1
Thanks to Chris Lynnes, who pointed out this problem.
Note 2
I have not found a definitive statement describing the HDF-EOS formula for packed data. The NCAR NCL page on HDF https://www.ncl.ucar.edu/Applications/HDF.shtml mentions both packing formulas (see the "NCL General Comments" section). The NCO documentation http://nco.sourceforge.net/nco.html#hdf_upk mentions that "[m]ost files originally written in HDF format use the HDF packing/unpacking algorithm" (and references some HDF5 documentation https://www.hdfgroup.org/HDF5/doc/UG/UG_frame10Datasets.html on packed data) but NCO defaults to netCDF (un)packing.
That's what I figured. It's the "being sure what to do" that may be the problem.
I would think that if we can tell that a file is HDF (not nc4), and especially HDF-EOS, we should use the HDF-EOS formula. However, there are at least three problems. The first comes with remote access: Hyrax transforms scale/offset metadata while TDS does not. The second is that there are efforts to "harmonize" these two standards; that one's probably not as big a deal. The third is that it doesn't sound like this is necessarily handled consistently in HDF-land.
So, for now, maybe we should default to netCDF scale/offset handling but allow the user to specify when they want HDF scale/offset handling.
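One way to picture that proposal is an unpacking routine that defaults to the netCDF formula and takes an explicit override. This is a minimal Python sketch for discussion only: the enum, function, and parameter names are hypothetical and do not correspond to any existing TDS or netCDF-Java API.

```python
from enum import Enum


class PackingConvention(Enum):
    """Hypothetical switch between the two unpacking formulas."""
    NETCDF = "netcdf"
    HDF_EOS = "hdf-eos"


def unpack(packed, scale_factor, add_offset,
           convention=PackingConvention.NETCDF):
    """Unpack a value, using the netCDF formula unless told otherwise."""
    if convention is PackingConvention.NETCDF:
        return scale_factor * packed + add_offset
    if convention is PackingConvention.HDF_EOS:
        return scale_factor * (packed - add_offset)
    raise ValueError(f"unknown packing convention: {convention!r}")


# Default behaviour: netCDF handling.
default_value = unpack(11225, 0.01, -15000.0)

# User explicitly asks for HDF-EOS handling.
hdfeos_value = unpack(11225, 0.01, -15000.0,
                      convention=PackingConvention.HDF_EOS)

print(f"default (netCDF): {default_value:.2f}")
print(f"HDF-EOS override: {hdfeos_value:.2f}")
```

Keeping the default on the netCDF formula leaves existing behaviour unchanged, and the explicit argument avoids guessing the convention from the file type, which the discussion above suggests may not be reliable.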
OPeNDAP URL for the example dataset on the Hyrax server: http://ladsweb.nascom.nasa.gov/opendap/allData/51/MOD08_D3/2001/002/MOD08_D3.A2001002.051.2010286150655.hdf