Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html

NetCDF-Java applies netCDF formula for packed data (scale/offset) to HDF-EOS data #269

Open ethanrd opened 9 years ago

ethanrd commented 9 years ago

While HDF-EOS and netCDF both use scale_factor and add_offset attributes to describe how data has been packed, they do not use the same (un)packing formula. (Actually, it is not clear that all HDF and HDF-EOS files use the same (un)packing formulas; see Note 2 below.)

The formula to unpack netCDF packed data (as described in the "NetCDF Best Practices" document, https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html#Packed%20Data%20Values) is

unpacked = scale_factor*packed + add_offset

Whereas the formula to unpack HDF-EOS packed data (see Note 2 below) is

unpacked = scale_factor*(packed - add_offset)
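
For concreteness, here is a minimal sketch of the two conventions side by side (plain Java with illustrative names; this is not the netCDF-Java implementation):

    // Sketch only: the two (un)packing conventions discussed in this issue.
    final class UnpackFormulas {
      // netCDF convention: unpacked = scale_factor * packed + add_offset
      static double unpackNetcdf(short packed, double scaleFactor, double addOffset) {
        return scaleFactor * packed + addOffset;
      }

      // HDF-EOS convention: unpacked = scale_factor * (packed - add_offset)
      static double unpackHdfEos(short packed, double scaleFactor, double addOffset) {
        return scaleFactor * (packed - addOffset);
      }
    }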

An example dataset that illustrates this problem is here

ftp://ladsweb.nascom.nasa.gov/allData/51/MOD08_D3/2001/002/MOD08_D3.A2001002.051.2010286150655.hdf

The problem can easily be seen by looking at the variable Cloud_Top_Temperature_Day_Maximum. Using the ToolsUI Grid Viewer one can quickly see values around -14900 degrees Kelvin.

short Cloud_Top_Temperature_Day_Maximum(YDim=180, XDim=360);
    :units = "Degrees Kelvin";
    :scale_factor = 0.01; // double
    :add_offset = -15000.0; // double
    :valid_range = 0S, 20000S; // short
    :_FillValue = -9999S; // short

Using 11225 as the packed value (from index [30,0]) and the scale/offset values above:

  • the netCDF formula gives -14887.75
  • the HDF-EOS formula gives 262.25

Interestingly, when accessed via OPeNDAP from a Hyrax server, the scale/offset values are adjusted for the netCDF formula:

http://ladsweb.nascom.nasa.gov/opendap/allData/51/MOD08_D3/2001/002/MOD08_D3.A2001002.051.2010286150655.hdf

Cloud_Top_Temperature_Day_Maximum {
    Int16 valid_range 0, 20000;
    Int16 _FillValue -9999;
    String units "Degrees Kelvin";
    Float64 scale_factor 0.010000000000000000;
    Float64 add_offset 150.00000000000000;

With these values and the packed value used above:

  • the netCDF formula gives 262.25
  • the HDF-EOS formula gives 110.75
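
To see the discrepancy end to end, here is a small self-contained sketch (plain Java, no netCDF-Java calls) that applies both formulas to the packed value 11225, first with the attributes as stored in the file and then with the Hyrax-adjusted attributes; it reproduces the four numbers above:

    public class PackedValueCheck {
      public static void main(String[] args) {
        short packed = 11225;        // value at index [30,0]
        double scale = 0.01;

        // scale/offset as stored in the HDF-EOS file
        double offsetFile = -15000.0;
        System.out.println(scale * packed + offsetFile);     // -14887.75 (netCDF formula)
        System.out.println(scale * (packed - offsetFile));   //    262.25 (HDF-EOS formula)

        // scale/offset as served by Hyrax over OPeNDAP
        double offsetHyrax = 150.0;
        System.out.println(scale * packed + offsetHyrax);    //    262.25 (netCDF formula)
        System.out.println(scale * (packed - offsetHyrax));  //    110.75 (HDF-EOS formula)
      }
    }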

Note 1

Thanks to Chris Lynnes, who pointed out this problem.

Note 2

I have not found a definitive statement describing the HDF-EOS formula for packed data. The NCAR NCL page on HDF (https://www.ncl.ucar.edu/Applications/HDF.shtml) mentions both packing formulas (see the "NCL General Comments" section). The NCO documentation (http://nco.sourceforge.net/nco.html#hdf_upk) mentions that "[m]ost files originally written in HDF format use the HDF packing/unpacking algorithm" (and references some HDF5 documentation, https://www.hdfgroup.org/HDF5/doc/UG/UG_frame10Datasets.html, on packed data), but NCO defaults to netCDF (un)packing.

JohnLCaron commented 9 years ago

We can correct this in the HDF-EOS code if we are sure what to do.


ethanrd commented 9 years ago

That's what I figured. It's the "being sure what to do" that may be the problem.

I would think that if we can tell a file is HDF (not nc4), and especially HDF-EOS, we should use the HDF-EOS formula. However, there are at least three problems. The first comes with remote access: Hyrax transforms the scale/offset metadata while the TDS does not. The second is that there are efforts to "harmonize" these two standards; that one's probably not as big a deal. The third is that it doesn't sound like this is necessarily handled consistently in HDF-land.

So, for now, maybe default to netCDF scale/offset handling but allow the user to specify HDF scale/offset handling if they want it.
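
One way such an option could look (purely hypothetical; none of these names exist in netCDF-Java or the TDS) is a simple switch between the two conventions that defaults to the netCDF formula:

    // Hypothetical sketch only -- not an existing netCDF-Java or TDS API.
    enum PackingConvention { NETCDF, HDF_EOS }

    final class ConfigurableUnpacker {
      private final PackingConvention convention;

      ConfigurableUnpacker(PackingConvention convention) {
        this.convention = convention;
      }

      /** Default to netCDF scale/offset handling unless the user asks for HDF-EOS handling. */
      static ConfigurableUnpacker withDefaults() {
        return new ConfigurableUnpacker(PackingConvention.NETCDF);
      }

      double unpack(short packed, double scaleFactor, double addOffset) {
        if (convention == PackingConvention.HDF_EOS) {
          return scaleFactor * (packed - addOffset);
        }
        return scaleFactor * packed + addOffset;
      }
    }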