cchdo / hydro

The big ol CCHDO netCDF-CF project
https://hydro.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

_FillValue and Flags #12

Closed DocOtak closed 3 years ago

DocOtak commented 4 years ago

The following came up:

Why are there NaNs? Should it be WOCE 9? (bottle_salinity_qc:_FillValue = 9b) Does the _FillValue=9 conflict with a flag=9? There is a difference between a FillValue and a not_sampled, and it is important in netCDF files with multiple profiles that are not all the same length.

Right now we set the _FillValue to 9, which in WOCE covers the various ways of saying "empty".
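To make the concern concrete, here is a minimal sketch (plain Python, not the project's actual code) of what happens when profiles of different lengths are padded to a rectangular array using 9 as the fill. The function name `pad_profiles` and the flag values are made up for illustration.

```python
# Assumption: 9 is used both as the WOCE "empty" flag and as the fill value.
FILL_VALUE = 9

def pad_profiles(profiles, fill=FILL_VALUE):
    """Pad ragged per-profile flag lists to a rectangular grid."""
    n_levels = max((len(p) for p in profiles), default=0)
    return [list(p) + [fill] * (n_levels - len(p)) for p in profiles]

# Hypothetical flags: profile 0 has 4 levels (one flagged 9), profile 1 has 2.
profiles = [[2, 2, 9, 3], [2, 6]]
grid = pad_profiles(profiles)
for row in grid:
    print(row)
```

The trailing 9s in the second row are padding, while the 9 in the first row is a real flag. Once both are stored with _FillValue = 9, a reader cannot tell them apart, which is the tradeoff discussed in this issue.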

I decided to look for some prior art:

Argo: QC values are a char data type, and the Argo manual says their fill value is an empty string, so no real information here.

OceanSites: Explicitly sets the 9 flag as the _FillValue, and defines 9 in a table to mean "missing value".

ODV (specifically its flag mappings): WOCE flag 9 is mapped to BODC flag N (null), IODE flag 9 (missing value), and QARTOD 9 (missing value) as examples. The example QARTOD netCDF files I could find use flag 9 as the _FillValue, the same as we currently do. The IOOS glider file I could find uses -128 as the fill value but gives 9 as the "missing_value" in the flag definitions (not netCDF attributes). I could not find any files that actually contained -128 or 9 values.

SeaDataNet: You must set the _FillValue to the same value as your not sampled/missing data value. For whatever reason, they use ASCII code points, so a SeaDataNet file containing a flag 9 would actually have the byte 57.
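The byte value 57 is just the ASCII code point of the character "9". A quick check in plain Python (the variable names are only for illustration):

```python
flag = 9

# Encode the numeric flag as the ASCII code point of its digit character.
stored_byte = ord(str(flag))
print(stored_byte)

# Decode the stored byte back to the numeric flag.
decoded = int(chr(stored_byte))
print(decoded)
```

This prints 57 and then 9, so the round trip is lossless for single-digit flags.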

Basically, it seems like the way we are currently doing this is the most common way of dealing with this technical + science problem. It is explicitly required by SeaDataNet, which probably makes it a requirement for all European oceanographic data...

DocOtak commented 3 years ago

Files are "live" with no changes to how we were handling _FillValue. Having separate _FillValue and missing_value attributes is so uncommon that it's usually not handled well (e.g. xarray used to throw exceptions when this happened).
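For illustration only, here is a rough sketch of what a reader has to do when a variable carries both attributes with different values. It is not how xarray or any other library implements masking; `mask_flags` is a hypothetical helper, and the attribute values are modeled on the -128 / 9 glider example mentioned earlier.

```python
def mask_flags(values, attrs):
    """Replace values equal to _FillValue or missing_value with None."""
    sentinels = {
        attrs[key] for key in ("_FillValue", "missing_value") if key in attrs
    }
    return [None if v in sentinels else v for v in values]

# Hypothetical attributes: two different sentinels on one flag variable.
attrs = {"_FillValue": -128, "missing_value": 9}
raw = [2, 9, -128, 3]
masked = mask_flags(raw, attrs)
print(masked)
```

With a single sentinel (_FillValue = 9 only), the reader has one value to check, which is the case most tools handle.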

The way we are doing things seems to be most compatible with how SeaDataNet (i.e. all of Europe) is doing things.