cf-json / cf-json.github.io


multiple distinct missing values, use ancillary_variables and status_flag? #17

Open aportagain opened 4 years ago

aportagain commented 4 years ago

When converting from a binary format, @gregchalmers would like to "tell the users why a value error was returned like _FillValue or a masked value" (https://github.com/metocean/interp-cxx/issues/13). How can we best do this in CF-JSON (possibly including the round-trip issue, if that is something we can or want to ensure for this case)?

aportagain commented 4 years ago

@gregchalmers and I had a chat about this, and we think that for the CF-JSON representation, storing the multiple distinct meanings of missing values in a separate variable is the way to go. In that case (and excluding the round-trip issue for now), I think the CF conventions around ancillary data and flags might already / still be sufficient...

The data variable (e.g., hmo) can use the ancillary_variables attribute (http://cfconventions.org/cf-conventions/cf-conventions.html#ancillary-data) to reference the status flag variable (e.g., hmo_status_flag), which will have the standard_name attribute status_flag. This is in line with the CF conventions wording to use the ancillary_variables attribute "when one data variable provides metadata about the individual values of another data variable".

The status flag variable (hmo_status_flag), for the case of mutually exclusive values, can then use the flag_values and flag_meanings attributes (http://cfconventions.org/cf-conventions/cf-conventions.html#flags), where the flag_meanings attribute "is a string whose value is a blank separated list of descriptive words or phrases, one for each flag value".
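As a rough illustration of how a consumer might use these attributes, the following Python sketch pairs each value of a data variable with the meaning of its status flag. The helper names `flag_lookup` and `explain` are hypothetical. Accepting `flag_values` either as a JSON array or as a JSON-encoded string is an assumption made for this sketch, not something taken from the CF-JSON spec.

```python
import json


def flag_lookup(attributes):
    """Map each flag value to its meaning for one status flag variable."""
    values = attributes["flag_values"]
    # Assumption: flag_values may be a JSON array or a JSON-encoded
    # string such as "[0, 1, 2]"; both are normalised to a list here.
    if isinstance(values, str):
        values = json.loads(values)
    # flag_meanings is a blank-separated list, one entry per flag value.
    meanings = attributes["flag_meanings"].split()
    if len(values) != len(meanings):
        raise ValueError("flag_values and flag_meanings differ in length")
    return dict(zip(values, meanings))


def explain(variables, name):
    """Pair every value of a data variable with the meaning of its flag."""
    var = variables[name]
    # ancillary_variables is itself a blank-separated list of variable names.
    ancillary = var["attributes"].get("ancillary_variables", "").split()
    for anc_name in ancillary:
        anc = variables[anc_name]
        if anc["attributes"].get("standard_name") == "status_flag":
            lookup = flag_lookup(anc["attributes"])
            return [(v, lookup[f]) for v, f in zip(var["data"], anc["data"])]
    # No status flag variable found: values carry no explanation.
    return [(v, None) for v in var["data"]]
```

Called on the `variables` object of a parsed CF-JSON document, `explain(variables, "hmo")` would return a list of `(value, meaning)` tuples, so a `null` value (Python `None`) can be reported together with the reason it is missing.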

So the way I understand it at the moment, this would look something like this:

{
  "dimensions": { "lat": 4 },
  "attributes": {},
  "variables": {
    "lat": {
      "attributes": {"standard_name": "latitude"},
      "shape": [ "lat" ],
      "data": [ 1, 2, 3, 4 ]
    },
    "hmo": {
      "attributes": {
        "standard_name": "sea_surface_wave_significant_height",
        "ancillary_variables": "hmo_status_flag"
      },
      "shape": [ "lat" ],
      "data": [ 1.1, null, null, 4.4 ]
    },
    "hmo_status_flag": {
      "attributes": {
        "standard_name": "status_flag",
        "flag_values": "[0, 1, 2]",
        "flag_meanings": "all_good missing_data masked_land"
      },
      "shape": [ "lat" ],
      "data": [ 0, 1, 2, 0 ]
    }
  }
}
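
Going the other way, a converter could produce a pair of variables like the one in the example above from raw values. The Python sketch below is only an illustration under stated assumptions: the sentinel `FILL_VALUE`, the `land_mask` input, the `FLAGS` table and the function name `encode` are all hypothetical and not part of CF-JSON or of the binary format discussed here. It writes `flag_values` as a JSON-encoded string only to mirror the example.

```python
import json

FILL_VALUE = -9999.0  # hypothetical sentinel used by the binary source

# Hypothetical flag table; insertion order defines flag_values/flag_meanings.
FLAGS = {"all_good": 0, "missing_data": 1, "masked_land": 2}


def encode(name, raw, land_mask, shape):
    """Build a data variable plus its status flag variable from raw values."""
    data, flags = [], []
    for value, is_land in zip(raw, land_mask):
        if is_land:
            data.append(None)  # serialised as JSON null
            flags.append(FLAGS["masked_land"])
        elif value == FILL_VALUE:
            data.append(None)
            flags.append(FLAGS["missing_data"])
        else:
            data.append(value)
            flags.append(FLAGS["all_good"])
    flag_name = name + "_status_flag"
    return {
        name: {
            "attributes": {"ancillary_variables": flag_name},
            "shape": shape,
            "data": data,
        },
        flag_name: {
            "attributes": {
                "standard_name": "status_flag",
                # Mirrors the string form used in the example above.
                "flag_values": json.dumps(list(FLAGS.values())),
                "flag_meanings": " ".join(FLAGS),
            },
            "shape": shape,
            "data": flags,
        },
    }
```

For instance, `encode("hmo", [1.1, -9999.0, 0.0, 4.4], [False, False, True, False], ["lat"])` would yield an `hmo` variable and an `hmo_status_flag` variable that can be placed under `"variables"` and written out with `json.dumps`. Other attributes, such as the `standard_name` of the data variable, would still need to be added by the caller.
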
aportagain commented 4 years ago

@gregchalmers, could you have a look at the example above and let me know if you think that would serve the purpose? I'd still need to double-check a few things, but if in principle this approach looks good, I'm fairly optimistic we wouldn't have to add anything to the actual CF-JSON spec, just comment and point people towards the relevant existing bits in the CF conventions.