Can vector data be recognized as such? / parameter groups

letmaik commented 9 years ago

For wind direction + speed the obvious visualization is arrows of a certain length. Do we need anything else in the Parameter objects to be able to recognize such a thing? As speed and direction are two different parameters, the minimum is probably to link both together somehow. Then, from the observedProperty you could infer that this is wind speed and direction and that you could use arrows for visualization. So the question is: how is linking done? Just by referencing other Parameters from one of them? Or by having a virtual parameter that has "real" parameters as children?
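
To make the arrow idea concrete, here is a minimal sketch of turning a speed/direction pair into the two components of an arrow. It assumes the meteorological convention (the direction is where the wind blows from, in degrees clockwise from north); the function name is made up for illustration, and a given data set may well use a different convention.

```python
import math

def wind_arrow(speed, direction_deg):
    """Return (u, v): eastward and northward components of the wind.

    Assumption: direction_deg is the direction the wind blows FROM,
    measured clockwise from north (meteorological convention).
    """
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)  # eastward component
    v = -speed * math.cos(theta)  # northward component
    return u, v

# A 10 m/s wind from the west (270 degrees) should point towards the east.
u, v = wind_arrow(10.0, 270.0)
print(round(u, 6), round(v, 6))
```

A client can only do this if it knows which two parameters belong together and which one is the direction, which is exactly the linking question.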

jonblower commented 9 years ago

This is something that Guy and I have discussed at length in EDAL. Of course, in a NetCDF file all variables are "peers" with no grouping structure, so EDAL examines the standard names to look for vector components (e.g. {northward,eastward}_sea_water_velocity).
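
As a rough illustration of the standard-name approach (not EDAL's actual code, which I am not reproducing here), pairing could look something like the sketch below. The prefix list and the function name are assumptions for the example; the real CF standard name table has more component variants than these two.

```python
from collections import defaultdict

# Hypothetical prefix list; other CF component variants are ignored here.
COMPONENT_PREFIXES = ("eastward_", "northward_")

def find_vector_pairs(standard_names):
    """Group variables whose standard names differ only by component prefix.

    standard_names maps variable name -> CF standard name.
    Returns {base quantity: {prefix: variable name}} for complete pairs only.
    """
    candidates = defaultdict(dict)
    for var, std_name in standard_names.items():
        for prefix in COMPONENT_PREFIXES:
            if std_name.startswith(prefix):
                candidates[std_name[len(prefix):]][prefix] = var
    # Keep only quantities for which every component was found.
    return {base: comps for base, comps in candidates.items()
            if len(comps) == len(COMPONENT_PREFIXES)}

pairs = find_vector_pairs({
    "u": "eastward_sea_water_velocity",
    "v": "northward_sea_water_velocity",
    "temp": "sea_water_temperature",
})
print(pairs)
```

Note that this sketch keys on the standard name alone, so if two variables share the same standard name the later one silently replaces the earlier one.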

We should discuss this, but my immediate feeling is to avoid "virtual" parameters and use some properties to indicate that parameters are somehow linked. A similar approach can be used to show uncertain variables (e.g. mean + stdev).

letmaik commented 9 years ago

We definitely need something other than just the standard name, since a coverage can have multiple such vector combinations. Also it needs to be simple.

So something like this is out of the question?

  "parameters" : {
    "VEL": {
      "id" : "http://.../datasets/1/params/VEL",
      "type" : "CompositeParameter",
      "description" : "Wind velocity",
      "observedProperty" : {
        "id" : "http://foo/?",
        "label" : {
          "en": "Do composite parameters have an observedProperty?"
        }
      },
      "components": ["U","V"]
    },
    "U": {..},
    "V": {..}
  }

jonblower commented 9 years ago

Yes, that looks like a reasonable start. So the rules would be that such a parameter (which would usually be a vector/tensor quantity I guess) would have a "type" of "CompositeParameter" and a "components" field?

The CF standard names are a bit awkward in that they conflate the concept (wind velocity) with the component (i.e. direction). But I think EDAL can detect that and emit the individual components as related fields.

letmaik commented 9 years ago

Yes, although "components" is problematic for JSON-LD again, since you cannot use aliases as triple-object (except with namespace). You would have to do "components": ["http://.../datasets/1/params/U","http://.../datasets/1/params/V"]. Which.... is bad. The only workaround I know is to use a data-cube-like encoding (as we use for "ranges"):

  "parameters" : {
    "VEL": {
      "id" : "http://.../datasets/1/params/VEL",
      "type" : "CompositeParameter",
      "description" : "Wind velocity",
      "observedProperty" : {
        "id" : "http://foo/?",
        "label" : {
          "en": "Do composite parameters have an observedProperty?"
        }
      },
      "components": {
        "type": "ComponentSet",
        "U": {
          "type": "ComponentDetails",
          "label": "Speed component of velocity"
        },
        "V": {
          "type": "ComponentDetails",
          "label": "Direction component of velocity"
        }
      }
    },
    "U": {..},
    "V": {..}
  }

So the "ComponentDetails"-type object would contain any information about the actual association, not the parameter itself. Can you think of any such information that might be useful in any scenario? If you can, then that would at least be a reason to use this more verbose structure.

jonblower commented 9 years ago

I'm not sure of the answer to this, but similar questions have been asked in WMS, NetCDF and other areas. The choices seem to be:

1. Organise parameters hierarchically (so a parameter can contain other parameters), or
2. Organise parameters in a flat list of scalar parameters, plus a separate data structure that groups them, something like this:

"parameters" : {
  "U": {...},
  "V": {...}
},
"parameterGroups": {
  "VEL": {
    "type": "Velocity field",
    "components": { /* something like what you have above */ }
  }
}

EDAL does something like the latter, I think, and so does WMTS (which calls these groups "themes"; they are really hints to user interfaces). But there are some tricky questions:

Where does the observedProperty sit? On scalar fields or the group (VEL)? Or both?
Similarly, where do units sit?
Should the scalar fields be fully described so that clients can ignore the parameterGroups for some purposes (probably yes)?

letmaik commented 9 years ago

Let's take magnetic field xyz measurements. You would have three parameters: x, y, z. Each has observedProperty "magnetic field strength", probably more specific with _north, _east, _down suffixes. The unit for all is nanoTesla. You could then group that into a magnetic field vector, although often the individual components are analysed and plotted individually, so grouping them this way may be a bit arbitrary.

I guess my concern is, what would happen if you include model data in the same coverage? How do you separate the different groups? For that you would need either groups or something else which tells you that one of them is real and the other from a model. But the thing is, you need that information anyway for the individual components without looking at groups. Does that go into governance somehow? Questions upon questions...

jonblower commented 9 years ago

I guess my first question would be - why would model and observed data be in the same coverage? They would be most likely to come from different data sources.

But, assuming this could happen, the wider question would be - what if there are multiple parameters in a coverage that are measuring the same thing? How do you group them? This could quite easily happen, for example in an ensemble simulation, where the same parameter is modelled several times under different conditions.

I suppose that, in this case, a client wouldn't know how to group them. The document would have to specify the grouping somehow, and the "soft" properties (e.g. label, description) would be used to distinguish them for humans (e.g. in a UI).

letmaik commented 9 years ago

So in summary, we need groups. I think I like your separated "parameterGroups" object more than my mixed approach. This also gives implementations a choice to ignore it easily if they want and still display the individual parameters (which then still map 1:1 to a range).

I would say that observedProperty is optional for groups, however they probably must have a label if observedProperty is missing. For example, you could have 4 parameters, 2x wind speed, 2x temperature, from two different model simulations. You want to group parameters of the two different simulations but there is no observedProperty which directly makes sense for that, without inventing something higher-level like "Weather". Still, for some cases it makes sense to have observedProperty, like wind velocity or magnetic field vector.

Proposal:

"parameters" : {
  "X": {...},
  "Y": {...},
  "Z": {...}
},
"parameterGroups": {
  "XYZ": {
    "type": "ParameterGroup",
    "observedProperty": {
      "label": {
        "en": "Magnetic field vector"
      }
    },
    "components": { /* as above */ }
  }
}

"parameters" : {
  "TMP_X": {...},
  "SAL_X": {...},
  "TMP_Y": {...},
  "SAL_Y": {...}
},
"parameterGroups": {
  "X": {
    "type": "ParameterGroup",
    "label": {
      "en": "Parameters of simulation X"
    },
    "components": { /* as above */ }
  },
  "Y": ...
}
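
The label rule described above could be checked with something like the following sketch. The function name is hypothetical and the groups are reduced to the fields the rule cares about; nothing here is normative.

```python
def check_group(group):
    """Return a list of problems for one parameter group (empty if none).

    Encodes the rule suggested above: observedProperty is optional,
    but a group without it must carry its own label.
    """
    problems = []
    if group.get("type") != "ParameterGroup":
        problems.append('type must be "ParameterGroup"')
    if "observedProperty" not in group and "label" not in group:
        problems.append("group needs a label when observedProperty is missing")
    return problems

vector_group = {
    "type": "ParameterGroup",
    "observedProperty": {"label": {"en": "Magnetic field vector"}},
}
simulation_group = {
    "type": "ParameterGroup",
    "label": {"en": "Parameters of simulation X"},
}
unlabelled_group = {"type": "ParameterGroup"}

for name, group in [("XYZ", vector_group), ("X", simulation_group),
                    ("bad", unlabelled_group)]:
    print(name, check_group(group))
```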

Do we need key access (XYZ) for groups? I think currently we don't, but I'm sure at some point it may be useful to have it and not rely on array indices. Also better for talking about it.

jonblower commented 9 years ago

> I would say that observedProperty is optional for groups, however they probably must have a label if observedProperty is missing.

Yes. observedProperty could still be useful for a group as it might encode the concept as a whole.

> Do we need key access (XYZ) for groups?

I think yes, because it's more consistent with other things we're doing, and, as you say, it makes it easier to "talk about" the group.

jonblower commented 9 years ago

At the risk of wandering off the topic of vector quantities, another major use case for groups is uncertain variables. You might want to express the spread of values of a variable at each point by recording two fields, representing the mean and variance respectively. So you might have two scalar fields, "SST mean" and "SST variance", arranged into a group "SST".

I wonder what to do with observedProperties in this case. The group would have the observedProperty representing SST. But what about the scalar fields, which represent some statistic of SST? UncertML would be useful here.

(We went through a lot of this in GeoViQua, so I have a few more thoughts that are too lengthy for here...)

letmaik commented 9 years ago

Good use case! I would say that "SST mean" would still have SST as observedProperty and then some additional metadata saying that it is a mean, similar to what WaterML2 does. For "SST variance" I'm not sure what the observedProperty would be, but probably not SST, because this would be confusing if you don't know anything about the concepts. It almost seems that observedProperty doesn't work so nicely in all cases.

jonblower commented 9 years ago

Maybe in the case of variance, observedProperty="SST", units="K^2", statisticalMeasure="variance".

letmaik commented 9 years ago

I think we should leave these details alone for now, we just don't know enough yet. The "complex properties" described in http://dx.doi.org/10.1080/17538947.2015.1033483 are another view on that but this is not widely accepted yet. I think this is a separate issue from the groups themselves.

letmaik commented 9 years ago

I would say that for the objects inside "components", instead of ComponentDetails as "type" we could name it just Component. I'll integrate it into the spec and see if I come across any issues.

letmaik commented 9 years ago

First problem I found: A parameter currently cannot directly have a label, just a description. I thought that the label of the observedProperty of the parameter would be enough for that purpose. But if you have two parameters of the same observedProperty but for different models, then it makes sense to differentiate the parameters also without having a group, which is where the label and description come in. So, my suggestion is to allow a parameter to have a label as well. In an application you could decide when to display the parameter label (if it exists) and when the observedProperty label (which is required), or even give the parameter label precedence.
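
A minimal sketch of the "parameter label takes precedence" option, assuming the parameter label is a language map like the other labels in this thread (that shape and the function name are assumptions, not something decided yet):

```python
def display_label(parameter, lang="en"):
    """Pick a label for a UI: the parameter's own label if present,
    otherwise the (required) observedProperty label."""
    own = parameter.get("label", {})
    if lang in own:
        return own[lang]
    return parameter["observedProperty"]["label"][lang]

# Two parameters with the same observedProperty; only one has its own label.
model_a = {
    "type": "Parameter",
    "label": {"en": "Wind speed (model A)"},
    "observedProperty": {"label": {"en": "Wind speed"}},
}
plain = {
    "type": "Parameter",
    "observedProperty": {"label": {"en": "Wind speed"}},
}

print(display_label(model_a))
print(display_label(plain))
```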

letmaik commented 9 years ago

Another issue you already mentioned is how to define a statistical parameter, especially the observedProperty of it. My suggestion:

{
  "type" : "Parameter",
  "observedProperty" : {
    "label" : {
      "en": "Sea surface temperature standard deviation"
    },
    "baseProperty": "http://vocab.nerc.ac.uk/standard_name/sea_surface_temperature/",
    "statisticalMeasure": "http://www.uncertml.org/statistics/standard-deviation"
  },
  "unit" : {
    "id" : "http://qudt.org/vocab/unit#DegreeCelsius",
    "label" : {
      "en": "degrees Celsius"
    },
    "symbol" : "°C"
  }
}

The important thing is the baseProperty and that the observedProperty doesn't have an ID (it could have one, but it would have to be specific, like "http://.../standard_name/sea_surface_temperature_standard_deviation" which obviously doesn't exist). I'm not sure if "baseProperty" is the best name though since it could suggest an inheritance which is not the case.

letmaik commented 9 years ago

I came to the conclusion that we really don't need any more detail than "components": ["SST_mean", "SST_stddev"]. It gets ridiculous to be forced into something like this when you don't need a label:

  "components": {
    "type": "ComponentSet",
    "SST_mean": {
      "type": "Component"
    },
    "SST_stddev": {
      "type": "Component"
    }
  }

And the cases where I thought a label would be useful are actually not valid, since this is better handled as the label of the parameter itself. It's also less confusing that way. The short form isn't compatible with JSON-LD 1.0 as I wrote earlier, but I think we can ignore that since the rest of the parameter group can be LD-queried, e.g. the observedProperty. And if future versions of JSON-LD include appropriate features, we can still adjust the context. Edit: it actually is possible with "@type": "@vocab".

letmaik commented 9 years ago

One more thing: We should check how this aligns with the CompositeObservableProperty of INSPIRE, and also whether our parameters align with the Complex Property Model (different issue though). I guess for vector quantities the composite works, but for more arbitrary groups it doesn't, like the uncertainties, or grouping of same-model parameters. By the way, what if you have two sets of model parameters each with a vector quantity? Not sure how grouping works then, could be a group of a group, but nested groups are probably not a super idea.

jonblower commented 9 years ago

This is difficult (but important) stuff. Clearly this whole area is not very settled and we're probably not going to be able to solve it all ourselves.

> By the way, what if you have two sets of model parameters each with a vector quantity?

I think that "groups of groups" sound like a bad idea. I would be tempted to say that in this case you would model the data as separate coverages.

By the way, on that point, there are cases where the different vector components (of the same field) have to be in separate coverages! This is because sometimes the U field is solved on a slightly different grid from the V field (e.g. Arakawa C-grid). Hence U and V have different domains...

letmaik commented 9 years ago

Closing for now. First version is in the spec. I added the following to the context:

"ParameterGroup": "covjson:ParameterGroup",
"components": { "@id": "covjson:component", "@type": "@vocab" },

The components definition makes "components": ["WIND_SPEED", "WIND_DIR"] valid if you define those two aliases in the context.
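
For clients that do not process JSON-LD at all, the short form can be treated as plain keys into "parameters". A minimal sketch, where the group key "WIND", the labels and the function name are made up for the example:

```python
def resolve_group(coverage, group_key):
    """Return the parameter objects a group refers to, in order.

    Assumes the short form where "components" is a list of keys
    into the coverage's "parameters" object.
    """
    group = coverage["parameterGroups"][group_key]
    parameters = coverage["parameters"]
    missing = [key for key in group["components"] if key not in parameters]
    if missing:
        raise KeyError(f"group {group_key!r} refers to unknown parameters: {missing}")
    return [parameters[key] for key in group["components"]]

coverage = {
    "parameters": {
        "WIND_SPEED": {"observedProperty": {"label": {"en": "Wind speed"}}},
        "WIND_DIR": {"observedProperty": {"label": {"en": "Wind direction"}}},
    },
    "parameterGroups": {
        "WIND": {
            "type": "ParameterGroup",
            "label": {"en": "Wind"},
            "components": ["WIND_SPEED", "WIND_DIR"],
        },
    },
}

for param in resolve_group(coverage, "WIND"):
    print(param["observedProperty"]["label"]["en"])
```

A client that wants to ignore groups can simply iterate over "parameters" and never look at "parameterGroups".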

For other related issues, please open a new issue and link to this one.

covjson / specification

Can vector data be recognized as such? / parameter groups #22