faa-swim / sdcm

Issues tracking for Service Description Conceptual Model

Incorporate Data Set and Data Quality metadata in SDCM #2

Open wznira opened 2 years ago

wznira commented 2 years ago

OGC Testbed 13 and Testbed 14 explored ways to extend SWIM service descriptions to describe the information delivered in SWIM services and their quality attributes. For example,

We should consider incorporating these concepts in SDCM. A notional concept is shown in the diagram below.

[Diagram image: unnamed]

swang-nira commented 2 years ago

Mark suggests taking a look at an ontology approach for adding/extending those data sets.

caroluri commented 2 years ago

(From Carol Uri email of 4/7/2020) I like it. It really covers data quality measures. There are a couple of misspellings: Geospatial "Extend" should be Geospatial "Extent"; "Verticle" Resolution should be "Vertical" Resolution.

The DCAT ontology concept of Data Service (https://www.w3.org/TR/vocab-dcat-2/#Class:Data_Service) is very similar to "Information Service" in our Service Category taxonomy. The Data_Service definition is "A collection of operations that provides access to one or more datasets or data processing functions." Different Information Service types could have different data quality needs too; e.g., "timeliness" of Weather data sets would probably be different from timeliness of World Feature data sets.

mkaplun commented 2 years ago

RE: comment by @wznira

1) According to the SDCM, Service Category taxonomy "classifies a service by the type of service provided or by some other technological or architectural solution," e.g., Aeronautical or Security. Geographical Extent is a location covered by the service. It can be numerical (latitude, longitude) or an identifier (e.g., "USA") that can be associated with some location. If this is the case, it is unclear how a Geospatial Extent can be a sub-class (i.e., specialization) of the Service Category taxonomy the way it is shown on the diagram?

2) Payload is "actual (business) data transferred by a message" [SDCM]. The Payload may consist of Data Entities (data elements) described in a Data Definition document. Dataset is an "identifiable collection of data" [OGC Glossary], which from the diagram, also looks like a collection of Data Entities. So both Payload and Dataset are structurally the same (collection of Data Entities, i.e., data elements). What is the difference between them? Why does the model need both?

3) Assuming that Dataset represents data produced by the described service, why should it be associated with the class Organization with the role of Publisher? For the described service, the organization responsible for producing -- be it a Payload or Dataset -- is already identified by the class Provider. Is Publisher a different Organization, or is this an unintended redundancy?

4) Misspelled "Data Quality Measurement."

There are more questions, but I would rather receive answers to these, what I consider "top-level" questions, before moving to the others.

wznira commented 2 years ago
  1. According to the SDCM, Service Category taxonomy "classifies a service by the type of service provided or by some other technological or architectural solution," e.g., Aeronautical or Security. Geographical Extent is a location covered by the service. It can be numerical (latitude, longitude) or an identifier (e.g., "USA") that can be associated with some location. If this is the case, it is unclear how a Geospatial Extent can be a sub-class (i.e., specialization) of the Service Category taxonomy the way it is shown on the diagram?

@mkaplun -- agreed. We concluded the discussion on Geographical Extent with the agreement that "a Profile should have zero or more GeographicalExtent".
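The agreed association could be sketched roughly as follows. This is only an illustration, not the SDCM definition: the field names `identifier` and `lat_lon`, the rule that exactly one of them is set, and the example profile names are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GeographicalExtent:
    """A location covered by a service: numerical or a named identifier."""
    identifier: Optional[str] = None               # e.g. "USA"
    lat_lon: Optional[Tuple[float, float]] = None  # (latitude, longitude)

    def __post_init__(self):
        # Assumption for this sketch: exactly one representation is given.
        if (self.identifier is None) == (self.lat_lon is None):
            raise ValueError("provide exactly one of identifier or lat_lon")


@dataclass
class Profile:
    """Hypothetical stand-in for the SDCM Profile; only the extent link is shown."""
    name: str
    # "Zero or more": the list defaults to empty.
    geographical_extents: List[GeographicalExtent] = field(default_factory=list)


profile_without_extent = Profile(name="Example profile without extent")
profile_with_extents = Profile(
    name="Example profile with extents",
    geographical_extents=[
        GeographicalExtent(identifier="USA"),
        GeographicalExtent(lat_lon=(38.9, -77.0)),
    ],
)
```

Modeling the extent as an attribute of the Profile, rather than as a specialization of Service Category, keeps the taxonomy limited to classifying the type of service.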

  2. Payload is "actual (business) data transferred by a message" [SDCM]. The Payload may consist of Data Entities (data elements) described in a Data Definition document. Dataset is an "identifiable collection of data" [OGC Glossary], which from the diagram, also looks like a collection of Data Entities. So both Payload and Dataset are structurally the same (collection of Data Entities, i.e., data elements). What is the difference between them? Why does the model need both?

A payload contains data entities being transmitted in a message. A dataset contains data where the payload data comes from, most likely a database or a data product. For example, each of NOAA's Graphical Forecasts for Aviation Products can be considered a Dataset. Each Dataset will have its own Quality of Data measures, such as resolution and timeliness.

Data from a dataset can be distributed by one or more services, while a service can deliver data originating from multiple datasets. For example, a "Terminal Weather Forecast Service X" can include data entities originating from two NOAA products in a single payload. The Quality of Service will be measured using a different set of parameters, such as latency and availability.

  3. Assuming that Dataset represents data produced by the described service, why should it be associated with the class Organization with the role of Publisher? For the described service, the organization responsible for producing -- be it a Payload or Dataset -- is already identified by the class Provider. Is Publisher a different Organization, or is this an unintended redundancy?

As described above, the producer of "Terminal Weather Forecast Service X" may not be the same as the producer of the data, in this case NOAA.
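The Dataset/Service and Publisher/Provider distinctions described above could be sketched as below. This is a rough illustration under assumptions, not the SDCM model itself: the class and attribute names, the helper `external_publishers`, "Hypothetical Provider Org", "Hypothetical Service Y", and the two product titles are invented for the example, and the quality values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Organization:
    name: str


@dataclass
class Dataset:
    """Where payload data comes from, e.g. a database or a data product."""
    title: str
    publisher: Organization  # organization that produces the data
    quality_of_data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Service:
    name: str
    provider: Organization  # organization that provides the service
    datasets: List[Dataset] = field(default_factory=list)
    quality_of_service: Dict[str, str] = field(default_factory=dict)


def external_publishers(service: Service) -> List[str]:
    """Names of dataset publishers that differ from the service provider."""
    names = {d.publisher.name for d in service.datasets
             if d.publisher.name != service.provider.name}
    return sorted(names)


noaa = Organization("NOAA")
provider_org = Organization("Hypothetical Provider Org")

# Placeholder values only; real measures would come from the data producer.
product_a = Dataset("Hypothetical NOAA product A", noaa,
                    {"resolution": "<placeholder>", "timeliness": "<placeholder>"})
product_b = Dataset("Hypothetical NOAA product B", noaa,
                    {"resolution": "<placeholder>", "timeliness": "<placeholder>"})

# One service draws on two datasets; Quality of Service is tracked separately.
service_x = Service("Terminal Weather Forecast Service X", provider_org,
                    [product_a, product_b],
                    {"latency": "<placeholder>", "availability": "<placeholder>"})

# The same dataset can also be distributed by another service.
service_y = Service("Hypothetical Service Y", provider_org, [product_a])
```

Here `external_publishers(service_x)` yields the publisher names that are not the provider, which is the case where a separate Publisher role carries information the Provider class does not.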

  4. Misspelled "Data Quality Measurement."

Will fix 👍