Schema and data standards

herkulano commented 7 years ago

We need to discuss and decide which schemas to use for the data in the RADAR Platform.

Some open standards mentioned by @afolarin :

Design guidelines by the Continua Alliance:

http://www.continuaalliance.org/products/design-guidelines

herkulano commented 7 years ago

An example of how the same type of data, "body temperature", is represented differently:

http://www.oneiota.org/revisions/1402

{
    "rt": ["oic.r.health.body.temperature"],
    "id": "unique_example_id",
    "temperature": 36,
    "units": "C",
    "site": "mouth",
    "observedtime": "2016-02-15T09:19Z"
}

http://www.openmhealth.org/documentation/#/schema-docs/schema-library/schemas/omh_body-temperature

{
    "body_temperature": {
        "value": 96.5,
        "unit": "F"
    },
    "effective_time_frame": {
        "time_interval": {
            "start_date_time": "2015-02-05T06:00:00Z",
            "end_date_time": "2015-02-06T06:00:00Z"
        }
    },
    "measurement_location": "oral",
    "descriptive_statistic": "maximum"
}

As you can see, the structures are very different for the same type of data.

I prefer Open mHealth because it's health-centric and offers more cases and flexibility for this kind of data. The property names are also more descriptive, which makes the data easier to understand:

"body_temperature" vs "temperature"
"measurement_location" vs "site"
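To make the structural difference concrete, here is a minimal sketch that reshapes the oneIoTa-style record into the Open mHealth layout. The function name and the site mapping table are hypothetical; only the "mouth" to "oral" pair is suggested by the two examples above, and any other value would need checking against both specifications.

```python
# Hypothetical vocabulary mapping. Only "mouth" -> "oral" comes from the
# two examples in this thread; other sites are passed through unchanged.
SITE_TO_MEASUREMENT_LOCATION = {"mouth": "oral"}


def oic_to_omh_body_temperature(record):
    """Reshape a oneIoTa-style body temperature dict into the OMH layout."""
    omh = {
        "body_temperature": {
            "value": record["temperature"],
            "unit": record["units"],
        },
        # The oneIoTa example carries a single timestamp, so it maps to
        # "date_time" rather than to a "time_interval".
        "effective_time_frame": {"date_time": record["observedtime"]},
    }
    site = record.get("site")
    if site is not None:
        omh["measurement_location"] = SITE_TO_MEASUREMENT_LOCATION.get(site, site)
    return omh


oic_record = {
    "rt": ["oic.r.health.body.temperature"],
    "id": "unique_example_id",
    "temperature": 36,
    "units": "C",
    "site": "mouth",
    "observedtime": "2016-02-15T09:19Z",
}
converted = oic_to_omh_body_temperature(oic_record)
```

Note that "rt" and "id" have no counterpart in the OMH body, so this sketch drops them.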

Other examples from Open mHealth:

{
    "body_temperature": {
        "value": 96.5,
        "unit": "F"
    },
    "effective_time_frame": {
        "date_time": "2013-02-05T06:25:00Z"
    },
    "temporal_relationship_to_sleep": "on waking"
}

{
    "body_temperature": {
        "value": 97,
        "unit": "F"
    },
    "effective_time_frame": {
        "date_time": "2013-02-05T07:25:00Z"
    },
    "measurement_location": "forehead"
}

afolarin commented 7 years ago

As pointed out by @blootsvoets, OMH and Open Connectivity are JSON schemas (not Avro). If something like this is used, it would only be for the REST API or further downstream:

data source --> ingest (AVRO) --> [Kafka etc.] --> [REST API]  --> emit (JSON) --> client

fnobilia commented 7 years ago

We can set up the REST API to support both JSON and Avro; it should be straightforward. That way, downstream clients can select the most suitable format according to their requirements.
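One way a client could select the format is through the HTTP Accept header. The sketch below is only an illustration of that idea, not how the RADAR REST API does it: the function name is hypothetical, it ignores q-value weighting, and it assumes "avro/binary" as the Avro media type (the one named in the Avro specification for HTTP transport).

```python
JSON_TYPE = "application/json"
AVRO_TYPE = "avro/binary"  # assumed media type for Avro payloads
SUPPORTED_TYPES = (JSON_TYPE, AVRO_TYPE)


def choose_media_type(accept_header, default=JSON_TYPE):
    """Return the first supported media type listed in an Accept header.

    Simplification: entries are taken in the order listed and q-values
    are ignored. Unknown types fall through to the default.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUPPORTED_TYPES:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

A request with `Accept: avro/binary` would then get the Avro encoding, and anything else would fall back to JSON.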

dennyverbeeck commented 7 years ago

The choice of schema probably also depends on the structure of the client request. If a client requests, for example, all data points captured by a device type on a given day, the schema does not capture enough information. For instance, the participant ID and the ID of the specific device could be included in the output. We could create our own OMH 'acquisition_point' schema that includes all relevant IDs and links to the existing schemas for temperature, sleep, etc.

blootsvoets commented 7 years ago

Adding HAL would resolve some of these issues: i.e. referencing additional resources and embedding external resources directly.
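As a rough illustration of what HAL would add, the sketch below wraps a measurement with the two reserved HAL properties, `_links` and `_embedded`. The helper name, the link relation names ("participant", "device") and all URLs are hypothetical placeholders, not part of any agreed API.

```python
def to_hal(measurement, self_href, links=None, embedded=None):
    """Return a copy of `measurement` with HAL `_links` and `_embedded` added."""
    doc = dict(measurement)
    doc["_links"] = {"self": {"href": self_href}}
    for rel, href in (links or {}).items():
        # Each link relation points at a separate resource.
        doc["_links"][rel] = {"href": href}
    if embedded:
        # Embedded resources are included inline, keyed by relation name.
        doc["_embedded"] = dict(embedded)
    return doc


hal_doc = to_hal(
    {"body_temperature": {"value": 96.5, "unit": "F"}},
    self_href="/measurements/1",  # hypothetical URL
    links={
        "participant": "/participants/user1432",  # hypothetical URL
        "device": "/devices/42",  # hypothetical URL
    },
)
```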

blootsvoets commented 7 years ago

The Open mHealth schemas also seem to be more inclusive for health-related issues. For example, oneiota.org does not include "wrist" as a possible body temperature "site", whereas it is a valid Open mHealth "measurement_location".

For device-related data, Open mHealth does not include any schemas, but perhaps here we can make custom responses.

afolarin commented 7 years ago

Would a conversation with OMH be useful? I'd like to know how we can fill the gap between the existing OMH schema catalogue and what we might need for RADAR. I could try to set up a meeting with David Haddad.

fnobilia commented 7 years ago

It would help to clarify which data we are going to collect and show. A list of them would be useful for selecting, customising, or implementing our schema. At the moment we are making decisions about future aspects without that information, which is a little risky!

herkulano commented 7 years ago

@fnobilia agreed. Can we get some sample data from several devices? It would be more productive to have a discussion with something a little more concrete.

fnobilia commented 7 years ago

@herkulano I'm working with @blootsvoets on the data pipeline. In the next few weeks we hope to provide some results close to the final implementation.

herkulano commented 7 years ago

@fnobilia @blootsvoets can you share a sample of the data for discussion?

Maybe start with the data that's coming from the Empatica E4 device plus "Heart Rate sensor" and "Accelerometer" data samples that are mentioned in #3

herkulano commented 7 years ago

@dennyverbeeck OMH has two other schemas that address what you mentioned:

The Data Point Schema and the Header Schema

The data point schema has a header schema that contains this information.

Example of a collection of data points using OMH standard:

[
  {
    "header": {
      "id": "123e4567-e89b-12d3-a456-426655440000",
      "creation_date_time": "2013-02-05T07:25:00Z",
      "schema_id": {
        "namespace": "omh",
        "name": "physical-activity",
        "version": "1.1"
      },
      "acquisition_provenance": {
        "source_name": "RunKeeper",
        "source_creation_date_time": "2013-02-05T07:25:00Z",
        "modality": "sensed"
      },
      "user_id": "user1432"
    },
    "body": {
      "activity_name": "walking",
      "distance": {
        "value": 1.5,
        "unit": "mi"
      },
      "reported_activity_intensity": "moderate",
      "effective_time_frame": {
        "time_interval": {
          "date": "2013-02-05",
          "part_of_day": "morning"
        }
      }
    }
  },
  {
    "header": {
      "id": "123e4567-e89b-12d3-a456-426655440000",
      "creation_date_time": "2013-02-05T07:25:00Z",
      "schema_id": {
        "namespace": "omh",
        "name": "physical-activity",
        "version": "1.1"
      },
      "acquisition_provenance": {
        "source_name": "RunKeeper",
        "source_creation_date_time": "2013-02-05T07:25:00Z",
        "modality": "sensed"
      },
      "user_id": "user1432"
    },
    "body": {
      "activity_name": "walking",
      "distance": {
        "value": 1.5,
        "unit": "mi"
      },
      "reported_activity_intensity": "moderate",
      "effective_time_frame": {
        "time_interval": {
          "date": "2013-02-05",
          "part_of_day": "morning"
        }
      }
    }
  }
]
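A minimal sketch of how such a data point could be assembled around an existing body, assuming the header fields shown in the example above. The helper name is hypothetical, `source_creation_date_time` is left out for brevity, and the schema version and source name passed in the usage lines are placeholders rather than values taken from the OMH catalogue.

```python
import uuid
from datetime import datetime, timezone


def make_data_point(body, schema_name, schema_version, user_id,
                    source_name, modality="sensed"):
    """Wrap a measurement body in an OMH-style header/body data point."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "header": {
            "id": str(uuid.uuid4()),  # fresh identifier per data point
            "creation_date_time": created,
            "schema_id": {
                "namespace": "omh",
                "name": schema_name,
                "version": schema_version,
            },
            "acquisition_provenance": {
                "source_name": source_name,
                "modality": modality,
            },
            "user_id": user_id,
        },
        "body": body,
    }


temperature_body = {
    "body_temperature": {"value": 97, "unit": "F"},
    "effective_time_frame": {"date_time": "2013-02-05T07:25:00Z"},
    "measurement_location": "forehead",
}
data_point = make_data_point(
    temperature_body,
    schema_name="body-temperature",
    schema_version="1.0",          # placeholder version
    user_id="user1432",
    source_name="example-device",  # hypothetical source
)
```

With the participant in `user_id` and the device in `acquisition_provenance`, a query such as "all data points from one device type on a given day" can be answered from the header alone.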

dennyverbeeck commented 7 years ago

@herkulano Wonderful! Can't believe I missed that :)

afolarin commented 7 years ago

Has anyone come across oneM2M? http://www.onem2m.org

Ontologies and middleware for IoT. They published their first specification last year and an update this year.

http://www.onem2m.org/images/files/onem2m-executive-briefing_A4.pdf http://www.onem2m.org/images/files/oneM2M-whitepaper-January-2015.pdf

fnobilia commented 7 years ago

We can set up the REST API for supporting both JSON and Avro

This commit proves the feasibility of providing a JSON response using Avro. Two output formats without additional effort!

afolarin commented 7 years ago

Discussed today with @mbegalevibrent: it would be prudent to look at the FHIR standards for compatibility with any downstream use cases involving clinical record interaction. See FHIR Observation: http://www.hl7.org/fhir/observation.html

fnobilia commented 7 years ago

There is a style violation in our current RestApi schemas: the naming convention does not follow the Google JSON Style Guide. The generated JSON uses the same rule as OMH: schema_id instead of schemaId. We should clarify this point.
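To show what the two conventions look like side by side, here is a small sketch that rewrites OMH-style snake_case keys into the camelCase form the Google JSON Style Guide asks for. Both function names are hypothetical and this is not part of the RestApi code.

```python
def snake_to_camel(name):
    """Convert a snake_case name to lowerCamelCase, e.g. schema_id -> schemaId."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def camelize_keys(value):
    """Recursively rename dict keys; values other than dicts and lists are untouched."""
    if isinstance(value, dict):
        return {snake_to_camel(k): camelize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize_keys(item) for item in value]
    return value


omh_style = {
    "schema_id": {"namespace": "omh", "name": "physical-activity"},
    "creation_date_time": "2013-02-05T07:25:00Z",
}
google_style = camelize_keys(omh_style)
```

Only keys are renamed; string values such as "physical-activity" stay as they are.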

blootsvoets commented 7 years ago

Indeed, we just discussed the style in #13 as well. The data ingestion (common directory) for now does use the Google JSON style. I'll clarify that in the README.

RADAR-base / RADAR-Schemas

Schema and data standards #1