DAS-RCN / DAS_metadata

Tools for standardizing Distributed Acoustic Sensing (DAS) metadata
Creative Commons Attribution 4.0 International
25 stars 3 forks source link

Metadata formalize as JSON-Schema #5

Closed miili closed 1 year ago

miili commented 2 years ago

I propose to formalize meta data proposal in term/ to a machine and human readable format. I vote for json-schema, the schemas can easily be validated and translated into various other schemas, see https://app.quicktype.io/.

The thingy would look like this

{
    "title": "Distributed Acoustic Sensing Metadata - Overview",
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "default": {},
    "required": [
        "location",
        "number_of_interrogators",
        "principle_investigators",
        "start_datetime",
    ],
    "properties": {
        "location": {
            "type": "string",
            "default": "",
            "title": "Description of geographic location",
            "examples": [
                "Parkfield, California, USA"
            ]
        },
        "deployment_type": {
            "type": "string",
            "default": "",
            "title": "Describes the permanency of the deployment",
            "examples": [
                "permanent"
            ]
        },
        "number_of_interrogators": {
            "type": "number",
            "default": "",
            "title": "Number of interrogators used to collect data over the course of data collection",
            "examples": [
                2
            ]
        },
        "principle_investigators": {
            "type": "string",
            "default": "",
            "title": "Point of Contact(s)",
            "examples": [
                "P.I. Doe"
            ]
        },
        "start_datetime": {
            "type": "string",
            "format": "date-time",
            "default": "",
            "title": "Start date of experiment",
            "examples": [
                "2018-02-11T00:00:00"
            ]
        }
    },
    "examples": [
        {
            "location": "",
            "number_of_interrogators": 1,
            "principle_investigators": "",
            "start_datetime": "",
        }
    ]
}
miili commented 1 year ago

Please see PR #4

andreas-wuestefeld commented 1 year ago

What would be the use-case where such schema is beneficial?

In my humble opinion, this "schema" seems to have most benefits for dynamically changing input, for example user input in a web form. Validity needs to be checked. However meta data from DAS projects are static.

I believe that here human readability outweighs benefits of validity checks. Additionally, JSON reader packages exist for all languages, and the addition of this schema provides rather an obstacle to import JSON into, say, MatLab or Python. Your example fails to read in both languages with the build-in functions.

I am sure JSON schema have their justification, but if the DAS meta format is suitable, I am not convinced yet.

miili commented 1 year ago

Hi Andreas,

JSON schema is just a format to describe the metadata, not to hold the metadata. The schema describes, formalizes, confines and restricts what is expected as input. See this example here:

JSON Schema is an IETF standard providing a format for what JSON data is required for a given application and how to interact with it. Applying such standards for a JSON document lets you enforce consistency and data validity across similar JSON data.

See this example:

"start_datetime": {
    "type": "string",
    "format": "date-time",
    "default": "",
    "title": "Start date of experiment",
    "examples": [
        "2018-02-11T00:00:00"
    ]
}

It confines this field to be a date time object, gives a title and example. When we have a schema formalized in JSON schema we can:

  1. Validate a JSON Object
  2. Generate code and validators for many programming languages (see e.g. https://quicktype.io)
  3. Generate documentation from a single source of truth

JSON schema is the standard to describe how communication should look. It is used by most modern APIs, see e.g. https://www.openapis.org/

andreas-wuestefeld commented 1 year ago

Ah, that was not clear from your original post. So it is another addon...

vhlai-seis commented 1 year ago

Noted on JSON schema, which is also suggested in a separate issue. Will be incorporated in the updated attributes. Keep you all posted.