SciCatProject / scicat-backend-next

SciCat Data Catalogue Backend
https://scicatproject.github.io/documentation/
BSD 3-Clause "New" or "Revised" License

Validation of scientificMetadata #966

Open sbliven opened 8 months ago

sbliven commented 8 months ago

Summary

Enable validation of scientificMetadata against a schema. The schema should be configurable for each site and default to no validation. In some cases multiple schemas might be desirable, with different features available depending on the schema.

Motivation

Current Behaviour

Validation must be performed out-of-band by the ingestor tools/libraries.

Expected Behaviour

Questions

  1. Should users be able to specify (optionally) a more restrictive schema? For instance, a schema for EM data. Should this be another top-level property, or something like scientificMetadata.@context (JSON-LD style)?
  2. What schemas are implicitly supported by SciCat currently (eg with specific frontend behavior)?
sbliven commented 1 month ago

Before implementing this we would have to agree on the schema technology and the spellings for the terms. I think that the schema language is not a major barrier to adoption, as there are very good tools for converting both schemas and data between different serializations (eg with LinkML).

JSON Schema

One option would be to add a JSON Schema reference to the dataset model. Most people I've discussed this with seem to expect the schema to apply only to the scientificMetadata, not to the whole JSON document.

{
    "pid": "123...",
    "schema": "https://osc-em.github.com/oscem_schema.json",
    "scientificMetadata": {
        [validate this]
    }
}

If "schema" is omitted, we would default to no validation for backwards compatibility.
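A minimal sketch of that behaviour, not the SciCat implementation: it skips validation when "schema" is absent and otherwise checks only the scientificMetadata subtree. The SCHEMAS registry, the example.org URI, the voltage_kv field, and the check_subset helper are all hypothetical. check_subset understands only the "required" and "type" keywords and stands in for a real JSON Schema validator.

```python
from __future__ import annotations

from typing import Any

# Hypothetical registry of schemas a site has already loaded, keyed by URI.
# Both the URI and the schema body are invented for illustration.
SCHEMAS: dict[str, dict[str, Any]] = {
    "https://example.org/em-schema.json": {
        "required": ["voltage_kv"],
        "properties": {"voltage_kv": {"type": "number"}},
    },
}


def _is_type(value: Any, type_name: str) -> bool:
    """Check a value against a small subset of JSON Schema type names."""
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "object":
        return isinstance(value, dict)
    return True  # type names outside this tiny subset are not checked


def check_subset(instance: dict, schema: dict) -> list[str]:
    """Stand-in validator: handles only 'required' and 'properties.*.type'."""
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property '{name}'")
    for name, rules in schema.get("properties", {}).items():
        if name in instance and "type" in rules:
            if not _is_type(instance[name], rules["type"]):
                errors.append(f"property '{name}' is not of type {rules['type']}")
    return errors


def validate_dataset(dataset: dict, schemas: dict = SCHEMAS) -> list[str]:
    """Return a list of validation errors; an empty list means accepted."""
    schema_uri = dataset.get("schema")
    if schema_uri is None:
        return []  # no schema declared: no validation (backwards compatible)
    if schema_uri not in schemas:
        return [f"unknown schema '{schema_uri}'"]
    # Only the scientificMetadata subtree is validated, not the whole dataset.
    return check_subset(dataset.get("scientificMetadata") or {}, schemas[schema_uri])
```

In a real deployment the stand-in checker would be replaced by a full JSON Schema validator, and how schemas are fetched and cached per site would still need to be decided.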

JSON-LD

JSON-LD also has good support and is used in many metadata containers (eg RO-Crate). One nice feature is that the context (aka schema) can be either a shared schema included by URI or can be inlined into the data itself. We could adapt the dataset model so that it's valid JSON-LD by including a @context key linking to a published minimal schema for SciCat. This could even be overridden within scientificMetadata to specify a more restrictive schema.
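To illustrate the override idea, here is a rough sketch of a dataset with a top-level @context and a nested one inside scientificMetadata. Both context URLs and the active_contexts helper are hypothetical. In JSON-LD a nested context is combined with the enclosing one, with later definitions taking precedence, so the helper returns the contexts in processing order rather than picking a single one.

```python
from __future__ import annotations

from typing import Any

# Hypothetical dataset: both context URLs are placeholders, not published schemas.
dataset: dict[str, Any] = {
    "@context": "https://example.org/scicat/context.jsonld",
    "pid": "123...",
    "scientificMetadata": {
        "@context": "https://example.org/em/context.jsonld",
        "voltage_kv": 300,
    },
}


def active_contexts(dataset: dict) -> list[Any]:
    """List the contexts that apply to scientificMetadata, outermost first.

    Later entries take precedence over earlier ones for any term they redefine.
    """
    contexts: list[Any] = []
    for node in (dataset, dataset.get("scientificMetadata")):
        if not isinstance(node, dict):
            continue
        ctx = node.get("@context")
        if ctx is None:
            continue
        # A @context value may be a single entry or a list of entries.
        contexts.extend(ctx if isinstance(ctx, list) else [ctx])
    return contexts
```

A dataset without a nested @context would fall back to the top-level SciCat context alone, which matches the "minimal schema by default, stricter schema on request" idea above.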

LinkML

Would it make sense to validate LinkML directly?

sbliven commented 1 month ago

Notes from SciCatCon day 2 discussion:

linupi commented 1 month ago

Notes from the discussion on 4/7/2023

Next steps (independent):

sbliven commented 1 month ago

Related discussion: https://github.com/SciCatProject/scicat-backend-next/discussions/1308