elixir-europe / biovalidator

JSON validator derived from AJV supporting ontology and taxonomy validation.
Apache License 2.0

[Requested feature]: separate file with ontology and taxonomy rules #66

Open · sneumann opened this issue 1 year ago

sneumann commented 1 year ago

Summary

The extended keywords for ontology and taxonomy validation are a fairly unique feature of this validator, and they require graphRestriction, isChildTermOf and isValidTaxonomy to be written into the schema file itself (e.g. test_schema.json). If the JSON-LD schema definition is not under my control, I would like these semantic validations to be passed into biovalidator from a second file.

Motivation

I would like to enable better validation of schema.org and Bioschemas metadata. There are already JSON Schema definitions for types such as https://schema.org/Dataset or https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE, which are developed in, e.g., https://github.com/BioSchemas/specifications/tree/master/Dataset/ and https://github.com/BioSchemas/specifications/tree/master/MolecularEntity/.

These types allow various properties to take values of type https://schema.org/DefinedTerm, and I would expect the majority of these to come from OBO ontologies available on terminology services such as OLS or NCBO.

However, I would expect schema.org to keep its types lean and not allow others to add further validation to its schema definitions. Also, a single schema type might have multiple profiles from different communities, each suggesting or requiring different restrictions on the allowed ontology terms.

Example

An example would probably be great, but I don't have one yet. I only discovered biovalidator at last week's All Hands meeting in Dublin :-)

Yours, Steffen

theisuru commented 1 year ago

Hi Steffen,

Thanks for reaching out. After Dublin, we have also been thinking about Bioschemas and how we could extend support for it.

If I understood your use case correctly: since the second schema file is under your control, you can reference the Bioschemas definition from within it. allOf is one of the keywords provided by JSON Schema, and you can use it to validate against multiple schemas:


        "allof": [
            {
                "$ref": "https://schema.org/TYPE/JSON_SCHEMA_REPRESENTATION"
            },
            {
                "type": "object",
                ..... secondary validation here
            }
        ]
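
One way to approximate the "separate file" request with plain JSON Schema is to generate this wrapper from two inputs: the URI of the upstream schema and a locally maintained rules file that contains only the ontology and taxonomy keywords. The Python sketch below assumes the rules file is itself a JSON Schema fragment; the function name compose_schema and the example.org URIs are hypothetical placeholders, and the generated document would still have to be handed to biovalidator as its schema.

```python
import json

def compose_schema(base_ref: str, rules: dict, schema_id: str) -> dict:
    """Wrap an external schema and a local rules fragment in allOf.

    An instance must satisfy every subschema listed under allOf, so the
    upstream definition stays untouched while the extra ontology checks
    live in their own file.
    """
    return {
        "$id": schema_id,
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        # Camel case matters: JSON Schema keywords are case-sensitive.
        "allOf": [
            {"$ref": base_ref},
            rules,
        ],
    }

# Hypothetical contents of the separate rules file: only the semantic
# restrictions, reusing the isChildTermOf example from this thread.
RULES = json.loads("""
{
  "type": "object",
  "properties": {
    "termCode": {
      "type": "string",
      "isChildTermOf": {
        "parentTerm": "http://purl.obolibrary.org/obo/CHMO_0000800",
        "ontologyId": "chmo"
      }
    }
  },
  "required": ["termCode"]
}
""")

if __name__ == "__main__":
    combined = compose_schema(
        "https://example.org/bioschemas/DefinedTerm.json",  # hypothetical URI
        RULES,
        "https://example.org/my-profile/DefinedTerm.json",  # hypothetical $id
    )
    print(json.dumps(combined, indent=2))
```

Keeping the rules in their own fragment means several community profiles could each ship a different rules file against the same upstream type.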

theisuru commented 1 year ago

I just took a quick look at the Bioschemas Dataset definition and can see that JSON Schema is used in its $validation section. I will try to put together an example of how biovalidator can be used to validate Bioschemas markup (I'm thinking of the BioSample type).

M-casado commented 1 year ago
Regarding "allof": [ above, just a minor comment: JSON Schema keywords are case-sensitive, so it should be allOf rather than allof. Otherwise I don't think it will work.
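
A small pre-flight check could catch this kind of slip before a schema ever reaches the validator. The Python sketch below is hypothetical and not part of biovalidator or AJV: it walks a schema and reports keys that match a known keyword only after ignoring case and a leading "$", so it flags both allof and a mistyped $allOf. The keyword list is a small assumed subset, and the check is a heuristic that could also flag a property name that happens to look like a keyword.

```python
from typing import Any, Iterator, Tuple

# A small, assumed subset of JSON Schema keywords whose spelling matters.
KNOWN_KEYWORDS = ("allOf", "anyOf", "oneOf", "not", "$ref", "properties", "required")

# Map a normalised spelling (leading '$' stripped, lower case) to the
# correctly spelled keyword.
_CANONICAL = {k.lstrip("$").lower(): k for k in KNOWN_KEYWORDS}

def find_misspelled_keywords(node: Any, path: str = "#") -> Iterator[Tuple[str, str, str]]:
    """Yield (path, found, expected) for keys that differ from a known
    keyword only by case or a leading '$'.

    Paths are JSON-Pointer-like but do not escape '/' or '~'.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            expected = _CANONICAL.get(key.lstrip("$").lower())
            if expected is not None and key != expected:
                yield f"{path}/{key}", key, expected
            yield from find_misspelled_keywords(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_misspelled_keywords(item, f"{path}/{index}")

if __name__ == "__main__":
    schema = {"allof": [{"$ref": "x"}], "$allOf": [], "allOf": []}
    for where, found, expected in find_misspelled_keywords(schema):
        print(f"{where}: found {found!r}, expected {expected!r}")
```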

sneumann commented 1 year ago

Hi, indeed, that is the right direction. Here is the promised example of a DefinedTerm:

{
    "@type": "DefinedTerm",
    "@id": "http://purl.obolibrary.org/obo/CHMO_0000230",
    "termCode": "CHMO_0000230",
    "name": "alpha-particle spectroscopy",
    "identifier": "http://purl.obolibrary.org/obo/CHMO_0000230",
    "url": "http://purl.obolibrary.org/obo/CHMO_0000230",

    "inDefinedTermSet":
    {
        "@type": "DefinedTermSet",
        "@id": "http://purl.bioontology.org/ontology/CHMO",
        "name": "Chemical Methods Ontology",
        "identifier": "http://purl.bioontology.org/ontology/CHMO",
        "url": "https://github.com/rsc-ontologies/rsc-cmo"
    }
}

What I would like to validate for the above could be:

      "isChildTermOf": {
        "parentTerm": "http://purl.obolibrary.org/obo/CHMO_0000800",
        "ontologyId": "CHMO"
      }

I am not sure whether ontologyId should be "CHMO" or "chmo"; it is probably the "Ontology ID" shown at https://www.ebi.ac.uk/ols/ontologies/chmo.
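
Conceptually, an isChildTermOf rule asks whether the term in the data is a descendant of parentTerm in the named ontology's is_a hierarchy. In biovalidator that lookup presumably goes to a terminology service such as OLS (hence the question about the OLS Ontology ID). The Python sketch below only illustrates the idea offline: the is_a map and the EX: term IDs are invented placeholders, not real CHMO relationships, and is_child_term_of is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List

# Hypothetical is_a edges: child term -> list of direct parents.
# These IDs are placeholders, not real CHMO relationships.
IS_A: Dict[str, List[str]] = {
    "EX:0003": ["EX:0002"],
    "EX:0002": ["EX:0001"],
    "EX:0004": ["EX:0001"],
    "EX:0005": [],
}

def is_child_term_of(term: str, parent_term: str, is_a: Dict[str, List[str]]) -> bool:
    """Return True if parent_term is a proper ancestor of term.

    Walks the is_a graph breadth-first from the term's direct parents;
    the `seen` set keeps the walk finite if the graph contains cycles.
    """
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        current = queue.popleft()
        if current == parent_term:
            return True
        if current in seen:
            continue
        seen.add(current)
        queue.extend(is_a.get(current, []))
    return False

if __name__ == "__main__":
    print(is_child_term_of("EX:0003", "EX:0001", IS_A))
    print(is_child_term_of("EX:0005", "EX:0001", IS_A))
```

Whether the parent term itself should pass is a design choice; this sketch treats it as not being its own child.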

Other DefinedTerm examples can be found in, e.g., https://github.com/BioSchemas/specifications/blob/75b427325742f8e2d3b2c00299bec4f826c56f47/Course/examples/1.0-RELEASE/course.json#L11

Yours, Steffen

sneumann commented 1 year ago

Hi, we are currently trying to put together more examples, and we will prepare more validation rules. It would be great to have some biovalidator functionality to play with at the ELIXIR and ELIXIR-DE Biohackathons. Any progress, or have you hit a roadblock? Thanks in advance, yours, Steffen

theisuru commented 1 year ago

Hi Steffen,

I have tried using allOf at the top level and created a test case that combines a given schema with a custom schema, but it did not validate correctly. I am not sure whether this is caused by incorrect JSON Schema syntax or by an implementation problem. I will look into this further and let you know.

This is the example I tried:

{
  "$id": "BioSchema/plus/customSchema/for/DefinedTerm",
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "description": "Use custom schema on top of BioSchema to validate BioSchema type",
  "type": "object",
  "$allOf": [
    {
      "$ref": "path/to/bioschemas/definedterm"
    },
    {
      "description": "My custom schema for DefinedTerm",
      "type": "object",
      "properties": {
        "termCode": {
          "type": "string",
          "isChildTermOf": {
            "parentTerm": "http://purl.obolibrary.org/obo/CHMO_0000800",
            "ontologyId": "chmo"
          }
        }
      },
      "required": [
        "termCode"
      ]
    }
  ]
}
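
One detail in this example is worth checking: the composition keyword is spelled $allOf, which is not a JSON Schema keyword. A validator that ignores unknown keywords, as JSON Schema generally permits, would then never evaluate the custom subschema at all. The Python sketch below is a deliberately tiny, hypothetical validator (it is not biovalidator or AJV and supports only type, properties, required and allOf) that makes the difference visible. The $ref to the Bioschemas definition and the isChildTermOf keyword are left out because this sketch cannot resolve them.

```python
from typing import Any, List

def validate(instance: Any, schema: dict, path: str = "#") -> List[str]:
    """Validate against a tiny subset of JSON Schema.

    Supports type ("object"/"string"), properties, required and allOf;
    every other keyword, including a misspelled "$allOf", is ignored.
    """
    expected_type = schema.get("type")
    if expected_type == "object" and not isinstance(instance, dict):
        return [f"{path}: expected object"]
    if expected_type == "string" and not isinstance(instance, str):
        return [f"{path}: expected string"]

    errors: List[str] = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}/{name}"))
    # allOf: the instance must satisfy every listed subschema.
    for subschema in schema.get("allOf", []):
        errors.extend(validate(instance, subschema, path))
    return errors

# The custom part of the example above, minus isChildTermOf.
CUSTOM_RULES = {
    "type": "object",
    "properties": {"termCode": {"type": "string"}},
    "required": ["termCode"],
}

misspelled = {"type": "object", "$allOf": [CUSTOM_RULES]}
corrected = {"type": "object", "allOf": [CUSTOM_RULES]}

if __name__ == "__main__":
    print(validate({}, misspelled))  # the custom rules are never applied
    print(validate({}, corrected))   # the missing termCode is reported
```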