elixir-europe / biovalidator

JSON validator derived from AJV supporting ontology and taxonomy validation.
Apache License 2.0

File-resolvable references (context for $id and $ref) #33

Closed M-casado closed 1 year ago

M-casado commented 2 years ago

Summary

To be able to provide Biovalidator with filepaths to resolve references between JSON files.

Description

When building complex schemas, it is common to reference inherited subschemas across JSON files through $id and $ref. As far as I can tell, Biovalidator currently has no way to accept filepaths of the JSON files that contain the $ids referenced elsewhere. This would be especially useful when running a JSON validator locally, and when handling static references to files that exist in several versions (e.g. "$id": "my-file.txt" existing in different versions of my-file.txt). Besides, although it is suggested that $ids be URL-resolvable, this is only a recommendation, so $ids should not be expected to always point to the correct JSON files.

AJV-cli already allows for this to be done through the -r argument (referenced schemas). This allows for the whole set of schemas to be passed to the validator. As an example:

schema_name="object-set"
json_doc="object-set-valid-1.json"
ajv --spec=draft2019 -s schemas/EGA.$schema_name.json -d schemas/validation_tests/$json_doc -r "schemas/EGA.!($schema_name).json"
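
To make the request concrete, here is a minimal sketch of what passing referenced schemas to a validator amounts to conceptually: a lookup from each schema's $id to its content, against which a $ref is resolved. This is not AJV's or Biovalidator's actual implementation. The function names `buildRegistry` and `resolveRef` and the sample schema content are assumptions made for illustration only.

```javascript
// Hypothetical sketch: index the referenced schemas by their $id.
function buildRegistry(schemas) {
  const registry = new Map();
  for (const schema of schemas) {
    if (typeof schema.$id === "string") registry.set(schema.$id, schema);
  }
  return registry;
}

// Resolve a reference of the form "<id>#<json-pointer>" against the registry.
// Simplification: empty pointer tokens and percent-encoding are not handled.
function resolveRef(registry, ref) {
  const [id, pointer = ""] = ref.split("#");
  let node = registry.get(id);
  if (node === undefined) throw new Error(`No schema registered for $id "${id}"`);
  for (const token of pointer.split("/").filter(Boolean)) {
    // JSON Pointer escapes: "~1" stands for "/", "~0" stands for "~".
    const key = token.replace(/~1/g, "/").replace(/~0/g, "~");
    node = node[key];
    if (node === undefined) throw new Error(`Pointer "${pointer}" not found in "${id}"`);
  }
  return node;
}

// Usage with made-up schema content (only the file name comes from this thread).
const registry = buildRegistry([
  {
    $id: "EGA.common-definitions.json",
    definitions: { relationship_object: { type: "object" } },
  },
]);
const target = resolveRef(
  registry,
  "EGA.common-definitions.json#/definitions/relationship_object"
);
console.log(target); // the subschema stored under definitions.relationship_object
```

Because the lookup is keyed on $id rather than on a URL fetch, the $id does not need to be resolvable over the network, which is the point of the request.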
M-casado commented 2 years ago

Hi @theisuru, sorry to bother you again, but I was curious if this feature is still planned or is otherwise discarded.

theisuru commented 2 years ago

Hi @M-casado, we could resolve a given reference using URI schemes. In your case, for example, that would mean passing $ref as file://~/schemas/EGA/test.json. Would that be sufficient, or are you looking for an option to specify a default schema in general?

M-casado commented 2 years ago

Hi again @theisuru! Sorry for the late answer.

Actually in our use-case I've been specifying URIs for our $id parameters, so they are already pointing to a file, but in our case to a GitHub file. See this, for example:

# Current example of a reference:
"$ref": "https://github.com/EbiEga/ega-metadata-schema/tree/main/schemas/EGA.common-definitions.json#/definitions/relationship_object"
# Raw file in GH:
https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json

They point to the main GitHub repo, instead of the raw files, per se. If Biovalidator is able to pick the text files from those links, problem solved I would say. If not, going from https://github.com/... to https://raw.githubusercontent.com/... and getting rid of the tree in the middle is probably easy when parsing the schemas. Either that, or we could change the whole set of $ids if applicable.
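
The rewrite described above could be sketched as follows. This is only an illustration of the idea, not code from Biovalidator. The helper name `toRawGitHubUrl` is hypothetical, and it assumes the path layout /owner/repo/tree/branch/path seen in the example reference (it also accepts "blob" in place of "tree").

```javascript
// Hypothetical helper: map a github.com "tree"/"blob" URL to raw.githubusercontent.com.
function toRawGitHubUrl(ref) {
  const url = new URL(ref);
  if (url.hostname !== "github.com") return ref; // leave other hosts untouched

  // Assumed path layout: /<owner>/<repo>/<tree|blob>/<branch>/<path...>
  const [owner, repo, kind, ...rest] = url.pathname.split("/").filter(Boolean);
  const isFileView = kind === "tree" || kind === "blob";
  if (!owner || !repo || !isFileView || rest.length < 2) return ref;

  // Drop the "tree"/"blob" segment and keep the fragment (the JSON pointer).
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest.join("/")}${url.hash}`;
}

const rawRef = toRawGitHubUrl(
  "https://github.com/EbiEga/ega-metadata-schema/tree/main/schemas/EGA.common-definitions.json#/definitions/relationship_object"
);
console.log(rawRef);
// https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json#/definitions/relationship_object
```

URLs on other hosts, or GitHub URLs that do not match the assumed layout, are returned unchanged, so the helper can be applied to every $ref without special-casing.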

I would love for this to gain momentum, since we will soon start actively pursuing ontology validation as part of our metadata validation. If possible, I would rather use Biovalidator than reinvent the wheel. Would you be kind enough to meet soon with me, and perhaps some EGA developers, to go over the possible courses of action?

theisuru commented 2 years ago

@M-casado, happy to have a talk about this. Tony also recently mentioned the importance of having shared tools between our teams, so it will be possible to allocate time for this. We can prepare a set of goals and a timeline for Biovalidator. Please feel free to schedule a meeting; Dipayan and I from our team can attend.

M-casado commented 2 years ago

Hi @theisuru. Everybody is welcome, but I mainly expect you, Dipayan, an EGA developer and myself. We could highlight:

theisuru commented 2 years ago

Hi @M-casado, I have pushed the "referenced schema" changes to the branch support_local_schema_directory. I will add documentation soon. For now: similar to ajv-cli, this supports glob patterns. You can also provide a directory or a file name.

 node ./validator-cli.js --schema=./test/resources/ref_test_schema.json --json=./test/resources/ref_test_valid.json --ref=./test/resources/schema_dir
M-casado commented 2 years ago

Hi @theisuru. I finally got my hands on the tool, and I'm very happy to confirm that it works as intended. I have yet to test it thoroughly, with validation corner cases, errors, etc. but based on some light testing it looks great.

Specs

Started from scratch with the installation of Biovalidator. Went well and just had to fix some vulnerabilities (automatically done with npm audit fix).

File references (through CLI)

I tested different combinations of JSON object validations through the CLI (node ./validator-cli.js ...), mainly checking those that referenced each other (see real time next to each test):

Ontology checks

Given the chance, I also checked that the ontology validation worked (also with references). As an example I used the following object and schema:

# Object:
{
     "example1_ontology_ref": "PATO:0000384"
}
# Schema:
"ontology_ref": {
        "graph_restriction":  {
            "ontologies" : ["obo:efo"],
            "classes": ["PATO:0001894"],
            "relations": ["rdfs:subClassOf"],
            "direct": false,
            "include_self": false
        }
    }

Although it was a single OLS API call, the validation was quick considering the difference between the basic validation and the one with OLS included:

Biovalidator as a server

Once again, wanted to check for myself how to use this tool as a server. I only checked the server as localhost and got a pleasant surprise with how quick it was:

TL;DR summary

It works, and even better than I expected.

My only concern now, @theisuru, is how to provide file references to the server, since it currently expects only a single JSON object for the schema and a single JSON object to validate.

M-casado commented 2 years ago

I tested passing the references when deploying the server, and it works. These changes are currently in the dev branch. As soon as they are merged, I think this ticket can be closed.

Example of the command:

node src/biovalidator.js --ref=schemas/*.json --ref=schemas/controlled_vocabulary_schemas/*