elixir-europe / biovalidator

JSON validator derived from AJV supporting ontology and taxonomy validation.
Apache License 2.0

[BUG]: Failed to compile schema: ``async keyword in sync schema`` #57

Open M-casado opened 1 year ago

M-casado commented 1 year ago

Bug summary

When the server is deployed with referenced schemas (-r argument) and the custom keyword graphRestriction is used, validation crashes while compiling the schemas.

Technical details

To reproduce

  1. Clone and install Biovalidator's project
    git clone https://github.com/elixir-europe/biovalidator.git
    cd biovalidator
    npm install
  2. Clone EGA's metadata GH project:
    cd ..
    git clone git@github.com:EbiEga/ega-metadata-schema.git
  3. Deploy a local Biovalidator server with the referenced schemas
    sdir="ega-metadata-schema/schemas"
    node src/biovalidator -r "$sdir/*.json" -r "$sdir/controlled_vocabulary_schemas/*.json"
  4. Run a for loop over all JSON documents in the directory, requesting validation of each one, and observe how the ones using custom keywords (analysis, experiment, individual, object-set and sample) all fail with the same error:
    $ cd ega-metadata-schema
    $ for file in ./examples/json_validation_tests/*json; do echo "$file"; curl --data @"$file" -H "Content-Type: application/json" -X POST http://localhost:3020/validate; echo ""; done
    ./examples/json_validation_tests/DAC_valid-1.json
    []
    ./examples/json_validation_tests/analysis_valid-1.json
    {"error":"Failed to compile schema: Error: async keyword in sync schema"}
    ./examples/json_validation_tests/assay_valid-1_array.json
    []
    ./examples/json_validation_tests/assay_valid-2_sequencing.json
    []
    ./examples/json_validation_tests/dataset_valid-1.json
    []
    ./examples/json_validation_tests/experiment_valid-1.json
    {"error":"Failed to compile schema: Error: async keyword in sync schema"}
    ./examples/json_validation_tests/individual_valid-1.json
    {"error":"Failed to compile schema: Error: async keyword in sync schema"}
    ./examples/json_validation_tests/object-set_valid-1.json
    {"error":"Failed to compile schema: Error: async keyword in sync schema"}
    ./examples/json_validation_tests/policy_valid-1.json
    []
    ./examples/json_validation_tests/protocol_valid-1.json
    []
    ./examples/json_validation_tests/protocol_valid-2.json
    []
    ./examples/json_validation_tests/protocol_valid-3.json
    []
    ./examples/json_validation_tests/sample_valid-1.json
    {"error":"Failed to compile schema: Error: async keyword in sync schema"}
    ./examples/json_validation_tests/study_valid-1.json
    []
    ./examples/json_validation_tests/submission_valid-1.json
    []
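
The loop above can also be scripted so that responses are grouped by outcome. The following is a minimal Python sketch, assuming the server listens on http://localhost:3020/validate as in step 3 and answers with the three response shapes seen in this thread: [] for a valid document, a JSON array of {"dataPath", "errors"} entries for an invalid one, and an object with an "error" key when schema compilation fails. The helper names (classify, validate_file) are hypothetical, and the handling of non-2xx statuses is an assumption about the server.

```python
import glob
import json
import urllib.error
import urllib.request

# Endpoint used by the curl loop in step 4.
VALIDATE_URL = "http://localhost:3020/validate"


def classify(body: str) -> str:
    """Map a /validate response body to 'valid', 'invalid' or 'error'."""
    payload = json.loads(body)
    if isinstance(payload, dict) and "error" in payload:
        # e.g. {"error": "Failed to compile schema: Error: async keyword in sync schema"}
        return "error"
    if payload == []:
        return "valid"
    # Otherwise a non-empty list of {"dataPath": ..., "errors": [...]} entries.
    return "invalid"


def validate_file(path: str) -> str:
    """POST one JSON document to the validator and return the raw response body."""
    with open(path, "rb") as fh:
        data = fh.read()
    request = urllib.request.Request(
        VALIDATE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Assumption: error responses may use a non-2xx status but still carry a JSON body.
        return err.read().decode("utf-8")


if __name__ == "__main__":
    for path in sorted(glob.glob("./examples/json_validation_tests/*.json")):
        print(f"{classify(validate_file(path)):8} {path}")
```

Run from the ega-metadata-schema directory, this prints one line per document with its outcome, which makes the five compile failures easy to spot.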

Observed behaviour

Validation stops for those JSON documents/schemas that are using a custom keyword (graphRestriction in this case).

Expected behaviour

Schemas should be compiled correctly and validation executed.

Additional context

All of the schemas within the schemas/ directory have "$async": true at root level, which makes the error message confusing. More importantly, if the referenced schemas are not given at deployment but fetched instead (i.e. the reference resolves to the raw file on GitHub and the tool retrieves it automatically), validation runs:

  1. Clone and install Biovalidator's project
    git clone https://github.com/elixir-europe/biovalidator.git
    cd biovalidator
    npm install
  2. Clone EGA's metadata GH project:
    cd ..
    git clone git@github.com:EbiEga/ega-metadata-schema.git
  3. Deploy a local Biovalidator server without any referenced schemas
    node src/biovalidator
  4. Run a for loop over all JSON documents in the directory, requesting validation of each one. In this case some documents fail validation for other reasons, but validation is at least executed:
    $ for file in ./examples/json_validation_tests/*json; do echo "$file"; curl --data @"$file" -H "Content-Type: application/json" -X POST http://localhost:3020/validate; echo ""; done
    ./examples/json_validation_tests/DAC_valid-1.json
    []
    ./examples/json_validation_tests/analysis_valid-1.json
    [{"dataPath":"/targetedLoci/0/organismDescriptor/taxonIdCurie","errors":["Provided term is not child of [http://purl.obolibrary.org/obo/NCBITaxon_1]"]}]
    ./examples/json_validation_tests/assay_valid-1_array.json
    {"error":"Failed to compile schema: Error: AnySchema https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json is loaded but https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json#/definitions/sampleLabel-association cannot be resolved"}
    ./examples/json_validation_tests/assay_valid-2_sequencing.json
    {"error":"Failed to compile schema: Error: AnySchema https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json is loaded but https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json#/definitions/sampleLabel-association cannot be resolved"}
    ./examples/json_validation_tests/dataset_valid-1.json
    []
    ./examples/json_validation_tests/experiment_valid-1.json
    {"error":"Failed to compile schema: RangeError: Maximum call stack size exceeded"}
    ./examples/json_validation_tests/individual_valid-1.json
    [{"dataPath":"/organismDescriptor/taxonIdCurie","errors":["Provided term is not child of [http://purl.obolibrary.org/obo/NCBITaxon_1]"]}]
    ./examples/json_validation_tests/object-set_valid-1.json
    {"error":"Failed to compile schema: RangeError: Maximum call stack size exceeded"}
    ./examples/json_validation_tests/policy_valid-1.json
    []
    ./examples/json_validation_tests/protocol_valid-1.json
    []
    ./examples/json_validation_tests/protocol_valid-2.json
    []
    ./examples/json_validation_tests/protocol_valid-3.json
    []
    ./examples/json_validation_tests/sample_valid-1.json
    [{"dataPath":"/organismDescriptor/taxonIdCurie","errors":["Provided term is not child of [http://purl.obolibrary.org/obo/NCBITaxon_1]"]},{"dataPath":"/sampleCollection/samplingSite/sampledOrganismPartCurie","errors":["Provided term is not child of [http://www.ebi.ac.uk/efo/EFO_0000635]"]},{"dataPath":"/sampleStatus/0/conditionUnderStudy/cusCurie","errors":["provided term does not exist in OLS: [XCO:0000398]"]}]
    ./examples/json_validation_tests/study_valid-1.json
    []
    ./examples/json_validation_tests/submission_valid-1.json
    []
M-casado commented 1 year ago

I have also noticed that using the graphRestriction keyword directly within a schema (schema A) works, whereas having schema A reference a graphRestriction defined in another schema (schema B) does not. This holds even when the schemas are supplied as references at server deployment.

In other words, if the custom keyword is not located in another schema that has to be compiled separately, graphRestriction works as intended. On the other hand, when the custom keyword is used by a property in one schema that is then referenced from a second schema, compilation fails.

Could it be that, when the referenced schemas are compiled, the result lacks the required "$async": true, whereas a schema that contains the custom keyword itself keeps it?
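
The two layouts described above can be made concrete with a pair of minimal schemas: schema B defines a string constrained by graphRestriction under definitions, and schema A only references it. The $id URLs, file names and the definition name duoTerm are hypothetical; the graphRestriction body is taken from the duoCodes example in this thread. Note that B declares "$async" only at its root, as the EGA schemas do, while the referenced definition does not. This is a sketch for building a reproduction, not the test the maintainers wrote.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Schema B: "$async" only at the root, plus a definition using the custom keyword.
SCHEMA_B = {
    "$id": "https://example.org/schemas/B.json",  # hypothetical $id
    "$async": True,
    "definitions": {
        "duoTerm": {  # hypothetical definition name
            "type": "string",
            "graphRestriction": {
                "ontologies": ["obo:duo"],
                "classes": ["DUO:0000001"],
                "relations": ["rdfs:subClassOf"],
                "direct": False,
                "include_self": False,
            },
        }
    },
}

# Schema A: does not use the keyword itself, it only references B's definition.
SCHEMA_A = {
    "$id": "https://example.org/schemas/A.json",  # hypothetical $id
    "$async": True,
    "type": "object",
    "properties": {
        "termId": {"$ref": "https://example.org/schemas/B.json#/definitions/duoTerm"}
    },
}


def write_fixtures(directory: Path) -> List[Path]:
    """Write schema B (uses the keyword) and schema A (references it) as JSON files."""
    paths = []
    for name, schema in (("B.json", SCHEMA_B), ("A.json", SCHEMA_A)):
        path = directory / name
        path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp(prefix="graphrestriction-fixture-"))
    for written in write_fixtures(out_dir):
        print(written)
```

The printed paths could then be passed to the server with -r, and a document such as {"termId": "DUO:0000042"} posted against schema A.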

In my particular case I noticed that many of the examples did not work, as explained above, but the policy object, which now also uses graphRestriction, did work. See below the property using the custom keyword within EGA.policy.json, although I don't expect its structure to be relevant for this particular issue:

  "duoCodes": {
    "type": "array",
    "title": "Data Use Ontology (DUO) codes",
    "description": "Collection of Data Use Ontology (DUO) codes. These allow to semantically tag datasets (bound by policies) with restriction about their usage, improving their discoverability based on the authorization level of users, or intended usage. See more info at https://obofoundry.org/ontology/duo.html and search for DUO codes at https://www.ebi.ac.uk/ols/ontologies/duo",
    "minItems": 1,
    "additionalProperties": false,
    "uniqueItems": true,
    "items": {
      "type": "object",
      "title": "Data Use Ontology (DUO)",
      "description": "Single Data Use Ontology (DUO) code.",
      "allOf": [
        {
          "title": "Inherited ontologyTerm structure of termId and termLabel",
          "$ref": "./EGA.common-definitions.json#/definitions/ontologyTerm"
        }
      ],
      "properties": {
        "termId": {
          "title": "Ontology constraints for this specific termId",
          "anyOf": [
            {
              "graphRestriction": {
                "ontologies": ["obo:duo"],
                "classes": ["DUO:0000001"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }
            },
            {
              "graphRestriction": {
                "ontologies": ["obo:duo"],
                "classes": ["DUO:0000017"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }
            },
            {
              "graphRestriction": {
                "ontologies": ["obo:duo"],
                "classes": ["OBI:0000066"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }
            }
          ],
          "examples": ["DUO:0000046", "DUO:0000028", "DUO:0000032"]
        }
      }
    }
  }
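
In the duoCodes fragment above, graphRestriction sits several levels deep (under items/properties/termId/anyOf). To check the hypothesis about "$async", it can help to list where the keyword appears in each schema file and whether that file declares "$async": true at its root. The following Python sketch does only that; it inspects the files and does not reproduce AJV's compilation logic. The function names are hypothetical, and the locations are printed as "#"-prefixed JSON Pointers in the same style as the $ref values used in these schemas.

```python
import json
import sys
from typing import Any, Iterator


def keyword_paths(node: Any, keyword: str, pointer: str = "") -> Iterator[str]:
    """Yield "#"-prefixed JSON Pointers of every object that has `keyword` as a key."""
    if isinstance(node, dict):
        if keyword in node:
            yield "#" + pointer
        for key, value in node.items():
            # RFC 6901 escaping: "~" becomes "~0", "/" becomes "~1".
            escaped = key.replace("~", "~0").replace("/", "~1")
            yield from keyword_paths(value, keyword, f"{pointer}/{escaped}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from keyword_paths(item, keyword, f"{pointer}/{index}")


def report(path: str, keyword: str = "graphRestriction") -> None:
    """Print the root "$async" flag of a schema file and every location of `keyword`."""
    with open(path, encoding="utf-8") as fh:
        schema = json.load(fh)
    print(path, "root $async:", schema.get("$async", False))
    for location in keyword_paths(schema, keyword):
        print("  ", keyword, "at", location)


if __name__ == "__main__":
    for schema_path in sys.argv[1:]:
        report(schema_path)
```

For example, passing the files under ega-metadata-schema/schemas/ as arguments shows which shared definitions carry the keyword and would therefore be compiled as part of a referencing schema.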
theisuru commented 1 year ago

@M-casado Could you please check the dev branch to see if it's fixed there.

If you remember, a few months back we made a change to populate all supplied schemas and definitions with the $async keyword, since AJV requires it to compile them asynchronously. Adding it to locally supplied schemas was missed, hence the error.
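
The change described above can be pictured as a preprocessing step: before the locally supplied schemas are handed to AJV, "$async": true is set on each schema root and on each entry under definitions. The sketch below is in Python and is not Biovalidator's actual (JavaScript) implementation; the function names and the choice to also cover $defs are assumptions. It might serve as a local workaround by rewriting copies of the schema files before passing them with -r, but that has not been verified against the server.

```python
import copy
import json
import sys
from typing import Any, Dict

# Assumption: handle both the draft-07 and the 2019-09+ name for definitions.
DEFINITION_KEYS = ("definitions", "$defs")


def mark_async(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `schema` with "$async": true on the root and on each definition."""
    result = copy.deepcopy(schema)
    result["$async"] = True
    for key in DEFINITION_KEYS:
        for definition in result.get(key, {}).values():
            # Boolean subschemas (true/false) cannot carry keywords, so skip them.
            if isinstance(definition, dict):
                definition["$async"] = True
    return result


def rewrite_file(src: str, dst: str) -> None:
    """Write an async-marked copy of the schema file `src` to `dst`."""
    with open(src, encoding="utf-8") as fh:
        schema = json.load(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(mark_async(schema), fh, indent=2)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        rewrite_file(sys.argv[1], sys.argv[2])
```

Working on a deep copy keeps the original schema object untouched, so the same definitions can still be compared before and after the change.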

M-casado commented 1 year ago

It works! 💃 I also tested it by modifying the local schemas and JSON documents and checking whether the fetched schemas or the ones given at deployment were used.

M-casado commented 1 year ago

@theisuru I'm afraid I have run into the exact same issue again: same set of files, same response. While validating some changes today, the same files that previously failed with "Failed to compile schema: Error: async keyword in sync schema" started failing again.

I'm using the latest dev branch, but it also happens on main. I checked your commit https://github.com/elixir-europe/biovalidator/commit/b7dc1d8bcb5af9324765b0e1c00f79b23454ea8b and the corresponding file in main, and it does not seem to have been overwritten by a later commit. Still, I doubt it is a coincidence.

This is a blocker again: I cannot validate locally modified files (presumably unless they are fetched from the Git repo), so I cannot continue developing them.

theisuru commented 1 year ago

@M-casado I could not reproduce the error. When I test locally, referenced schemas seem to work fine with the graphRestriction keyword. I created a test to capture this error; could you please check whether it represents your problem? We can keep this test case to prevent the issue from recurring.

M-casado commented 1 year ago

I'll check further locally, but it all started with these GitHub Actions runs (example), which fail when cloning Biovalidator's main branch, installing it, deploying it with local schemas, and then validating a set of documents.

I'll try again locally with the latest versions of main and dev to see what the problem may be.