elixir-europe / biovalidator

JSON validator derived from AJV supporting ontology and taxonomy validation.
Apache License 2.0

[Feature request]: Entity referencing (AJV's ``$data`` keyword) #69

Status: Open. M-casado opened this issue 1 year ago.

M-casado commented 1 year ago

Summary

Add support for AJV's entity-referencing mechanism, the $data keyword, or something similar.

Motivation

Entity referencing within the schemas would make it possible to express more complex validation constraints.

Details

Similar to what AJV describes in its combining-schemas documentation, the idea would be for Biovalidator to interpret not only $ref, which so far has been incredibly useful, but also the $data keyword in schemas.

The $data keyword would be used to reference data dynamically within the constraints of a JSON Schema definition. In other words, a value that is not known when the schema is written could still be used in a constraint, because it is taken from the data being validated.
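To make the lookup concrete, here is a minimal Python sketch of how a $data value such as "1/MD5_1" could be resolved. AJV accepts either an absolute JSON pointer or a relative JSON pointer in $data; in the relative form used in the examples, the leading integer is the number of levels to climb from the value being validated, and the remainder is an ordinary JSON pointer. The function name and the path-list representation are hypothetical, chosen for illustration; this is not AJV's or Biovalidator's code, and it only handles relative pointers (no absolute pointers, no "#" suffix, no index offsets).

```python
from typing import Any, List, Union

PathItem = Union[str, int]


def _unescape(token: str) -> str:
    # RFC 6901 escaping: "~1" stands for "/" and "~0" for "~" (order matters).
    return token.replace("~1", "/").replace("~0", "~")


def resolve_relative_pointer(root: Any, current_path: List[PathItem], pointer: str) -> Any:
    """Resolve a relative JSON pointer such as "1/MD5_1" (hypothetical helper).

    root         -- the whole data instance being validated
    current_path -- keys/indices leading from root to the value under validation
    pointer      -- "<levels up>" optionally followed by a JSON pointer ("/a/b")
    """
    digits = len(pointer) - len(pointer.lstrip("0123456789"))
    if digits == 0:
        raise ValueError(f"not a relative JSON pointer: {pointer!r}")
    levels_up = int(pointer[:digits])
    rest = pointer[digits:]
    if levels_up > len(current_path):
        raise ValueError("pointer climbs above the document root")
    if rest and not rest.startswith("/"):
        # The "#" (key/index name) form is deliberately not handled here.
        raise ValueError(f"unsupported pointer suffix: {rest!r}")

    # Walk up from the current location, then down along the pointer's tokens.
    path = list(current_path[: len(current_path) - levels_up])
    path += [_unescape(token) for token in rest.split("/")[1:]]

    node = root
    for key in path:
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node
```

With the data of the examples, resolve_relative_pointer(data, ["MD5_2"], "1/MD5_1") climbs from MD5_2 to the enclosing object and returns the value of MD5_1.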

I tested a locally deployed Biovalidator server with the examples below, and the validation did not behave as I expected, so I assume $data is not currently supported.

Examples

Some time ago I tested this feature with AJV and made three mock examples with schemas here. Below I reformat some of them into the schema-and-data format of Biovalidator's request message:

# The following should pass validation, given that the first and second MD5 are equal, which is the constraint established
#    in the schema (i.e. the value of MD5_1 is the constant that MD5_2 must match).
{
    "schema": {
        "type": "object",
        "required": ["MD5_1", "MD5_2"],
        "properties": {
            "MD5_1": {
                "type": "string"
            },
            "MD5_2": {
                "type": "string",
                "const": { "$data": "1/MD5_1" }
            }
        }
    },
    "data": {
        "MD5_1": "06266488e1b14195523df877eac39b31",
        "MD5_2": "06266488e1b14195523df877eac39b31"    
    }
}

# The following should not pass validation, but it does, because the schema's negated reference to the $data of MD5_1
#    is not interpreted (i.e. MD5_2 must NOT be equal to MD5_1, yet two equal hashes are accepted).
{
    "schema": {
        "type": "object",
        "required": ["MD5_1", "MD5_2"],
        "properties": {
            "MD5_1": {
                "type": "string"
            },
            "MD5_2": {
                "type": "string",
                "not": { 
                    "const": { "$data": "1/MD5_1" } 
                }
            }
        }
    },
    "data": {
        "MD5_1": "06266488e1b14195523df877eac39b31",
        "MD5_2": "06266488e1b14195523df877eac39b31"
    }
}
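The expected outcomes of the two examples can be reproduced with a deliberately tiny Python evaluator that understands only the keywords used above (type, required, properties, const, not) and replaces {"$data": ...} inside const with the referenced instance value. This is a sketch under stated assumptions, not how AJV or Biovalidator work internally: the helper names are hypothetical, and $data support is limited to "1/<key>" sibling references, which is all these examples need.

```python
from typing import Any, Optional


def resolve_sibling(parent: Optional[dict], pointer: str) -> Any:
    # Hypothetical simplification: only "1/<key>" is supported,
    # i.e. a property next to the value being validated.
    if parent is None or not pointer.startswith("1/"):
        raise NotImplementedError(f"unsupported $data pointer: {pointer!r}")
    return parent[pointer[2:]]


def is_valid(schema: dict, value: Any, parent: Optional[dict] = None) -> bool:
    """Evaluate the keyword subset used in the two examples."""
    expected_type = schema.get("type")
    if expected_type == "string" and not isinstance(value, str):
        return False
    if expected_type == "object" and not isinstance(value, dict):
        return False
    if "const" in schema:
        expected = schema["const"]
        # {"$data": ...} is replaced by the value it points to in the instance.
        if isinstance(expected, dict) and set(expected) == {"$data"}:
            expected = resolve_sibling(parent, expected["$data"])
        if value != expected:
            return False
    if "not" in schema and is_valid(schema["not"], value, parent):
        return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            # Children of this object see the object itself as their parent.
            if key in value and not is_valid(subschema, value[key], value):
                return False
    return True


def md5_schema(md5_2_extra: dict) -> dict:
    # Builds the shared schema shape of both examples.
    return {
        "type": "object",
        "required": ["MD5_1", "MD5_2"],
        "properties": {
            "MD5_1": {"type": "string"},
            "MD5_2": {"type": "string", **md5_2_extra},
        },
    }


SAME = md5_schema({"const": {"$data": "1/MD5_1"}})
DIFFERENT = md5_schema({"not": {"const": {"$data": "1/MD5_1"}}})
data = {
    "MD5_1": "06266488e1b14195523df877eac39b31",
    "MD5_2": "06266488e1b14195523df877eac39b31",
}
print(is_valid(SAME, data))       # True: MD5_2 equals MD5_1
print(is_valid(DIFFERENT, data))  # False: "not const" rejects equal hashes
```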

Use-cases

The flexibility that $data provides is enormous; a few use cases, at least for the EGA, could be:

M-casado commented 1 year ago

As additional context, the entity referencing that AJV allows through the $data keyword is similar to JSON-LD's entity referencing through the @id keyword. Therefore, if Biovalidator were JSON-LD-aware (https://github.com/elixir-europe/biovalidator/issues/68), this feature could be replaced by the @id solution.

M-casado commented 1 year ago

Another use case in the JSON Schemas would be referencing the core identifier of the object. For example, our relationships model allows directional, tagged links to be declared within the objects. Each link has a source and a target, pointing to the two ends of the relationship.

So far, since we could not use $data, we avoided duplicating the object's core identifier in every relationship end by omitting one of the ends and inferring it later in the processing logic. In other words, if the source is provided, the target is assumed to be the object itself, and vice versa.
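The inference rule described above could look roughly like the following Python sketch. The source and target fields come from the description; the function name, the object_id parameter, the "type" tag and the example identifiers are hypothetical, and this is not the EGA's actual implementation.

```python
from typing import Any, Dict


def complete_relationship(object_id: str, relationship: Dict[str, Any]) -> Dict[str, Any]:
    """Fill the omitted end of a relationship with the owning object's identifier.

    Hypothetical sketch: "source" and "target" hold identifiers of the two ends,
    and one of them may be left out of the submitted document.
    """
    has_source = relationship.get("source") is not None
    has_target = relationship.get("target") is not None
    if not (has_source or has_target):
        raise ValueError("a relationship needs at least one explicit end")

    completed = dict(relationship)  # leave the submitted document untouched
    if not has_source:
        completed["source"] = object_id  # target given: source is the object
    elif not has_target:
        completed["target"] = object_id  # source given: target is the object
    return completed


link = {"type": "derived_from", "source": "sample-2"}
print(complete_relationship("sample-1", link))
# {'type': 'derived_from', 'source': 'sample-2', 'target': 'sample-1'}
```

A link that already states both ends is returned unchanged.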

Although we may not go back to a $data-based solution now, had we been able to use it from the beginning, we might have adopted it.