corvus-dotnet / Corvus.JsonSchema

Support for Json Schema validation and entity generation
Apache License 2.0
99 stars 9 forks source link

Validation results contain more errors than expected #408

Closed manekovskiy closed 2 weeks ago

manekovskiy commented 2 weeks ago

My goal is to display errors related to document properties with invalid values. The document schema is quite complex, so it is imperative to display as much information as possible about the errors.

The issue I’m observing is that the number of errors is higher than expected when the property is configured to allow two or more types.

Schema:

CombinedDocumentSchema.json
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "Combined document schema",
  "type": "object",
  "allOf": [
    {
      "$ref": "./DocumentSchema.json"
    },
    {
      "$ref": "./DocumentExtensionsSchema.json"
    }
  ]
}

DocumentSchema.json
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "Document",
  "type": "object",
  "properties": {
    "size": {
      "type": [ "integer", "string" ]
    }
  }
}

DocumentExtensionsSchema.json
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "Document extensions",
  "type": "object",
  "properties": {
    "extendedSize": {
      "type": [ "integer", "string" ],
      "enum": [ "", 1, 2, 3, "1", "2", "3" ]
    }
  }
}

Validation code:

var documentContent = @"{
    ""size"": null,
    ""extendedSize"": """"
}";
var document = CombinedDocument.Parse(documentContent);

var validationContext = document.Validate(ValidationContext.ValidContext, ValidationLevel.Detailed);
if (!validationContext.IsValid)
{
    Console.WriteLine("Validation failed");
    foreach (var result in validationContext.Results)
    {
        if (!result.Valid)
        {
            Console.WriteLine($"{result.Location}: {result.Message}");
        }
    }
}

Output:

Validation failed
(#/allOf/0/properties/size, DocumentSchema.json#/properties/size, #/size): Validation 6.1.1 type - should have been 'string' but was 'Null'.
(#/allOf/0/properties/size, DocumentSchema.json#/properties/size, #/size): Validation 6.1.1 type - should have been 'number' with zero fractional part but was 'Null'.
(#/allOf, CombinedDocumentSchema.json, #): Validation 10.2.1.1. allOf - failed to validate against the allOf schema.
(#/allOf/1/properties/extendedSize, DocumentExtensionsSchema.json#/properties/extendedSize, #/extendedSize): Validation 6.1.1 type - should have been 'number' with zero fractional part but was 'String'.

In this output, the last line mentioning the extendedSize property does not look correct. I think it is misleading and should not be included in the list of invalid results because the value of extendedSize satisfies the schema constraints.

The real schema I’m working with is more complex, with a considerable number of fields that can have multiple types. When one value causes a document validation failure, the code produces many errors mentioning seemingly random properties that allow two or more types.

mwadams commented 2 weeks ago

Although it is "decipherable" by building the tree, I agree that it is less than helpful.

I propose to change this so that we get a single diagnostic result for type even if it is the array version of the keyword.

Also, we will be a bit more sophisticated about how we merge error results for the Level.Detailed - only producing results for the cases that contribute to the failure case. (E.g. the failures for most keywords, the complete set of failures, or the multiple successes for oneOf)

Level.Verbose will continue to produce all of the output.

This will be available for you to try in 4.0.0-preview.9 some time later this week.

BTW - notice that the type constraint is not necessary when you have an enum keyword and your schema validation will be somewhat faster.

mwadams commented 2 weeks ago

@manekovskiy - The results for your example above will look like:

Validation failed
(#/allOf, CombinedDocumentSchema.json#/allOf, #): Validation - allOf failed to validate against the schema.
(#/allOf/1/$ref/properties/extendedSize/type, DocumentExtensionsSchema.json#/properties/extendedSize/type, #/extendedSize): Validation type - should have been 'string', 'integer'.
manekovskiy commented 2 weeks ago

Thank you for the update. The results look good! It certainly helps to see the list of allowed types for the property on one line.


In a meanwhile, as a workaround, I invoke the validation with the ValidationLevel.Verbose, group the results by document location, and then within each group, I can determine if the errors are real or a byproduct of excessive verbosity.

mwadams commented 2 weeks ago

That's a reasonable (in fact, the intended!) approach.

The results are the consequence of applying the schema at that location.

The composition keywords aggregate those results in various ways up the tree to the root, where you get a final validation results.

These results can be used for things other than validation.

One quick tip - it is usually faster to do a quick ValidationLevel.Flag test first, and only do a full validation if that is not valid. That way you don't pay for collection for the common case of valid data.

mwadams commented 2 weeks ago

Fixed in 4.0.0-preview.9.