amazon-ion / ion-schema-rust

Rust implementation of Ion Schema
https://amazon-ion.github.io/ion-schema/sandbox
Apache License 2.0
12 stars 6 forks source link

Please include path in Violations #140

Closed dlurton closed 1 year ago

dlurton commented 1 year ago

The violations returned by ion_schema do not include any information where in the document the violation occurred, making fixing these violations difficult even in small documents or nearly impossible in large documents.

Including some kind of path to the value in violation would make it much easier to locate the errors. Something like this would help a lot:

./addresses[1].city

This points at the city field in the 2nd struct in the addresses field of a "customer" struct.

Combining this with line numbers (#102) would be seriously helpful and make ion_schema a lot more attractive IMHO.

dlurton commented 1 year ago

IMO, this isn't just a nice-to-have feature. Let me explain why:

Right now when I want to test that my ISL is working as I intend, I can only have tests that assert a given document is valid or invalid, but I cannot include an assertion as to "why" the document is invalid. This means with anything but the most trivial of ISL documents I cannot have any confidence in the tests that assert certain documents are invalid, because I can't be certain that my test input doesn't violate a constraint other than one I intended to test, and leading to a false positive test. IMO, being able to assert on the reason for a failure is especially important with schemas that evolve with time, where the modification or addition of constraints could cause my negative tests to continue to pass without modification, giving me false confidence. Such tests actually have negative value, IMO.

You already include information about which constraint is being violated--simply adding the path to the value that's in violation would allow better assertions which prevent these false positives.

desaikd commented 1 year ago

Thanks for explaining the intent behind this feature.

Currently ion-schema-rust uses the Element API from ion-rust to parse the elements of given schema file. Right now, Element API doesn't track position information for given Ion value but this is something in the roadmap to add position information over there and eventually in ion-schema-rust for better Violation messages and accurate location information in order to make it easier to understand the validation results.

I am going to keep this issue open to update with all the progress on this feature.

desaikd commented 1 year ago

148 adds path information in Violation now! Here's an example of how it looks like:

Schema

type::{
  name: customer,
  type: struct,
  fields: {
    addresses: {
      type: list,
      element: {
        type: struct,
        fields: {
          city: { type: string, codepoint_length: range::[8, max] },
          state: symbol
        }
      }
    }
  }
}

Ion Value to be validated:

{
  addresses: [
    { city: "Seattle", state: WA },
    { city: "Portland", state: OR },
    { city: "San Francisco", state: CA }
  ]
}

Output Violation (Using the deeply nested violation for example here):

Violation {
    constraint: "{type: string, codepoint_length: range::[8, max]}",
    code: TypeConstraintsUnsatisfied,
    message: "value didn't satisfy type constraint(s)",
    ion_path: (addresses 0 city),
    violations: [
        Violation {
            constraint: "codepoint_length",
            code: InvalidLength,
            message: "expected codepoint length range::[ 8, max ] found 7",
            ion_path: (addresses 0 city),
            violations: [],
        },
    ],
}

@dlurton, it would be great to get your feedback on this. Would you be interested to try it out?

dlurton commented 1 year ago

What does ion_path in the parent Violation become when there are multiple violations? What I see there only makes sense when there is one or no child violations as is the case with both Violation records here.

I do like the idea of using Ion path extractor paths to denote the paths here--this is standard enough that we can even write tests that assert particular fields are being identified invalid correctly.

dlurton commented 1 year ago

I would note however that Ion path extractor paths are not very readable unless you know Ion path and therefore it might not make sense to include such paths in error messages returned to callers of my service, who don't know what Ion path even is. Would it make sense to come up with something more human readable, like, addresses[0].city?

I don't know if both of these solutions are needed--either would be a major improvement over the current APIs.

desaikd commented 1 year ago

While working on this improvement we realized that the json path kind of signature(addresses[0].city) might be ambiguous for some use cases. For example,

// Violation at the field "com.amazon.foo" in the second value of a document
// Using suggested syntax: ./[2].com.amazon.foo
(2 'com.amazon.foo')

// Violation at the field "foo" in the field "amazon" in the field "com" in the second value of a document
// Using suggested syntax: ./[2].com.amazon.foo
(2 com amazon foo)

// Violation at ./addresses[0].city
(addresses 0 city)

Which is the reason we moved on to use an SExpression instead for the representation of an Ion Path. Although this does not exactly follow Ion path extractor paths because Ion Path Extractor treats every top-level value as a separate root for matching, but Ion Schema treats a value or a stream of values(document) as a single root for validation. Also, it leverages the use of Ion Element and is unambiguous.

desaikd commented 1 year ago

Completed with https://github.com/amazon-ion/ion-schema-rust/pull/148