amazon-ion / ion-schema-rust

Rust implementation of Ion Schema
https://amazon-ion.github.io/ion-schema/sandbox
Apache License 2.0
12 stars 6 forks source link

Validation doesn’t produce violations in a deterministic order #206

Open jobarr-amzn opened 7 months ago

jobarr-amzn commented 7 months ago

As reported through another channel by an ion-schema-rust user:

#[test]
fn test_non_determinism() {
    use ion_rs::element::Element;
    use ion_schema::{
        system::SchemaSystem,
        isl::{
            IslSchema,
            isl_constraint::v_1_0::fields,
            isl_type::v_1_0::named_type,
            isl_type_reference::v_1_0::{
                variably_occurring_type_ref,
                named_type_ref,
            },
            isl_range::Range
        }
    };

    let element = Element::read_one("{str: 123, number: \"fox\"}").unwrap();

    assert!((0..20).map(|_| {
        let struct_type = named_type("my_type", vec![fields([
            ("str".into(), variably_occurring_type_ref(named_type_ref("string"), Range::required())),
            ("number".into(), variably_occurring_type_ref(named_type_ref("int"), Range::required())),
        ].into_iter())]);
        let isl_schema = IslSchema::schema_v_1_0("my_schema", vec![], vec![struct_type], vec![], vec![]);
        let mut schema_system = SchemaSystem::new(vec![]);
        let schema = schema_system.load_schema_from_isl_schema_v1_0(isl_schema).unwrap();
        let type_definition = schema.get_type("my_type").unwrap();
        let violation = type_definition.validate(&element).unwrap_err();
        violation
            .flattened_violations()
            .into_iter()
            .cloned()
            // Uncomment to make this test pass
            // .sorted_by(|left, right| left.message().cmp(right.message()))
            .collect::<Vec<_>>()
    }).all_equal());
}

As far as I can tell this is because HashMap is used to store field constraints and its hash function is randomly seeded.

jobarr-amzn commented 7 months ago

Version information for the above:

[[package]]
name = "ion-schema"
version = "0.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4b66b5bf004a664fce2936505d4fdced4e7e50f8e911c7306b9aa2d4c754eab"
dependencies = [
 "half",
 "ion-rs",
 "num-bigint 0.3.3",
 "num-traits",
 "regex 1.10.3",
 "thiserror",
]
popematt commented 6 months ago

As far as I can tell this is because HashMap is used to store field constraints and its hash function is randomly seeded.

I don't think there's anything wrong with this part. Ion Schema Language is a declarative rather than procedural language, the constraints are defined in an Ion struct (which is an unordered data structure), and Ion Schema makes no guarantees about the order in which constraints are evaluated.

The problem is that the order of execution is leaking into the user-facing output, and the solution should be to change the violations output to use a set instead of a Vec. If that is too much of a breaking change, then we can recommend that users map the violations to a set before comparing them.

Long term, we could provide some functions or macros to assist with writing tests for schemas.

Finally, this is concerning to me:

// Uncomment to make this test pass
// .sorted_by(|left, right| left.message().cmp(right.message()))

The violation messages are not guaranteed to be stable, so this is a very brittle comparison. It would be much better to check the source of the violation (i.e. which constraint produced this violation; and when relevant e.g. which list element in the input data).

desaikd commented 6 months ago

With #208, as a short-term solution added two macros assert_equivalent_violations and assert_non_equivalent_violations for Violation comparison. As per the discussion in #208, long-term solution would be to remove the tree structure of Violations.

I think the best long-term solution is to rewrite Violations to get rid of the tree structure completely.