KWARC / rust-libxml

Rust wrapper for libxml2
https://crates.io/crates/libxml
MIT License

All `StructuredError` returned by `SchemaValidationContext::validate_*` are identical #115

Closed JDSeiler closed 1 year ago

JDSeiler commented 1 year ago

Description

When multiple validation issues are present in a document, all of the StructuredError objects returned by validate_document or validate_file are the exact same object in memory.

The unit test for schema validation seems to suggest that this is expected behavior, but other FFI wrappers around libxml (such as: https://github.com/marudor/libxmljs2/wiki#validating-against-xsd-schema) do not exhibit this behavior. I was hoping to learn more about why this is happening and if there is any way to get each unique schema validation issue that's present.

Reproduction

In tests/schema_tests.rs, replace the code at https://github.com/KWARC/rust-libxml/blob/master/tests/schema_tests.rs#L93-L98 with the following:

for err in &errors {
  println!("{:#?}", err);
  assert_eq!(
    "Element 'bad': This element is not expected. Expected is ( to ).\n",
    err.message()
  );
}

Then run the tests using cargo test -- --nocapture. In the test output, you should see something like:

     Running tests/schema_tests.rs (target/debug/deps/schema_tests-b2c35c81d74ebf18)

running 2 tests
StructuredError(
    0x0000000120e05748,
)
StructuredError(
    0x0000000120e05748,
)
StructuredError(
    0x0000000120e05748,
)
StructuredError(
    0x0000000120e05748,
)
StructuredError(
    0x0000000120e05748,
)
test schema_from_string ... ok
test schema_from_string_generates_errors ... ok

Expected vs Actual Behavior

Expected: a unique StructuredError object for each unique validation issue present in the document. Actual: a Vec containing the same StructuredError repeated.
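
One plausible way to end up with identical pointers, offered as an assumption and not something confirmed in this thread, is that the underlying library reuses a single error slot and the wrapper stores the pointer it is handed instead of copying the data out. The sketch below simulates that pattern without libxml2 at all: `RawError`, `FakeValidator`, `collect_pointers`, and `collect_copies` are hypothetical names invented for illustration and are not part of the rust-libxml API.

```rust
/// Hypothetical stand-in for a C-side error record (NOT the real libxml2 struct).
#[derive(Debug, Clone, PartialEq)]
struct RawError {
    message: String,
    line: u32,
}

/// Hypothetical validator that reuses one error slot for every report.
struct FakeValidator {
    last_error: RawError,
}

impl FakeValidator {
    fn new() -> Self {
        FakeValidator {
            last_error: RawError { message: String::new(), line: 0 },
        }
    }

    /// Overwrites the shared slot for each issue, then hands the callback
    /// a pointer to that same slot.
    fn validate<F: FnMut(*const RawError)>(&mut self, issues: &[(&str, u32)], mut on_error: F) {
        for (message, line) in issues {
            self.last_error = RawError { message: message.to_string(), line: *line };
            on_error(&self.last_error as *const RawError);
        }
    }
}

/// Keeps the raw pointers: every entry ends up pointing at the same slot.
fn collect_pointers(validator: &mut FakeValidator, issues: &[(&str, u32)]) -> Vec<*const RawError> {
    let mut seen = Vec::new();
    validator.validate(issues, |ptr| seen.push(ptr));
    seen
}

/// Copies the data out inside the callback: every entry is an independent value.
fn collect_copies(validator: &mut FakeValidator, issues: &[(&str, u32)]) -> Vec<RawError> {
    let mut seen = Vec::new();
    // SAFETY: the pointer refers to `validator.last_error`, which is alive and
    // not being mutated while the callback runs.
    validator.validate(issues, |ptr| seen.push(unsafe { (*ptr).clone() }));
    seen
}
```

With two distinct issues, `collect_pointers` returns two equal addresses, which matches the debug output above, while `collect_copies` returns two different values. Whether rust-libxml behaves this way internally is an assumption here.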

System Information

dginev commented 1 year ago

@JDSeiler I believe the short story here is that we've had very minimal development effort spent on the validation features of libxml.

We can likely improve in multiple directions here, and PRs from developers who are interested in validation workflows are most welcome.

JDSeiler commented 1 year ago

> @JDSeiler I believe the short story here is that we've had very minimal development effort spent on the validation features of libxml.
>
> We can likely improve in multiple directions here, and PRs from developers who are interested in validation workflows are most welcome.

I see, thanks for the quick response!

I'd be happy to look into it, though I can't make any promises about getting any results, since I have practically zero experience with C, FFI, unsafe Rust, etc. Maybe this is my time to learn 🤷

I am also realizing that I completely got my wires crossed and my reproduction steps are wrong. My apologies! Only one validation error is being returned (which I suppose is a separate issue, since clearly more than one error is present in the example), but the same document is being validated multiple times.

I am seeing the duplication behavior in some separate proof-of-concept code where I'm consuming this crate. If seeing that code would be helpful I can share it here, but otherwise I'll leave the issue as is.

JDSeiler commented 1 year ago

Hey there @dginev, I spent some time on this and I've got some changes that I think are ready for preliminary feedback, if you'd be interested to take a look: https://github.com/KWARC/rust-libxml/pull/116