butlerpd closed this 3 months ago
The culprit appears to be a change in the schema validation function in libxml2 that lxml uses. I had noted earlier that the tests were still passing in Debian even though they were failing in Ubuntu; that is no longer the case.
test/sasdataloader/utest_cansas.py::cansas_reader_xml::test_invalid_cansas
fails with libxml2 from unstable (2.12) but passes with libxml2 in testing (2.9).
The xsd [1] for recognising partly broken cansas files is to blame: the multiple xsd:any entries in the definition for SASentryType (line 70) make it ambiguous. When there are three groups in the sequence (any, SASdata, any), there are multiple ways to divide the sequence.
[1] sasdata/dataloader/readers/schema/cansas1d_invalid_v1_0.xsd
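The ambiguity can be illustrated with a toy model. This is only a sketch, not libxml2's algorithm and not the actual schema: it assumes both wildcards may repeat and may match any element name, including SASdata, and the child element names in the example are hypothetical. Under those assumptions, every SASdata child is a valid place to anchor the middle group, so an entry with two of them can be divided in two ways.

```python
def splits(children, required="SASdata"):
    """List every way to read `children` as (any*, required, any*).

    Toy model: the wildcards are assumed to accept any element name,
    so each occurrence of `required` yields one valid division.
    """
    return [
        (children[:i], children[i], children[i + 1:])
        for i, name in enumerate(children)
        if name == required
    ]


# Hypothetical child elements of a SASentry, for illustration only.
entry = ["Title", "SASdata", "SASdata", "SASnote"]
for before, middle, after in splits(entry):
    print(before, middle, after)
```

A validator that must decide without lookahead cannot tell which division is intended, which is presumably what the newer libxml2 now rejects.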
A test to run outside of the test harness is:
xmllint --noout \
--schema sasdata/dataloader/readers/schema/cansas1d_invalid_v1_0.xsd \
test/sasdataloader/data/cansas1d_notitle.xml
A namespace warning is OK; a "content model is not determinist" error or a "Schemas validity error" is not.
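For scripting the same check, a minimal sketch follows. It assumes xmllint is on the PATH and reports problems on stderr; the marker strings are taken from the wording above rather than from the libxml2 source, and the helper names are made up for illustration.

```python
import subprocess

# Messages that count as a failure, matched case-insensitively.
# The wording comes from this issue, not from the libxml2 source.
FATAL_MARKERS = ("content model is not determinist", "schemas validity error")


def output_is_acceptable(stderr_text):
    """Return True unless the output contains one of the fatal markers."""
    lowered = stderr_text.lower()
    return not any(marker in lowered for marker in FATAL_MARKERS)


def check(schema_path, xml_path):
    """Run xmllint with the same options as the manual command."""
    result = subprocess.run(
        ["xmllint", "--noout", "--schema", schema_path, xml_path],
        capture_output=True,
        text=True,
    )
    return output_is_acceptable(result.stderr)
```

Calling check() with the two paths from the command above should return True on a libxml2 version that still accepts the schema.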
As of Feb 12, 2024, lxml is pinned to versions below 5.0 (see PR #63) because unit tests fail on macOS and Ubuntu. The root cause needs to be investigated so that this restriction can be removed and lxml is not permanently pinned to an increasingly outdated version.