XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
77 stars 134 forks source link

add test methods to check xml hierarchy #548

Closed lindsay-stevens closed 2 years ago

lindsay-stevens commented 2 years ago

Closes #448

Why is this the best possible solution? Were any other approaches considered?

It seems like about the best that can be done with the standard library ElementTree. The main limitation is that ETree doesn't support selecting attribute values e.g. for xpath_exact (link below). So the smallest xpath_exact match is the whole element with all attributes. But one could alternatively use attribute predicates and xpath_count.

https://docs.python.org/3/library/xml.etree.elementtree.html#supported-xpath-syntax

What are the regression risks?

None, these are optional new test methods. Existing tests can be migrated over gradually or as needed.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No.

Before submitting this PR, please make sure you have:

lindsay-stevens commented 2 years ago

I'll need to look into why the tests pass on Python 3.8 but not Python 3.7. Please add any feedback / comments in the meantime.

lindsay-stevens commented 2 years ago

Hi @lognaturel - all tests are green now so this is ready for a closer look (or merge).

About the workaround in 4c49f5a for Python 3.7 - it's ugly but perhaps this could be removed soon. There's #555 to add support for 3.9, and following #408 maybe 3.8 and 3.9 is enough Pythons to support?

yanokwa commented 2 years ago

@lindsay-stevens I am understanding you correctly that if we drop 3.7 support we'd be able to get rid of the monkey patching?

lindsay-stevens commented 2 years ago

Hi @yanokwa, yes the monkey-patch is only there for 3.7 compatibility.

lindsay-stevens commented 2 years ago

Hi @yanokwa or @lognaturel would this be OK to merge?

lognaturel commented 2 years ago

I haven’t had much computer time and unfortunately had to prioritize a couple of other projects. But this is on my list for early next week.

lognaturel commented 2 years ago

~I got stuck trying to review this previously because I didn't understand the new methods and I'm afraid I still don't. As far as I can tell, the examples in test_pyxform_test_case don't provide any additional expressivity over xml__contains. Could you please add some examples from the issue? In particular, in test_trigger, the big thing that's missing is the ability to test correct nesting. I'm not figuring out how to write xpath expressions to test that.~

Ok, I've figured it out. xml__xpath_exact=[(".//input[@ref='/trigger-column/a']//setvalue", ['<setvalue event="xforms-value-changed" ref="/trigger-column/b" value="now()"/>'])] does what I was looking for at test_handling_trigger_column_no_label_and_no_hint. @lindsay-stevens is that how you would expect that to be expressed or is there a better way? Maybe you could follow up with a PR that actually leverages xml__xpath_exact in test_trigger so we have some ideal example tests?

The signature seems confusing to me but I'm sure you had reasons for it. I'll merge this now and since it's test-only we can always iterate on it if needed.

lindsay-stevens commented 2 years ago

Thanks @lognaturel ! About the syntax structure:

About the trigger test example, the way you've written it with xml__xpath_exact would be the way to go if the entire setvalue node and it's attributes (verbatim) were expected. Another approach would be to use multiple xml__xpath_count test specs to find the setvalue node, and/or the attribute existence (or attribute value) for event, ref and value (so maybe 4+ test specs with different XPath expressions and each with count=1).

For example in the test TestPyxformTestCaseXmlXpath.test_xml__xpath_count__singular, has a test spec xml__xpath_count=[(".//instance[@id='ID']", 1)], to say that I expect to find one instance node whose id attribute value is ID.

lognaturel commented 2 years ago

Thanks for the extra context! I was expecting something like @MartijnR’s example which would be more like a list of paths that do or don’t resolve to a node.

I was surprised to have to specify both XPath paths and text to match. I think a downside is that we do have to know what order attributes serialize in and just like with the existing text match methods, if that order changes, tests will fail. What led you to the tuple approach rather than just match/no match on XPath expressions? That would be more like your count method with a count of 1. Do you have concrete applications for a count greater that 1? I haven’t been able to think of one.

For some reason I also got tripped up by having to root the expression at . so it’s relative.

lindsay-stevens commented 2 years ago

Hi @lognaturel I could add the following alternative in a follow up PR. Example test adapted from test_area.py.

    def test_area(self):
        self.assertPyxformXform(
            name="area", md=MD,
            # old way
            xml__contains=["""
                <bind calculate="enclosed-area( /area/geoshape1 )" nodeset="/area/result" type="string"/>
            """],
            # new way (maybe)
            xml__xpath_match=[
                ".//bind[@calculate='enclosed-area( /area/geoshape1 )'][@nodeset='/area/result'][@type='string']",
                ".//some/other/xpath/here"
            ],
        )

ElementTree has a weird way of doing compound predicates. One expression per predicate allowed, and putting them one after the other is equivalent to anding them. So in the above node[expr][expr] means node[expr and expr] (ref). I don't think there is a way to write or, but perhaps that's not needed.

The above example assumes that the xml__xpath_match kwarg accepts a list of strings which are XPath expressions, and the test passes if the result is exactly one matching node. The example compound predicate results in a test that checks for the node, and all 3 attributes (any order), and the attribute values. It could also be written as multiple .//bind[expr] XPaths. There would also be a negative assertion kwarg xml__xpath_not_match.

I had a look at whether the situation is any better with the lxml library but it won't be useful since it does not support XPath expressions for nodes in the default namespace (link). The API that does support it has reduced XPath support (more like ETree). Most XForm nodes are in the default namespace (http://www.w3.org/2002/xforms). There is a lxml package for CSS selectors but I would be wary to add that since XForms seem more aligned with XPath, and it may also have a similar (or other) problems.

About the tuple structure, the general idea was to expand on how the existing methods look for for XML/node fragments to also make assertions about where the node is located. If appropriate to the test, refactoring to xml__xpath_exact could copy over old and complex content assertions but add an XPath to specify where it should be found. For simpler content assertions there's xml__xpath_count. Refactoring to the above xml__xpath_match would go a bit further, to (re)write both content and location assertions.

Having to specify a count integer for xml__xpath_count is a bit of a drag, but it is to cope with multiple matches, and allow for a negative assertion (count 0) without an extra kwarg e.g. xml__xpath_not_match. Given the limited XPath support in ETree I anticipate having tests where the XPath can't be written in a way that avoids duplicate matches. For example in tests around allow_duplicate_choices, adjacent <item> nodes the same <value> children but different <label> children.