XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
80 stars 136 forks source link

495: Default string values with dashes are mistakenly treated as dynamic defaults #578

Closed lindsay-stevens closed 2 years ago

lindsay-stevens commented 2 years ago

Closes #495

Why is this the best possible solution? Were any other approaches considered?

Approaches the problem by adding a lexer, whose parsing rules are used to identify expression tokens that indicate a dynamic expression. These tokens are: math operators, union operator, xpath predicates, pyxform references, and function calls. There may be others but the lexer provides a framework to add to the parsing rules, or the business logic using them.

As mentioned in #495, a regular regex isn't adequate to identify the various tokens that indicate a dynamic expression. I looked for "simple" lexer libraries that could parse XPath in Python. Although that isn't really what is needed, there were some big hints in euxml for identifying common tokens and especially valid XML/XPath names. Also these searches revealed the presence of a regex scanner in the Python standard library, which is used here. The euxml library uses PLY but it seemed like overkill (if not already) to learn and bring in a dependency for this heuristic check.

Includes tests to reproduce the issues described in 495 and 533, as well as a range of cases which should / should not be considered dynamic. Some of these overlap with existing tests but I didn't want to go as far as refactoring them until I got some feedback on this approach.

What are the regression risks?

There currently aren't tests to benchmark performance for huge forms full of complex default expressions. I found that using a shared / pre-compiled scanner speeds things up signficantly vs. making a new one for each question, but I'm not sure on the remaining overhead for huge forms.

It's possible that some cases where an expression is or is not a default may be missed. I think this PR covers the all known cases so it would not be a regression at least.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Probably not, but it might be worth making a note of how Pyxform decides what is dynamic, to preempt forum questions.

Before submitting this PR, please make sure you have:

lindsay-stevens commented 2 years ago

Hi @lognaturel there may be some more to do on this wrt. tests but I think it is ready for a look. Thanks!

lindsay-stevens commented 2 years ago

Thanks for the review! I will update the tests as mentioned above to complete this PR.

lindsay-stevens commented 2 years ago

Should be good to go now. New commit is all about tests, per commit message for ba7b196:

The performance tests are:

lognaturel commented 2 years ago

🥳 🥳 🥳