XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
80 stars 136 forks source link

Default string values with dashes are mistakenly treated as dynamic defaults #495

Closed lognaturel closed 2 years ago

lognaturel commented 3 years ago

Software and hardware versions

Since dynamic default support was introduced.

Problem description

Dynamic default support uses a regular expression that looks for - and other operators. This can lead to bad behavior when a literal string with dashes is used as a default. This is particularly likely when setting a default for a select because it seems form designers favor dashes over underscores for separating words in choice names (e.g. a-choice). @yanokwa shared another example at https://github.com/XLSForm/pyxform/issues/533

Steps to reproduce the problem

Create a form with a dynamic default that has a dash in it meant to be part of a string. For example, this one. See that the resulting XML when converting has a setvalue when the value should be used literally.

Expected behavior

The literal default value should be set in the instance.

Other information

I wasn't too worried about the regular expression when we initially did this because I was thinking that literal values would work fine with setvalue. However, literal values in XPath must be surrounded by quotes. I'm not sure what the best solution is.

MartijnR commented 3 years ago

Just some brainstorming.

In a-choice, a and choice operands would have to either be a number or a path.

If the operand is not a number, like this case, I think it would have to start with either / followed by [a-zA-Z] or ../ followed by [a-zA-Z].

Only descendants of the current node could start with other characters/sequences (e.g. a or ./ or // or choice), but since questions in XForms don't have any children, that won't happen.

Thankfully we don't support XPath Axes notation...

lognaturel commented 3 years ago

This can also happen with raw geopoint values used as defaults (e.g. 32.7377112 -117.1288399 14 5.01). We could possibly make that and geotrace|shape a special case since we know the format.

A workaround is to wrap the value in a call on the string() function.

lognaturel commented 3 years ago

Fun times, 1-1 is a valid choice name! 🎉😭 I think that's a fine case to let go through as an incorrect dynamic default and if someone asks about it we can tell them to use string(1-1) instead. Same if someone wants to set a default of 245 / 6, for example.

lindsay-stevens commented 2 years ago

Is the above string() workaround sufficient to close this with a documentation update (to mention the workaround)?

Alternatively, how about adding a parameter e.g. default_is_literal=True which defaults to False? When True, this parameter would result in the default column value being inserted as-is into the instance data. When false, it is processed into the setvalue element. If it seems like most users would want a literal default then the parameter could default to True instead of False. There could be different defaults for this parameter for different question types, e.g. True for text, select; False for calculate, integer.

The current implementation (which centers on the function default_is_dynamic) has list of operator tokens (such as +, -, mod) which are checked against the default value string with in. No fancy regex lookahead / lookbehind or AST stuff. This could be removed if the above parameter is added.

lindsay-stevens commented 2 years ago

I've added draft PR #578 with tests to reproduce the issues described in 495 and 533.

lognaturel commented 2 years ago

If you can, @lindsay-stevens, I think it's worth you timeboxing an hour to see if you can come up with a simple way to improve cases like an-option, some+text+with+pluses, and/or based on @MartijnR's heuristics without compromising too much on complexity. If not, having an ignored test as in #578 is great. We should also address https://github.com/getodk/docs/issues/1343 sooner than later on the docs side.

For locations, what do you think of doing a similar exception as for dates? That is, if the field type is geopoint, geotrace or geoshape, remove - from the list of characters used to identify dynamic values?

lindsay-stevens commented 2 years ago

@lognaturel What I had in mind with the parameter suggestion is that using a heuristic for this seems like the kind of thing that may keep generating new and interesting edge cases after the immediate cases are addressed. By immediate cases I mean a default using an alphanumeric choice with a dash like a-choice or 1-1, or a string such as a URL with a dash.

I made an attempt and so far I have the below regex to identify 1 operator and 2 operands (integer or decimal), so that if it parses out to more than 1 match group it's probably a dynamic expression. Not quite the same heuristic but it's on the way there in terms of identifying operands. It would probably become intractable as a regex if it needed to consider compound / bracketed expressions.

(\d+?[\.\,]?\d*?)\s*?([\*\|\+\-\[\]{}\(\)]|\s+?mod\s+?|\s+?div\s+?)\s*?(\d+?[\.\,]?\d*?)

If the check function default_is_dynamic receives the question choice names it can ignore defaults which exactly match one of the choice names. This change and the above regex allow the tests in #578 to pass, but sadly they break many other tests in test_dynamic_default.py.

If you'd like to proceed with addressing these immediate cases please let me know and I'll work on the above further for the PR. In that case could you please also provide an example default value for geotrace and geoshape? I'm not familiar with these question types and couldn't find an example in the tests / docs.