XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
77 stars 134 forks source link

Fixed ${} references in predicates not being expanded using current()… #536

Closed gushil closed 3 years ago

gushil commented 3 years ago

… if the expression contains whitespace before the reference

Closes #531

Why is this the best possible solution? Were any other approaches considered?

Fixing previous working solution.

What are the regression risks?

None.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No.

Before submitting this PR, please make sure you have:

gushil commented 3 years ago

I could not break it, though I tried quite hard. :)

Thanks!

gushil commented 3 years ago

Hi @MartijnR should I make changes or we just wait for @lognaturel to chime in here?

Thanks

pbowen-oc commented 3 years ago

@lognaturel - Will you be able to review this in the next couple days? We are working on a release to fix this and other issues and looking to get a tagged version that we can deploy for testing as soon as possible.

lognaturel commented 3 years ago

Will try to review and release by Fri.

@gushil please review #535. Could you also please look into why there’s a CI failure here? Have some versions changed? We want to make sure it’s green before merge.

gushil commented 3 years ago

@lognaturel

It looks like the failure is caused by pip version 21.1 (used by the build)

Based on https://github.com/pypa/pip/issues/9954 , by using pip version 21.1.1 (released on May 1st, a couple of days after the build happened) the warnings will be silenced.

Should we try to trigger new build? (I don't know how to do that yet)

codecov-commenter commented 3 years ago

Codecov Report

Merging #536 (91e30e2) into master (ae7777c) will increase coverage by 0.01%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #536      +/-   ##
==========================================
+ Coverage   83.98%   84.00%   +0.01%     
==========================================
  Files          25       25              
  Lines        3728     3732       +4     
  Branches      871      872       +1     
==========================================
+ Hits         3131     3135       +4     
  Misses        452      452              
  Partials      145      145              
Impacted Files Coverage Δ
pyxform/survey.py 92.73% <100.00%> (+0.05%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ae7777c...91e30e2. Read the comment docs.

lognaturel commented 3 years ago

As far as I know, only @ukanga has access to the Appveyor account. When I need to re-run, I force push a change to the latest commit message. I did this here and did not change the contents of the commit.

What I'm understanding here is that #524 introduced the regular expression instance\([^)]+\S+ to identify instance path expressions. This explicitly used whitespace to attempt to identify the end of a path expression starting with instance but I didn't really notice that or consider implications in review. All tests introduced made sure there was always a space after the instance path expression and not within it. This is the kind of assumption that's really valuable to describe in a PR description to make sure it's explicitly considered.

@pbowen-oc's example from the issue is particular because it has multiple predicates. I think that's important to introduce some testing around. I would have been tempted to identify the end of an expression we care about by starting with instance( and ending with the first ] but the possibility of multiple predicates mean we can't do that. There should be a test that keeps us from trying that in the future.

What this PR does is remove any attempt to identify the end of an instance path expression. This kind of thing would be valuable to highlight in the "why is this the best solution" part of the PR template. This change means that any ${} expression that occurs after instance() will get a current() prefix whether or not it's in a predicate. See for example instance('item')/root/item[itemindex = ${item-counter}]/label + ${item-counter}. The first ${item-counter} gets a current() prefix which is good and expected. The second ${item-counter} does too. This is not desirable and I don't love it but it happens that the current() has no effect in that case so it 's not the worst thing in the world. I can't think of a case where the superfluous current() would cause problems. @MartijnR, what do you think? I don't believe we can accurately identify XPath path expressions without a parser that has knowledge of operators, function structure, opening closing brackets/braces, etc. This may be the best we can do. If we go with it, I think this case should be documented with a comment that says something like "The second reference to item-count should not get a current() prefix but we currently have no way of reliably detecting the end of an XPath path expression so any reference after instance() gets a current() prefix whether or not it's in the path expression."

We don't have to do anything about this now, but I wanted to come back to something from the initial issue that I think we didn't get quite right. At https://github.com/XLSForm/pyxform/issues/490#issuecomment-785459282, @MartijnR, you said we should only be using current() for XPath path expressions that start with instance(. I'm not really sure that's the ideal. A common thing to do is to have two parallel repeats. For example, one that builds a roster and another that asks questions about each roster member. You might want to do something like /data/roster[position() = ${pos}]/name in the second repeat. In that case you do need the current() prefix. In fact, I can't currently think of a case where it would be undesirable. You mentioned populating choices from a repeat but that's not done manually so the path expression for that would not go through this code.

lognaturel commented 3 years ago

Hmm. Now that I think about it with a clear head, wouldn't a regex that captures all strings in between [ and ] and uses the current() prefix for those have the outcome that we want? This depends on whether or not I missed something regarding predicates on nodesets in the forms.

@pbowen-oc @gushil I think this PR certainly makes a valuable improvement. If you can do the two second bullets I mentioned, I could merge and do a release. Then once we get an opportunity to hear from @MartijnR we can evaluate whether the concept I mentioned above might be promising and help capture more useful cases.

pbowen-oc commented 3 years ago

Thanks, @lognaturel.

@gushil is working on adding tests for the cases you referenced

MartijnR commented 3 years ago

wouldn't a regex that captures all strings in between [ and ] and uses the current() prefix for those have the outcome that we want?

That sounds great. I agree it's better not to create unnecessary current() prefixes, and I'm embarrassed I didn't find instance('item')/root/item[itemindex = ${item-counter}]/label + ${item-counter}, because that's exactly the thing I was trying to find..... 👍

You might want to do something like /data/roster[position() = ${pos}]/name in the second repeat. In that case you do need the current() prefix. In fact, I can't currently think of a case where it would be undesirable. You mentioned populating choices from a repeat but that's not done manually so the path expression for that would not go through this code.

Hmmm, yes, I have a vague recollection that you mentioned this about my misunderstanding of current() before... I think I may have regressed and that you're right about current() being appropriate for all cases where a ${ref} results in a relative path inside a predicate (so in pyxform world).

lognaturel commented 3 years ago

Will aim to release mid-morning tomorrow (Pacific).