XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
77 stars 134 forks source link

${} references in predicates not being expanded using current() if the expression contains a space before the reference #531

Closed pbowen-oc closed 3 years ago

pbowen-oc commented 3 years ago

Software and hardware versions

pyxform v1.5.0

Problem description

Item references in instance() expressions are not being expanded using current() if the expression has a space preceding the item reference at any point. The update for issue #490 did not address this issue fully.

Steps to reproduce the problem

Create a form with an instance() expression containing an item reference in a predicate. If the expression contains a space before the reference, it is not expanded with current and will therefore not work as intended.

Test form with several cases: instance reference test_small.xlsx

Expected behavior

All item references in instance() predicates will be expanded using current().

Other information

I tried the attached test form in the online converter and it returns an error: "XPath evaluation: unsupported construct [filter expression]".

I converted the form internally with the validation step turned off and see that the XForm is converted with the expressions like this: calculate="instance('clinicaldata')/ODM/ClinicalData/SubjectData/StudyEventData[@StudyEventOID='SE_E1'][position()= current()/../int1 ]/FormData[@FormOID='F_F1']/ItemGroupData[@OpenClinica:ItemGroupName='group1'][position() = ../int1 ]/ItemData[@OpenClinica:ItemName='item1']/@Value"

The first reference to item int1 is expanded as expected, but the second reference (after the expression has contained a space) is missing "current()/".

pbowen-oc commented 3 years ago

@gushil - Please look into this issue.

pbowen-oc commented 3 years ago

It seems like any item reference that is within square brackets [] should be expanded in this manner, but I don't know if there is more to it than that.

lognaturel commented 3 years ago

any item reference that is within square brackets [] should be expanded in this manner

There can also be a predicate on a nodeset expression within the primary instance. In that case typically the expression is intended to be in the context of the nodeset expression. /data/repeat[${age} > 12] would be expected to expand to data/repeat[age > 12]. This has worked in the past so I don't think we want to change it. Also, we generally don't want users to have to differentiate between age and ${age}.

I think "any reference in a path expression that starts with instance(" would give the desired behavior. I suspect the problem might be that the end of a path expression is identified by a space. It might be the case that also using [] would get the desired behavior. This is all a lot more convoluted without a proper parser (as opposed to disconnected regex). Do you have some ideas, @gushil?

gushil commented 3 years ago

Yes, that is the problem (regex is disconnected if there is space found in the expression) I found.

From the test case (test_repeat_with_reference_path_in_predicate_uses_current) I found this: instance('item')/root/item[index=${pos5} and ${pos1} = 1]/label The result would be: ${pos5} will be expanded to current()/../pos5 ${pos1} will be expanded to /data/pos1

Should we check reference path that is begins with instance and has operators before it?

note: I asked @pbowen-oc and operators that is possible are: =, !=, >, <, >=, <=, +, -, *, div, and, or

lognaturel commented 3 years ago

Is there currently a regular expression that will correctly return all of instance('item')/root/item[index=${pos5} and ${pos1} = 1]/label? If so, then you should be able to find all matches for ${...} within that. Note that you'd also have to add tests for cases like @pbowen-oc's with multiple predicates.

If not, I'm not as sure. Explicitly testing for all of the operators really seems like a pain. It also wouldn't catch diabolical things like index=sum( ${field} ). It might be time to look at properly parsing the full expression to identify individual path expressions.

gushil commented 3 years ago

Is there currently a regular expression that will correctly return all of instance('item')/root/item[index=${pos5} and ${pos1} = 1]/label? If so, then you should be able to find all matches for ${...} within that. Note that you'd also have to add tests for cases like @pbowen-oc's with multiple predicates.

Yes, there is one. I've tried that one and successfully return all the string. Is the test function I mentioned before is wrong? If I use the regex that return all the string, I got ${pos1} too. Should ${pos1} expanded to current()/../pos1 ?

If not, I'm not as sure. Explicitly testing for all of the operators really seems like a pain. It also wouldn't catch diabolical things like index=sum( ${field} ). It might be time to look at properly parsing the full expression to identify individual path expressions.

Yeah, that is what I've been thinking. Let me do the experiment with regular expression more.

lognaturel commented 3 years ago

I've tried that one and successfully return all the string.

Oh good, that sounds promising!

Should ${pos1} expanded to current()/../pos1

Yes.

gushil commented 3 years ago

@lognaturel I created PR #536 to fix the issue.

Waiting for review/resolve.

Thanks!

gushil commented 3 years ago

@lognaturel @MartijnR

Just bump this issue (with accompanied PR) in case the notification is missing.

Thanks.

MartijnR commented 3 years ago

Ok, thanks. I'll try to break it!