Closed MartijnR closed 3 years ago
Hi @MartijnR
What does *
means for max
, min
, and concat
? Is that means all arguments for those functions should be returned as absolute?
Thanks
Sorry, I should have clarified that and added the source. This means that all arguments are node-sets. Feel free to use something else though of course.
See source here: https://getodk.github.io/xforms-spec/#xpath-functions
{ "count": [0], "count-non-empty": [0], "sum": [0], "max": ["*"], "min": ["*"], "concat": ["*"], "join": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], "randomize": [0], "indexed-repeat": [1, 2, 4, 6], "area": [0], "distance": [0] }
Hi @MartijnR, I have working code now but I want to confirm these. Those are the xpath functions and list of parameter positions that are node-sets in array.
So, for example, if ${item} is found as first (or second, or fourth, or sixth) parameter of indexed-repeat, ${item} will be returned as absolute path. Is that correct?
If that is correct, I think for count
, count-non-empty
, sum
, randomize
, area
, and distance
, the indices should be 1
, not 0
, right? (because the first parameter of those functions that will be returned as absolute path)
I'm using something like this:
xpath_functions = {
"count": [1],
"count-non-empty": [1],
"sum": [1],
"max": ["*"],
"min": ["*"],
"concat": ["*"],
"join": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
"randomize": [1],
"indexed-repeat": [1, 2, 4, 6],
"area": [1],
"distance": [1]
}
@gushil, I used 0-based indices, but made a mistake with indexed-repeat. I've corrected it above. Best to check my work on the others as well with the documentation of those functions.
Hi @MartijnR I've created PR #528 to solve this issue.
Waiting for some review.
Thanks.
Sorry I missed this. I'm afraid I'm not in favor of this change!
First, it breaks existing forms that use aggregate functions on nested repeats. This contexts seems like a very good reason to use aggregate functions inside a repeat and is a common pattern for those who choose to use nested repeats. Consider this test:
def test_count_of_nested_repeat_from_outer_repeat_produces_relative_ref(self,):
self.assertPyxformXform(
name="data",
md="""
| survey | | | | |
| | type | name | label | calculation |
| | begin_repeat | repeat1 | | |
| | begin_repeat | repeat2 | | |
| | text | foo | Foo | |
| | end_repeat | repeat2 | | |
| | integer | count1 | count1 | count(${repeat2}) |
| | end repeat | | | |
""", # noqa pylint: disable=line-too-long
xml__contains=[
"""<bind calculate="count( ../repeat2 )" nodeset="/data/repeat1/count1" type="int"/>"""
],
)
Second, specifying an aggregate function over a repeat's value(s) from within that repeat is inefficient and feels semantically strange to me. It's also explicitly not supported by ODK Collect because of the performance issue. I always tell form designers to introduce a calculate outside their repeats to do this. In other words, is there any advantage to
type | name | label | calculation |
---|---|---|---|
begin repeat | repeat | ||
integer | item1 | item1 | |
integer | count1 | count1 | count(${item1}) |
end repeat | repeat |
over
type | name | label | calculation |
---|---|---|---|
integer | count1 | count1 | count(${item1}) |
begin repeat | repeat | ||
integer | item1 | item1 | |
end repeat | repeat |
Couple other thoughts that came to mind...
The indexed-repeat
issue feels different because its spec requires absolute refs and then indexes are explicitly specified for each path segment.
The existing pyxform
behavior is the same it's always been, right? That is, with the original interpretation of absolute paths within repeats, those absolute paths were contextualized according to the repeat instances they were in. E.g. /data/repeat/item1
from within /data/repeat[3]
would be contextualized to /data/repeat[3]/item1
. When we switched to relative references, that same interpretation continued. Does that sound right?
When we switched to relative references, that same interpretation continued. Does that sound right?
No, not in Enketo. This welcome change finally allowed us to evaluate XPath correctly without injecting positions!
First, it breaks existing forms that use aggregate functions on nested repeats.
Yes, indeed. I forgot about nested repeats (I have a tendency to do that ;)). So, it could continue to use a relative path but if used inside the same repeat instance the argument refers to, it could step outside of that repeat (like in #503)?
Second, specifying an aggregate function over a repeat's value(s) from within that repeat is inefficient and feels semantically strange to me. It's also explicitly not supported by ODK Collect because of the performance issue.
Certainly a weird use, but I have no doubt @pbowen-oc has some fine examples that use this... If there aren't good use cases, this issue becomes unimportant indeed.
finally allowed us to evaluate XPath correctly without injecting positions
I do think usually folks want relative expressions from within repeats. I understand that wasn't practical to maintain for Enketo so it's good the spec change was made but what I'm saying is that pyxform
maintained the same client behavior across that change. Coming back again to ${item1}
referenced from a node within /data/repeat[3]
, my understanding is that:
pyxform
expanded ${item} to /data/repeat/item
which was interpreted by clients as /data/repeat[3]/item
pyxform
now expands ${item} to ../item
which is interpreted by clients as /data/repeat[3]/item
If I understand correctly, you're proposing changing the expansion to /data/repeat/item
(or ../../repeat/item
) assuming that the user's intent is to reference the nodeset of all items
. I'm not convinced that's really generally the intent and I don't like that existing XLSForms would be converted to XForms with different semantics. Does that make sense?
Unless I'm mistaken, in the nested repeat case, there is a difference between the example I showed in the test and what would happen if we changed the expansion to ../../repeat1/repeat2
. That is, in my test, we're getting the nodeset of the nested repeat for the current instance of the outer repeat. In my experience, that's what users want (e.g. compute the number of people in the household). I believe ../../repeat1/repeat2
would be the nodeset of all instances of the inner repeat for all instances of the outer repeat (e.g. the number of people across all households). I've never seen users want to do this but if they did, pulling it outside of both repeats would make a lot more sense, I think (they could still reference it from within either repeat). What I do see is people using the nodeset of the nested repeat to compute something (usually just a count) and then using the nodeset of those computed values from the form root to get e.g. the total count.
@lognaturel @MartijnR - We don't work with nested repeats here, so I can't comment on the functionality of those.
I am not sure how these functions worked for us before when using the older version of Pyxform with absolute references. I was assuming there was a change in the form functionality with the switch to relative references as there had been with indexed-repeat(), but I don't know that for sure.
I agree that it would be more efficient to put one of these nodeset functions outside the repeating group and then reference that value as needed within the repeat. I'm not aware of any use case where having the nodeset function within the repeat would not work as desired from outside the repeat. Assuming that is the case, I don't think I see a strong need to update the behavior as described here.
This change was not meant to change behavior of existing (newly transformed) XLSForms. When a user uses a function such as count() or sum() it's very unlikely the user would want to include only one ${item1} node instead of all of them.
However, if there is no use case for this...., then indeed we shouldn't bother at all! Thanks for questioning the need for this added complexity @lognaturel.
it's very unlikely the user would want to include only one ${item1} node instead of all of them
I do agree that your interpretation of intent is likely correct in a single-repeat context. But I'm very happy to keep "forcing" people to put these computations outside the repeat! Let's say you do have count(/data/repeat/item1)
inside /data/repeat
, does Enketo actually recompute every single repeat instance and need to iterate over every repeat instance to do so or does it do something smart to factor out the computation?
Problem description
node-set arguments are converted to relative paths if the argument node(s) is/are inside a repeat and referred from the same repeat
Steps to reproduce the problem
Expected behavior
All
item1
should output as/data/repeat/item1
, and not as../item1
.Other information
This is exactly the same problem as the already fixed https://github.com/XLSForm/pyxform/issues/484 for
indexed-repeat()
, so this fix should probably be integrated with that fix.Perhaps it would be helpful to maintain an object of XPath functions that contain node-set parameters and list the parameter indices that are node-sets in arrays, e.g.
I believe
join()
can have an unlimited number of parameters, and only the first is not a node-set. So perhaps that should be handled separately, or a syntax can be invented for 1+. Or just not worry about more than 20, since that's very likely to never be used.