XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
82 stars 136 forks source link

Output `no-repeat-reference-error` in some manner or the pyxform version or the xforms version #393

Closed MartijnR closed 4 years ago

MartijnR commented 4 years ago

For discussion.

Some advanced XPath enthusiasts (@ln) are running into issues caused by Enketo's workaround for a major XForm syntax issue: https://opendatakit.github.io/xforms-spec/#a-big-deviation-with-xforms.

Pyxform recently started generating relative repeat node references, so we're on the verge of being able to remove this nasty deviation from the ODK XForms spec (I think).

However, since we still need to be able to run forms that were transformed by older versions of pyxform, it will be many years before we can remove the code workarounds in Enketo and Collect. This means it is a challenge, for Enketo, to facilitate support for those very cool advanced forms.

This may be a good reason to add some versioning or featuring in the XForms output. E.g. the output could contain a no-repeat-reference-error class or a pyxform-version attribute. Data collection clients could disable the bug-causing workaround for the buggy XForms syntax based on the presence of this class or the value of the attribute.

I imagine there will be more valid use cases in the future, so perhaps a version attribute is the easiest?

lognaturel commented 4 years ago

Yes, I think we need to do this yesterday.

I much prefer using a general version attribute. I'd be in favor of doing what we do on other ODK tools and using a commit hash to represent version. That way the versioning can be automated, is guaranteed to be unique, and those of us running off master will have exact information about the state of pyxform when the conversion occurred.

XPath Enthusiasts Unite. 🤘

MartijnR commented 4 years ago

Great! Automating would be ideal. The only issue with a hash is how to do a version > xxx check in the data collection client, which is the use case I'm thinking off.

In Enketo - nodeJS - we do execSync( 'git describe --tags', { encoding: 'utf-8' } ).trim(), to use the latest git tag + additional untagged commits. Would that do?

MartijnR commented 4 years ago

Would this be a good location , attribute name, namespace?

<model>
     <instance>
            <data id="mysurvey" orx:version="2014083101" odk:pyxform-version=".....">
             .....
     </instance>
</model>
lognaturel commented 4 years ago

The only issue with a hash is how to do a version > xxx check in the data collection client, which is the use case I'm thinking off.

Ah, indeed. We do git describe --tags on the other Collect tools as well. Perhaps we could consider the total number of commits on the current branch (git rev-list --count HEAD)? I think that would work perfectly for official releases but it would be harder to trace for interim branches. Seems ok to me.

Or perhaps hash-commit count (so git rev-parse --short HEAD-git rev-list --count HEAD -- the current "version" would be 5542ae4-1263). That's redundant for official builds but makes any build traceable.

MartijnR commented 4 years ago

Thanks!

I believe git describe --tags could be reliably used with parseFloat() until the major (x.x) version and that would be sufficient. But missing out on the x.x.x patch releases might sometimes be a small issue.

I wonder how likely it is that an organization runs a fork (or deploys a branch) with a minor customization (though perhaps with lots of commits for it). The advantages of using tags is that that may be easier to deal with since the tag we're looking for has the feature.

lognaturel commented 4 years ago

I believe git describe --tags could be reliably used with parseFloat() until the major (x.x) version and that would be sufficient.

Fair enough. Probably clients would want to use something like a regex to pull the three digits because 1.15 is greater than 1.2. And that takes care of the patch releases as well.

I realize I confused things initially by saying "using a commit hash to represent version" when I really meant git describe --tags. I think we've been in agreement this whole time.

I wonder how likely it is that an organization runs a fork (or deploys a branch) with a minor customization

We of course don't know for sure but I'd guess it's not common. Experience suggests that it's pretty hard to maintain a fork up to date because of the structure of the code so either folks have wandered off entirely or they're on core.

lognaturel commented 4 years ago

Process-wise, where do we go from here? I think the odk:pyxform-version= addition is small and non-controversial enough that it can just be filed on the ODK XForms repo and made so as long as @tiritea is on board.

Does anyone specific need to comment on the nature of the version provided? Presumably @ukanga should sanity check at the very least. Can you do the implementation, @MartijnR?

MartijnR commented 4 years ago

that it can just be filed on the ODK XForms repo and made so as long as @tiritea is on board.

Yes, I agree.

Can you do the implementation, @MartijnR?

I was going to say "no, I don't speak python", but now I feel embarrassed and challenged. I am going to try today! (and will add @ukanga, and you as reviewers - we can always change the version algorithm in the PR if somebody has an argument to do so).

tiritea commented 4 years ago

Explicitly version-tagging ODK XForms seems to make a lot of sense. However I'm not sure this should be against the particular version of pyxform (possibly?) used to generate it; wouldn't it make more sense to tag it with an appropriate ODK XForms spec version instead? [didnt we discuss versioning the ODK spec at some point? maybe now's a good time...].

For example, I often manually write certain XForms - how would I know what odk:pyxform-version=??? to put in?

How about odk:xforms-version= ?

[BTW, I'm tempted to nominate no-repeat-reference-error for prize of "Most Abstruse XML Element Attribute'... ;-) ]

MartijnR commented 4 years ago

BTW, I'm tempted to nominate no-repeat-reference-error for prize of "Most Abstruse XML Element Attribute'... ;-)

Haha, I actually (truly) think that attribute is very self-explanatory, but am delighted that you added the word "abstruse" to my vocabulary, and you make a good point about versioning.

This trigger issue for versioning demonstrates a small problem with linking to an XForms Spec version. The use case that triggers this versioning isn't directly linked to an XForms spec change. Using relative references to refer to nodes inside a repeat has always been allowed and supported. Enketo would just like to know if another deprecated (absolute path) usage is guaranteed to not be used in this form.

However, I agree this solution is very pyxform-centric and that is not ideal. We could workaround the xforms-spec issue by removing the incorrect absolute-path usage now (at more or less the same time as we introduced the correct syntax in pyxform) and generate a new XForms spec version. Hmmm.

MartijnR commented 4 years ago

How about odk:xforms-version= ?

I think @tiritea's solution is better as it is form-builder agnostic. I agree with the proposed attribute. The relative-paths in repeat refs are a bit of a special case. Generally, linking to an XForms version is probably a good solution to detect XForms changes that cannot easily be determined from analyzing the XForm content itself (which I think would normally be my preferred approach).

If we name the current XForms version "1.0.0"that could be the first version we output in pyxform.

It is possible (and likely) that an XForms feature has no implementation in pyxform/xlsform, but I think that's okay. The client must not consider version"1.0.0" to mean that it has to implement the xforms-value-change event (and refuse to load if it doesn't). It just means that it is possible the form contains it (and it can do whatever it wants with that information - I would detect its actual use in the XForm and show a warning).

tiritea commented 4 years ago

linking to an XForms version is probably a good solution to detect XForms changes that cannot easily be determined from analyzing the XForm content itself

👍

The client must not consider version"1.0.0" to mean that it has to implement the xforms-value-change event (and refuse to load if it doesn't). It just means that it is possible the form contains it (and it can do whatever it wants with that information - I would detect its actual use in the XForm and show a warning).

Agreed. I dont think a client necessarily has to claimodk:xforms-version=1.0.0 conformance per se; rather, it can utilize this additional piece of info to better understand what to expect (and not expect) in the XForm definition.

lognaturel commented 4 years ago

this solution is very pyxform-centric and that is not ideal

I think the problem we're trying to solve is a form builder problem, not an XForms problem because form builders implement features at different rates. Perhaps if we wanted to be more generic we could do odk:builder-user-agent, odk:builder or something like that. Then you might do pyxform-v0.15.1-29-g5542ae4 and build-0.3.2-88-g61f9f08. That does seem better to me.

I think that it so happens that introducing xforms-version today would solve @MartijnR's particular problem but it wouldn't solve it generally. That is, it seems we may need to change pyxform behavior in a way that doesn't involve a corresponding XForms change or that may involve a past XForms change.

I think we'll have a number of similar cases where form rendering clients or servers would like to know about the form builder version. The changes to make the meta block ODK XForms spec compliant are another potential example.

I agree that it could be useful to also capture XForms spec compliance level but that feels different to me. I'm also not really sure what level pyxform would claim since it is missing support for things that have been in the spec forever and includes newly-introduced features.

tiritea commented 4 years ago

But isnt the issue that pyxform is generating non-standard XForm XPath expressions - which happen to be customizations introduced by our ODK XForm spec? If so, then in effect these non-standard, problematic features are ultimately tied to our particular ODK xforms spec, and its associated version. Right?

It just seems odd to me that an (ODK standard-compliant) XForm would have a version tag for a specific generation tool, which itself is should - ideally - advertise conformance to some particular version of said spec (!). So if we're not using the ODK spec version as the basis for the toolchain, then it seems like we may be missing something somewhere else... [sic]

[additional note: spec conformance can have optional components, so its not like pyxform has to do 100% everything doc'd in the spec]

That is, it seems we may need to change pyxform behavior in a way that doesn't involve a corresponding XForms change

I'm confused... if a pyxform change doesn't actually change the resulting XForm/XPath, then who cares?! (note, I'm grouping specific XPath support with a particular XForm version. So changes in XPath support could/should trigger an odk:xforms-version change I would think)

MartijnR commented 4 years ago

I'm starting to think we need both (and I like odk:builder and odk:xforms-version)... (though obviously I hope to avoid as much possible checking for these versions in Enketo). At least the cost of including them is quite small.

I think the problem we're trying to solve is a form builder problem, not an XForms problem because form builders implement features at different rates.

I don't really see the repeat-ref issue as a form-builder problem. For me the initial proposed odk:pyxform-version solution worked by assuming that we solved the problem for enough users due to XLSForm's popularity. We happen to know that pyxform-generated XForms after a certain commit are safe, but it does indeed seem weird to fake this in hand-coded XForms by adding a pyxform version to those. If we're committed to removing the deviation from the XForms Spec in version 1.0.0, it seems to make more sense for forms to advertise that in a general odk:xforms-version:"1.0.0".

I'm also not really sure what level pyxform would claim since it is missing support for things that have been in the spec forever and includes newly-introduced features.

Missing features wouldn't matter for pyxform the way I see it. If pyxform has currently no way to generate <setvalue> for xform-value-change events, the XForms it produces can still comply with the latest spec.

The changes to make the meta block ODK XForms spec compliant are another potential example.

I think the metadata placement issue could be seen from 2 different perspectives: (1) as a significant deviation with the spec or (2) a bug that was found after claiming to produce valid x.x.x ODK XForms output.

In the former case, it would have to be fixed before we can add odk:xforms="1.0.0" to the output. In the latter case, it's too late to do so, and odk:builder becomes the best option. So if that's the only current issue, I'm biased towards taking the latter perspective, though strictly speaking the former would be more correct. (PS. I think the issue with meta tags we discussed was with updating existing XLSForms which would not be resolved with any XForms versioning - but the meta-data change certainly is a good candidate for requiring this if e.g. applications would like to do cross-form analysis of metadata).

I think we'll have a number of similar cases where form rendering clients or servers would like to know about the form builder version.

I also remember we've discussed other candidates, but I cannot think of them at the moment. If you remember, it may be good to add them.

tiritea commented 4 years ago

So the ODK spec says:

However, in this spec, due to an unfortunate persistent historical error, absolute paths inside repeats are always evaluated as if they are relative to the current nodeset. In other words, the absolute XPath /data/path/to/repeat/node when it is referred to from inside a repeat is evaluated as if it is the relative XPath ../node.

So basically, in the current version of the ODK spec (0.9.0?), absolute XPath expressions are/will be interpreted as relative to the current repeat node. Whereas in some future version of the ODK spec (1.0.0?), absolute XPath expression will be interpreted relative to the actual instance root (/).

So if I (or pyxform...) produce an XForm where I have written an absolute XPath expressions with the understanding that it should be interpreted relatively inside a repeat group, then I should tag my XForm as conforming to odk:xform-version=0.9.0. Whereas if I (or pyxform) write an absolute XPath expressions with the understanding that it will always be interpreted as absolute, then I would tag it odk:xform-version=1.0.0.

I guess I'm still not seeing why odk:xform-version doesn't convey exactly what we want to specify here?... A client knows, by looking at odk:xform-version, exactly how it is expected to handle the form's XPath expressions; and either it already does (because that's how the client's XPath processor is written), or it doesn't/cant (in which case it can reject the form as unsupported), or - ideally - it might be able to support both approaches and can use the version tag in order to ~'massage'~ correctly interpret any XPath references it comes across (eg in runtime?) appropriately.

[this leave unspecified what we do with forms lacking any version tag, but we face the same issue regardless... Probably assuming the lack of an explicit odk:xform-version implies a legacy form, ie 0.9.0 functionality...]

lognaturel commented 4 years ago

if we're not using the ODK spec version as the basis for the toolchain, then it seems like we may be missing something somewhere else

I agree that this should be what we aim for. I think it is now. But (and I feel like I'm repeating myself across several conversations, sorry!), we can't erase 10+ years of history. There are numerous bugs and omissions in pyxform that we've redressed recently and probably more to go.

if a pyxform change doesn't actually change the resulting XForm/XPath, then who cares?!

Apologies -- I meant to say that pyxform behavior may need to change without an XForms spec change. Another example of that would be the repeat output (https://github.com/XLSForm/pyxform/pull/380). We've learned that at least one major client (Enketo), was "modifying" form definitions to get a particular behavior. It seems to be the behavior that users want. So we're now modifying the pyxform output. In that case, I think having the form generator information will be helpful for end user support more than anything. In fact, I should have said that earlier, because it's for support that I've wanted form builder version information in the past.

produce an XForm where I have written an absolute XPath expressions with the understanding that it should be interpreted relatively inside a repeat group, then I should tag my XForm as conforming to odk:xform-version=0.9.0. Whereas if I (or pyxform) write an absolute XPath expressions with the understanding that it will always be interpreted as absolute, then I would tag it odk:xform-version=1.0.0.

I see. I was thinking of it strictly as Enketo removing its pre-processing step but I see what you mean about broadly any client perhaps supporting either behavior.

The deviation which came from the initial JavaRosa implementation was not quite correctly characterized. It is only if a single node type is requested. Instead of returning the first node as in standard XPath, it evaluates it in the context of the enclosing repeat. An absolute path in a context where a nodeset is requested works as expected. The problem is that this different behavior based on whether a nodeset or single node is requested is hard/impossible to get with an off the shelf XPath evaluator. So in Enketo (correct me if I'm wrong, @MartijnR) it has never been possible to use absolute XPath expressions that return nodesets in repeats.

pyxform passes through XPath paths that users manually include in their form definitions. We know that users do this and examples floating around show absolute paths in repeats to mean relative ones. I think it's unlikely JavaRosa/Collect would ever change that behavior. That is, unless there's a good reason why someone might want to use an absolute path to mean the first instance of a repeat but that doesn't seem useful to me.

All that to say that I'm personally more interested in knowing what tool generated a certain form. But y'all are right that in the case of the interpretation of absolute paths it's an XForms spec issue.

If we're going to introduce xforms-version, I'd like to make sure we have some kind of process for the actual spec versioning.

Missing features wouldn't matter for pyxform the way I see it. If pyxform has currently no way to generate for xform-value-change events, the XForms it produces can still comply with the latest spec.

I agree with this in principle but I don't see how it works practically. It seems that when new features are added to the ODK XForms spec, that should be captured in some kind of version change, no? That would seem to require that form builders release support for features in the same order to be able to claim a particular spec version. Or perhaps the attribute could be odk:xforms-major-version so only breaking changes are captured since those are probably the only ones clients of the form definition care about?

I also remember we've discussed other candidates, but I cannot think of them at the moment. If you remember, it may be good to add them.

Will ponder.

tiritea commented 4 years ago

All that to say that I'm personally more interested in knowing what tool generated a certain form.

I agree that knowing what specific tool (+ version thereof) generated a form can be useful information, but that does rather defeat the purpose of having a 'specification' [sic]... 😉

But in all seriousness, making explicit what exact tooling was use further upstream in the tool-stack is a bit of a slippery slope, one that can lead (inadvertently?) to a more 'closed' ecosystem. I also understand the concern around introducing an odk:xforms-version - eg what to do when new spec features are introduced, what if certain features aren't supported by a particular client, etc - but I think there can be a lot of flexibility around what conformance actually means. For example, being "conformant to odk:xforms-version=0.9.0" need not necessarily entail that a minimum of features X, Y and Z are implemented, but instead can just mean the context in which to interpret the form; in this case, how to interpret absolute XPath references. In that sense, odk:xform-version can serve the exact same purpose as theodk:pyxform-version

In truth, if we basically own the ODK spec, then we can define the sematics of what its versions mean.

tiritea commented 4 years ago

BTW, I hope I'm not appearing too obtuse [or should that be 'acute'...]. End-of-day I'll almost certainly ignore any odk:pyxform-version tag in my client (although I could forsee maybe wanting to pay attention to odk:xforms-version down the road...). I just dont want to see adding anything to ODK spec unless its absolutely required [I'm a minimalist wrt spec content], especially something that, on the surface, appears very tool-specific.

lognaturel commented 4 years ago

Let me actually walk back my prior enthusiasm for odk:xforms-version for the current Enketo need. Relative references in repeats have worked in clients since always. The real issue is that because JavaRosa interprets absolute references in repeats with the enclosing repeat as context, form builders (automated or manual) got in the habit of just using absolute references for everything.

This became a problem as tools using off-the-shelf XPath evaluators joined the ecosystem because they couldn't get the absolute-as-relative behavior just for single node evaluation. It's more common to have forms with relative references that refer to single nodes than to nodesets. So clients like Enketo sacrificed the latter for the former.

What Enketo needs to know is "can I be confident that relative references in repeats used in a context where a single node is needed are represented as actual relative paths" so that it can correctly interpret absolute paths that refer to nodesets. If it can't have that confidence, it would rather err on the side of interpreting absolute paths in repeats as relative because that's the more likely case. That doesn't have anything to do with ODK XForms spec level, it's purely about the form definition.

can lead (inadvertently?) to a more 'closed' ecosystem

I think what you mean is that if clients of form definitions start having special cases for different form builders, it makes it harder for new/smaller tools to get recognized, right? I see what you mean. To me, this information is largely for troubleshooting and for letting clients of forms deal with significant changes in old form builders that have produced forms for a long time. I think we can specify it as such. I don't think it would be an issue for new tools because those new tools actually would comply to specifications or their bugs would likely be spotted and rectified quickly. The problem we have here is again, pyxform decided at the beginning of time to exploit a behavior that is hard for all form clients to replicate and there are MANY forms that will use this for a long time yet.

End-of-day I'll almost certainly ignore any odk:pyxform-version tag in my client

As will Collect, Central, etc. But I think that's ok -- I expect it will be quite helpful for forum support. I do think the attribute name should be something more general about form source, though (odk:builder-user-agent, odk:builder, odk:definition-source, odk:generated-by).

MartijnR commented 4 years ago

I go back and forth !

That doesn't have anything to do with ODK XForms spec level, it's purely about the form definition.

Strictly speaking the pyxform change a short while back was not spec-related indeed. We would be pretending it was by removing the deviation from the spec at the same time (which is now done) and generate a spec version.

I think both ways of looking at the Enketo requirement prompting this discussion are valid. I don't mind viewing it as pyxform specific, but still find it a little easier to reason from an XForm spec perspective.

And this is still a remaining sticking point: If we choose a builder version only, the answer to @tiritea question about what to do with hand-coded (or ODK Build built) XForms that would like to refer to a node-set of repeated nodes inside a repeat in Enketo, would be to add a fake pyxform builder version (or any fake builder version that Enketo looks for).

lognaturel commented 4 years ago

If we choose a builder version only, the answer to @tiritea question about what to do with hand-coded (or ODK Build built) XForms that would like to refer to a node-set of repeated nodes inside a repeat in Enketo, would be to add a fake pyxform builder version (or any fake builder version that Enketo looks for).

I suppose this could be seen as a hack/abuse as well, but I was thinking more that if a odk:generated-by (or whatever) attribute exists, Enketo could assume that the form generator post-dates the rally around true relative references in repeats. So @tiritea could use odk:generated-by='gareth-brain' and that'd be fine.

It's true that we can only do this once but I would find it really surprising if we found something of this magnitude that affected all form definitions made up to the present. If there is such a thing and we haven't discovered it yet, it seems it would have to be fairly niche.

MartijnR commented 4 years ago

That makes me wonder if the builder attribute value should be a username and Enketo can just look for lognaturel,ln,xiphware,tiritea values to cover 99.9% of the cases. ;)

But, yeah, checking for the presence of this attribute and not the value seems quite a hack, though I guess it would work.

tiritea commented 4 years ago

...but I was thinking more that if a odk:generated-by (or whatever) attribute exists, Enketo could assume that the form generator post-dates the rally around true relative references in repeats. So @tiritea could use odk:generated-by='gareth-brain' and that'd be fine.

Syntactically, how does that really differ from assuming anything with odk:xforms-version >= 1.0.0 ?

tiritea commented 4 years ago

That makes me wonder if the builder attribute value should be a username and Enketo can just look for lognaturel,ln,xiphware,tiritea values to cover 99.9% of the cases

BTW I think you should probably add "martijnr" to your rogues gallery... :-)

lognaturel commented 4 years ago

Syntactically, how does that really differ from assuming anything with odk:xforms-version >= 1.0.0 ?

It doesn't differ syntactically but it feels more true to the problem that we're trying to solve here. More broadly, I'm not convinced that the odk:xforms-version will ever have value again and I'm having a fair amount of trouble understanding what it would even mean and when the version would change. On the other hand, knowing the origin of the form would be very useful across the board.

MartijnR commented 4 years ago

I'm now okay with an odk:generated-by="" compromise and can update #394 to produce the value like"pyxform-v0.15.1-29-g5542ae4". Enketo would consider any value for this attribute to imply that the old absolute paths to refer to nodes inside a repeat are not used.

Any evolved opinions on your side @tiritea?

yanokwa commented 4 years ago

Should this generated-by be documented in the spec?

MartijnR commented 4 years ago

Yes, it should (once fully agreed).

MartijnR commented 4 years ago

The discussion continued on Slack and so far we've 'decided' to use odk:xforms-version="1.0.0" on the <model> node instead. https://app.slack.com/client/T34CUEQEL/DDHP5GD7F/thread/C3K3P0WS2-1580754535.001100