getodk / validate

ODK Validate is a Java application for confirming that a form is valid and compliant with the ODK XForms specification. Contribute and make the world a better place! ✨🔍✨
https://docs.getodk.org/validate/
Other
9 stars 26 forks source link

Support non standard itemset value and label refs and external instances #68

Closed ggalmazor closed 5 years ago

ggalmazor commented 5 years ago

Closes #65

This PR goes one step further towards generic support for external instance by creating a synthetic external instance that would have all the refs required by the form, thus passing validation.

What has been done to verify that this works as intended?

Added unit tests that verify this claim.

Why is this the best possible solution? Were any other approaches considered?

This could be considered hacky because it involves using regular expressions to configure the StubReferenceManager.

I considered parsing the form with KXML2 and then searching for those ref names in the <body> element. This would be less brittle than using regexps, but it felt like we would only get marginal benefits in exchange of making the search code much more complicated.

Are there any risks to merging this code? If so, what are they?

Although the regexps have safety whitespace placeholders to account for weird formattings, it won't detect ref names correctly when the value and label elements use more than one line:

<value
   ref="some-ref"/>

Odds that this happens feel low, though.

The regexps could be missing other unidentified common use cases.

lognaturel commented 5 years ago

Ooh, I totally missed this before responding to #66. Looks great to me (tests!). My only question would be whether this slows down the validation process significantly? I guess regex are pretty fast and this is an offline process anyway so it doesn't particularly matter.

My sense is that we'll still want to do something at that JavaRosa level to address various clients' needs as described in https://github.com/opendatakit/javarosa/issues/444.

This generalization doesn't change anything for pyxform users (as opposed to #66) because the itemset structure is always the same there. It does mean that users who write XML by hand could validate those forms. We don't know if any of those users exist.

Certainly the cleanup alone is worthwhile.

ggalmazor commented 5 years ago

Closing in favour of #69

batkinson commented 5 years ago

It does mean that users who write XML by hand could validate those forms. We don't know if any of those users exist.

Sorry for commenting after closing. I just wanted to respond to the quoted comment, not challenge any technical piece here.

I caught this comment from @lognaturel when scanning the log and these users do exist. In fact, when changes to xlsform offline/online disabled pretty-printing it caused some important workflows to break for my clients. They were hand-editing large converted xlsform forms to make form maintenance from year to year easier while retaining certain advanced form capabilities.

The one I recall immediately was to use dynamic itemsets populated from a repeat question. This is useful when building a list of items/individuals during an interview and later wanting to select from that list elsewhere in the form.

I don't know if this is now possible in xlsform with recent changes, but if so, it is was not obvious from the docs last time I checked.

lognaturel commented 5 years ago

disabled pretty-printing it caused some important workflows to break for my clients

Hopefully they were able to add an auto-indent step? The driver for that change was the big difference in file size.

The one I recall immediately was to use dynamic itemsets populated from a repeat question.

Absolutely, I do this myself and hope https://github.com/XLSForm/pyxform/issues/38 can be addressed sooner than later to expose that power via XLSForm (no, it's still not possible).

Do those users tend to use Validate as part of their workflow? I generally go straight to Collect. In fact, I only ever use Validate as part of pyxform so I'm speculating that most folks do as well. It would be very helpful to know if that's not the case and Validate is actually important in that context.

lognaturel commented 5 years ago

And thanks for chiming in. It's really helpful to get some of those assumptions pushed on.

batkinson commented 5 years ago

Hopefully they were able to add an auto-indent step? The driver for that change was the big difference in file size.

They had me to back them up. They just reported the issue to me and I pretty-printed it for them. They were savvy enough to edit the xml file per the instructions in the SOP for the survey, but not enough to figure out why the xml output from xlsform put everything on one line all of a sudden.

Do those users tend to use Validate as part of their workflow?

It depends on the user. The power users so far tend to eliminate all errors and warnings from validate before trying it in collect (based on experience that many of those tend to become necessary to understand later when the form doesn't work as expected in Collect). Some do go straight to Collect, but normally those users are less inclined to wrestle with the tooling at all - they usually just punt and tell me "it's not working".

After the changes in xlsform caught us by surprise (I was busy and wasn't watching the changelogs), I just developed simple tool-centric web services for validation and conversion and use those to eliminate the conversion step entirely. If they need to edit, they can manually convert in which case it pretty-prints by default (a safe assumption, since it's only useful to manually convert in this paradigm when you need to edit it by hand).

lognaturel commented 5 years ago

Thanks for that background, @batkinson, super helpful.

I think that for this specific issue we're still ok to start with a narrow fix. Users leveraging external secondary instances are writing XML docs and so more likely to be savvy. We do intend on having a more general JavaRosa fix sooner than later.