getodk / central

ODK Central is a server that is easy to use, very fast, and stuffed with features that make data collection easier. Contribute and make the world a better place! ✨🗄✨
https://docs.getodk.org/central-intro/
Apache License 2.0

Additional XForm validation #260

Open matthew-white opened 2 years ago

matthew-white commented 2 years ago

Right now, if a user uploads an XLSForm, pyxform validates it, and as part of that validation pyxform passes the resulting XForm to ODK Validate. However, if a user uploads an XForm directly, Central performs only limited validation. The backend uses htmlparser2 to parse the XML, but htmlparser2 is a relatively forgiving parser that tolerates some errors, and the backend doesn't pass the XForm to ODK Validate. There would be benefits to additional validation of XForms:

The most useful option would probably be to pass all XForms through ODK Validate. However, even a basic check that the XML is well formed would be useful.
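To make the "basic check" concrete, here is a minimal sketch of a strict well-formedness check. It's written in Python with the standard-library expat bindings purely for illustration; it isn't Central code, and the function name `check_well_formed` is made up for this example. The point is only that a strict XML parser rejects input that a forgiving parser would accept.

```python
from typing import Optional
import xml.parsers.expat


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed XML, else a short error description.

    Hypothetical helper for illustration only; not part of Central.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument (is_final=True) tells expat that this is the
        # complete document, so an unclosed root element is reported as an error.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {reason}"
    return None


if __name__ == "__main__":
    # A closing tag that doesn't match its opening tag is not well formed.
    print(check_well_formed("<data><name>x</name></data>"))
    print(check_well_formed("<data><name>x</data>"))
```

A check like this only covers well-formedness. It says nothing about whether the XML is a valid XForm, which is what ODK Validate would add.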

Whatever the new validation is, it should ensure continuity for servers that have already uploaded forms with XML that is not well formed. It should be possible for those servers to continue to use any form that has already been uploaded, to upgrade to new versions of Central, and to restore database backups.
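One way to read the continuity requirement is that the new check runs only when a form is uploaded, never when an existing form is read or restored. Below is a rough Python sketch of that idea. `FormStore`, `upload`, and `get` are hypothetical names invented for this example, and the in-memory dict stands in for the database; none of this reflects how Central is actually structured.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Strict well-formedness check using the standard-library parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


class FormStore:
    """Hypothetical in-memory stand-in for stored forms (illustration only)."""

    def __init__(self, existing=None):
        # Forms that are already stored (for example, after an upgrade or a
        # restored backup) are loaded as-is, without re-validation.
        self._forms = dict(existing or {})

    def upload(self, form_id: str, xml_text: str) -> None:
        # Only new uploads go through the stricter check.
        if not is_well_formed(xml_text):
            raise ValueError(f"form {form_id!r}: XML is not well formed")
        self._forms[form_id] = xml_text

    def get(self, form_id: str) -> str:
        # Reads never validate, so previously accepted forms keep working.
        return self._forms[form_id]
```

With this split, a form that was accepted before the check existed can still be fetched, while a new upload with the same XML would be rejected.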

matthew-white commented 2 years ago

We've also had the thought that it'd be helpful to check that submission XML is well formed. Validating submission XML would probably be simpler than validating form XML, because ODK Validate wouldn't be involved. However, there would also probably be fewer benefits.

matthew-white commented 2 years ago

Here are a few examples of XML errors that Backend currently accepts but that Postgres does not consider well formed: