Open matthew-white opened 2 years ago
We've also had the thought that it'd be helpful to check that submission XML is well formed. Validating submission XML would probably be simpler than validating form XML, because ODK Validate wouldn't be involved. However, there would also probably be fewer benefits.
Here are a few examples of XML errors that Backend currently accepts but that Postgres does not consider well formed:
- A `<select1>` tag is closed by `</select>` instead of `</select1>`.
- `version=“53”` (curly quotes rather than straight quotes). Postgres outputs: `AttValue: " or ' expected`
- `readonly="true()"` is specified twice for the same element.
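To make the three cases above concrete, here is a minimal sketch of a strict well-formedness check. It uses Python's standard-library `xml.etree.ElementTree` (backed by expat) purely as a stand-in for a strict parser; it is not what Backend (`htmlparser2`) or Postgres uses, and its error messages will differ from the Postgres output quoted above. The helper name `well_formedness_error` and the XML fragments are hypothetical, written only to mirror the errors in the list.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical fragments modeled on the three errors listed above.
MISMATCHED_TAG = '<body><select1 ref="/data/q"></select></body>'
CURLY_QUOTES = '<data id="example" version=\u201c53\u201d/>'
DUPLICATE_ATTRIBUTE = '<bind nodeset="/data/q" readonly="true()" readonly="true()"/>'
WELL_FORMED = '<data id="example" version="53"><q/></data>'

if __name__ == "__main__":
    samples = {
        "mismatched tag": MISMATCHED_TAG,
        "curly quotes": CURLY_QUOTES,
        "duplicate attribute": DUPLICATE_ATTRIBUTE,
        "well formed": WELL_FORMED,
    }
    for label, xml_text in samples.items():
        # Prints the parser's message for each rejected sample, or "ok".
        print(label, "->", well_formedness_error(xml_text) or "ok")
```

A strict parser rejects all three malformed fragments and accepts the last one, which is the behavior a basic well-formedness check would need.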
Right now, if a user uploads an XLSForm, pyxform will validate it. One thing that pyxform will do is pass the resulting XForm to ODK Validate. However, if a user uploads an XForm, Central will perform only limited validation. The backend will use `htmlparser2` to parse the XML, but `htmlparser2` is a relatively forgiving parser that will tolerate some errors. The backend won't pass the XForm to ODK Validate.

There would be benefits to additional validation of XForms. One relates to the Postgres `xml` data type: Postgres seems to be strict about XML validation, so that data type is only available for forms whose XML is well formed. If there were a guarantee that all new forms have well formed XML, it might feel safer to use the `xml` data type in more cases.

Probably what would be most useful would be to pass all XForms through ODK Validate. However, even just a basic check that the XML is well formed would be useful.
Whatever the new validation is, it should ensure continuity for servers that have already uploaded forms with XML that is not well formed. It should be possible for those servers to continue to use any form that has already been uploaded, to upgrade to new versions of Central, and to restore database backups.
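One way to read the continuity requirement is that the strict check applies only when a form is newly uploaded, while forms that are already stored are never re-validated. The sketch below illustrates that split under stated assumptions: `StoredForm`, `accept_new_upload`, `load_existing`, and `MalformedXmlError` are hypothetical names, not part of Central's codebase, and Python's standard-library parser stands in for whatever check would actually be used.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


def is_well_formed(xml_text: str) -> bool:
    """Strict check using the standard-library parser as a stand-in."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


@dataclass
class StoredForm:
    # Hypothetical record for a form that is already in the database.
    xml: str


class MalformedXmlError(ValueError):
    """Raised when a newly uploaded form is not well formed."""


def accept_new_upload(xml_text: str) -> StoredForm:
    # Strict path: applies only to uploads made after the check is introduced.
    if not is_well_formed(xml_text):
        raise MalformedXmlError("form XML is not well formed")
    return StoredForm(xml=xml_text)


def load_existing(stored: StoredForm) -> str:
    # Lenient path: previously stored XML is returned as-is and never
    # re-validated, so existing forms, upgrades, and restored backups keep working.
    return stored.xml
```

Keeping validation at the upload boundary means an upgrade or a restored backup never has to pass the new check, at the cost of leaving some stored XML that is not well formed.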