getodk / build

ODK Build is a drag-and-drop form designer for ODK XForms. Thousands of users around the world depend on it for their data collection campaigns. Contribute and make the world a better place! ✨📝✨
https://build.getodk.org/
Other
110 stars 82 forks source link

Filename of XLSForm export should be XML compliant #44

Closed yanokwa closed 2 years ago

yanokwa commented 7 years ago

When pyxform converts XLS to XML, it uses the XLS file name as the root node name. And because XML has special character requirements (e.g., names must begin with a letter, colon, or underscore, subsequent characters can include numbers, dashes, and periods), we have to be careful about the name of the export.

One way to make this name compliant is to use the form ID in the export, but drop "build_" and the timestamp to make it more human-friendly. So for example, if your form id is build_Favorite-Color_1480937828, the exported file name should be Favorite-Color. edit by @issa-tseng; this section is confusing as it refers to things Build does not actually do

With this strategy, it might still be possible to have an invalid root node name, so the export should also handle names that start with invalid characters.

issa-tseng commented 7 years ago

I think exactly those restrictions are why I generate the name as described:

I prefix with build_ so that the leading character cannot be a number. I postfix with the timestamp because aggregate chokes if multiple forms have the same ID and people were having problems with that.

If you'd prefer risking the user's own input I'm happy to go back to that behaviour.

yanokwa commented 7 years ago

I think my reference to the form ID confused things!

If you have a form name 1 Form and you export it, you get 1-Form-export. This XLSForm, when converted with pyxform will fail because 1-Form-export is used as the name of an XML node and nodes cannot start with a number.

Is it possible to add validation to form names?

issa-tseng commented 7 years ago

I'm just going to prefix a _ if I see a leading number. Cool @yanokwa ?

yanokwa commented 7 years ago

Hmm. Why not use validation on the names?

issa-tseng commented 7 years ago
  1. Completely different part of the code.
  2. A 100% valid Build=>XML=>Aggregate=>Collect filename would be rejected simply because Build=>XLSForm could theoretically fail.
  3. Already-saved forms will load up and immediately flag red.
  4. It's a really confusing limitation to explain; see 2.
yanokwa commented 7 years ago

@lognaturel had an alternate idea. Given that the form is valid, maybe we should file this as an issue against pyxform and it should handle numbers more gracefully.

issa-tseng commented 7 years ago

I'm okay with either! This is a pretty simple change on this side, though. Just catch XLSForm files on their way out the door and subst the attachment meta name.

lognaturel commented 7 years ago

I'm happy with leading numbers being prepended by _.

yanokwa commented 7 years ago

I've filed this upstream: https://github.com/XLSForm/pyxform/issues/130. I propose we hold off on a fix until the pyxform team has a chance to respond.

issa-tseng commented 7 years ago

This is still a trivial one-line change on our side. I'm happy to do it or to do nothing; please advise! :)

yanokwa commented 7 years ago

I say do nothing for now. I'd rather fix this upstream.

issa-tseng commented 7 years ago

Okay, removing from the milestone.

florianm commented 2 years ago

Just to document current behaviour, a form "Test040" will be exported by build2xlsform v1.6 as "Test040-export.

Leading with the (sanitised) form title seems OK now.

I could even live (better) without the "-export" postfix because 1. the file extension clarifies the form standard and 2. including the exported form in R packages throws an error and I have to manually remove the -export from each file. The latter is an edge case that probably not many users suffer from.

Update: PR #266 changes this to "Test040.xslx". Update 2: PR #271 could include #267 which might resolve this issue here.

@yanokwa is this issue addressed?

florianm commented 2 years ago

To document current behaviour, a form named "290test4" exports to "290test4.xslx" (no errors) with an internal form title and form id of "290test4". Uploading that to ODK Central (1.3) throws no errors. Converting the XLSForm back to XLS also seems to work, no errors are raised:

<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <h:head>
        <h:title>290test4</h:title>
        <model odk:xforms-version="1.0.0">
  [...]
            </itext>
            <instance>
                <data id="290test4" version="1643329150">
...

Does that mean that form titles beginning with numbers are now handled well on the XLSForm side, and this issue here can be closed?

lognaturel commented 2 years ago

Ah yes indeed! The file name used to be used as the node name for the child of the instance. Now it's always data.