getodk / central

ODK Central is a server that is easy to use, very fast, and stuffed with features that make data collection easier. Contribute and make the world a better place! ✨🗄✨
https://docs.getodk.org/central-intro/
Apache License 2.0

Forms with unicode file names fail upload #196

Closed. yanokwa closed this issue 3 years ago

yanokwa commented 3 years ago

Test file: tést.xlsx.zip

Central reports "Something went wrong: there was no request."

Here's what I know so far...

My gut says this is some interaction between Central and pyxform-http.

matthew-white commented 3 years ago

When I look at the browser console, I see the error message

Failed to execute 'setRequestHeader' on 'XMLHttpRequest': String contains non ISO-8859-1 code point.

Reading about it, I think the issue has to do with the X-XlsForm-FormId-Fallback header containing a non-ASCII character.
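To illustrate the check behind that error, here is a minimal Python sketch of the same constraint: a header value is rejected when any code point falls outside ISO-8859-1 (U+0000 to U+00FF). The helper name `is_latin1` is hypothetical, and this only approximates what the browser does. Note that a precomposed é (U+00E9) fits in ISO-8859-1, while the decomposed form (e followed by U+0301) does not; which form the test file name used is not stated in this thread.

```python
import unicodedata


def is_latin1(value: str) -> bool:
    """Return True if every code point in value fits in ISO-8859-1."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Two spellings of the same visible name (hypothetical inputs).
composed = unicodedata.normalize("NFC", "t\u00e9st")    # é as one code point, U+00E9
decomposed = unicodedata.normalize("NFD", "t\u00e9st")  # e + combining acute, U+0301

print(is_latin1(composed))    # precomposed form is inside ISO-8859-1
print(is_latin1(decomposed))  # U+0301 is outside ISO-8859-1
```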

Is a form ID allowed to contain a non-ASCII character? If so, then I think we should encode the filename before specifying it in the header. Probably percent-encoding would be easiest? Frontend would need to encode the header, and pyxform-http would need to decode it, but I'm not sure that Backend would need to change.
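The round trip proposed here can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the code used by Frontend or pyxform-http: the function names `encode_header_value` and `decode_header_value` are hypothetical, and UTF-8 is assumed as the byte encoding on both sides. In a browser, `encodeURIComponent` produces a comparable UTF-8 percent-encoding.

```python
from urllib.parse import quote, unquote

# Header name taken from the discussion above.
HEADER = "X-XlsForm-FormId-Fallback"


def encode_header_value(value: str) -> str:
    """Percent-encode value as UTF-8 so the result is plain ASCII (sender side)."""
    # safe="" also encodes "/" so the output has no unencoded reserved characters.
    return quote(value, safe="")


def decode_header_value(value: str) -> str:
    """Reverse the percent-encoding, assuming UTF-8 (receiver side)."""
    return unquote(value)


encoded = encode_header_value("t\u00e9st")
headers = {HEADER: encoded}  # safe to send: ASCII only
print(headers)
print(decode_header_value(headers[HEADER]))
```

Because the encoded value is plain ASCII, anything that only passes the header through unchanged would not need to know about the encoding, which is consistent with the expectation above that Backend might not need to change.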

We generally encode form IDs throughout Central, and Enketo links in Central use tokens separate from the form ID, so hopefully this would be a one-off change.

lognaturel commented 3 years ago

"Is a form ID allowed to contain a non-ASCII character?"

Yes, it is.

Your analysis seems right to me and percent-encoding between frontend and pyxform-http seems like the right approach.

matthew-white commented 3 years ago

Sounds good! @yanokwa, are you able to add percent-decoding to pyxform-http? If so, I'll plan to add percent-encoding to Frontend.

matthew-white commented 3 years ago

As an update, as part of v1.2, we will have a new header, X-Action-Notes, that will need to be percent-encoded. Given that, I do think it makes sense for Frontend to percent-encode X-XlsForm-FormId-Fallback, as long as it's not hard for pyxform-http to decode it.

lognaturel commented 2 years ago

Related: https://github.com/getodk/collect/issues/4554