getodk / collect

ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨
https://docs.getodk.org/collect-intro
Other
718 stars 1.38k forks source link

Error saving recovery point: Attempt to get length of null array #213

Closed yanokwa closed 7 years ago

yanokwa commented 8 years ago

"aWe are using ODK to run a survey on Android tablets. The form has been working well and we've had a number of people complete the survey. Today, a new error message popped up that meant we couldn't save the form. It read "Error saving recovery point: Attempt to get length of null array" (photo attached). Could anyone please help advise the source of this error and how to resolve?"

From foodbanksurvey2016@gmail.com

grzesiek2010 commented 7 years ago

Probably it is caused by unsupported chars in xml. I'm able to reproduce this issue using this barcode: encrypted example (scann the barcode and try to go to the next question) it's also possible to put such wrong chars in StringWidget using copy/paste.

Please see http://www.w3.org/TR/REC-xml/#charsets. Available characters are

x9 | #xA | #xD | x20-#xD7FF | xE000-#xFFFD | x10000-#x10FFFF

The result contains non printable characters like: NULL, SOH, ENQ, STX, LF

I sent an email to foodbanksurvey2016@gmail.com so we have to wait for confimation.

nap2000 commented 7 years ago

Some clients of mine have encountered this issue with pdf417 barcodes. I made a modification to my fork of odkCollect to remove non printable characters from the barcode string. This fixes the issue of the form not being savable.

The above fix has satisfied my client for, for the moment anyway. It doesn't address the issue of how an application might make use of this data.

I also haven't looked at applying this to the copy / paste of data into a string widget but can do.

Is the above fix acceptable? If so I can initiate a pull request.

grzesiek2010 commented 7 years ago

@nap2000 you are right but I think a better solution is changing such non-printable chars, not just removing them. It was ok for your client but we don't know all other cases. Maybe it's an ID and must be unique. Maybe someone wants to handle such string and then change the sting back (to original) on the server side or something else.

So maybe we should find unsupported chars and change each to something other. For example using U+HEX: &#16 will become "U+0010" &#5 will become "U+005" etc. Then we have to inform our users about the situation. Any other ideas?

Anyway let's wait a few days more because I'm still waiting for a feedback from foodbanksurvey2016@gmail.com. I'm not sure if all this is worth the effort @nap2000 maybe your solution is enough.

nap2000 commented 7 years ago

@grzesiek2010 Sounds promising.

The data I came across was fingerprint information from a drivers license. It will be interesting to see if we will be able to extract this data, or similar, uncorrupted from the string that comes back from the barcode reader. I've got no plans to look at this further at the moment. I guess if foodbank get back to you with such a requirement then you will want to address it?

Yes you may want to consider the quick and dirty solution as an interim approach. This is because a) Currently I think if you scan a barcode with non printable characters the form cannot be saved and all the data will be lost b) Organisations may just want to look at the text parts of the barcode anyway

grzesiek2010 commented 7 years ago

@nap2000 I think I can fix this but time will tell maybe I will work on something else (more important). Now we have to wait for worth opinions from others.

mitchellsundt commented 7 years ago

Seems like the designer of the form should specify a field attribute to indicate whether the scanned code should be stored in a Base64 encoding and if not, then you could strip out the non-printable characters so that the form content doesn't get trashed - perhaps replacing the removed characters with '.'?

I doubt the ZXing barcode reader returns additional arguments in its result bundle that would robustly indicate the encoding of the returned value (e.g., whether it is binary or not). And if it did, we would be increasingly restricted to just that barcode scanner.

This places the burden on the data collector to know whether all scanned content will be textual or whether there can be binary content.

On Fri, Jan 20, 2017 at 6:04 AM, Grzegorz Orczykowski < notifications@github.com> wrote:

@nap2000 https://github.com/nap2000 I think I can fix this but time will tell maybe I will work on something else (more important). Now we have to wait for worth opinions from others.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opendatakit/collect/issues/213#issuecomment-274079324, or mute the thread https://github.com/notifications/unsubscribe-auth/ACLO01__2gkOYZN7qpq5WbvAfpRAKywdks5rUL8AgaJpZM4Ke2r8 .

-- Mitch Sundt Software Engineer University of Washington mitchellsundt@gmail.com

nap2000 commented 7 years ago

I think this is an excellent solution.

So support an appearance of "binary" for barcodes?

There are barcodes that have a combination of both text and binary components. Potentially there could be a future need to extract text from part of such a barcode for use in a relevance or just to return a calculated value to the server.

nap2000 commented 7 years ago

Unless anyone wants to suggest a different approach or make the change themselves I'll submit a pull request for a solution along the lines described by Mitch sometime before the weekend.

lognaturel commented 7 years ago

That sounds good to me, @nap2000. Though I'm a little bit confused -- I thought adding an appearance would require a change to the XForms spec, JavaRosa and the XLSForm spec but it looks like they are passed all the way through to the client. Is this because they are purely considered optional client hints? It seems there should still be some coordination across the ecosystem when adding things at the form level but I'm not sure where to document appearances. I think the only place they currently show up is the XLSForm spec. @MartijnR, I'm always interested in your ecosystem guru thoughts. 😊

MartijnR commented 7 years ago

@lognaturel Yes, appearances can be very easily abused ;), and this has created an enormous mess in terms of interoperability (Enketo is definitely one of the guilty parties). Coordinating this would be good.

I wonder if this would be best done with a new custom datatype instead of an appearance (or instead of a custom attribute). The "barcode" datatype does some replacement and the new one would base64-encode the result. Alternatively, maybe there is a link with this discussion.

nap2000 commented 7 years ago

Hmm yes appearance abuse, after search() and pulldata() there is probably not a lot more you can do on that front.

I think I would prefer to use appearance for this case and probably also cases where we are setting the size of a photo. Ie we should be able to have a question "type" and "parameters" for that type. Appearance seems to me to be useful as a misnamed parameter setting rather than specifically about the look of the widget.

Justifications could be that appearance changes will have less impact on editors than a new type. There is also I think a risk of creating N factorial new types when N appearance settings would have done. Would new types help with interoperability or is it just that as the feature set grows not all implementations keep up?

Bear in mind the issue that caused this discussion, could be fixed merely be automatically removing non printing characters from the barcode. No one has come forward to identify a use case for processing the digital part of the barcode. So it may make sense to split the problem, fix the known issue and leave decisions on actually handling binary to another day.

On Thu, Jan 26, 2017 at 9:49 PM, Martijn van de Rijdt < notifications@github.com> wrote:

@lognaturel https://github.com/lognaturel Yes, appearances can be very easily abused ;), and this has created an enormous mess in terms of interoperability (Enketo is definitely one of the guilty parties). Coordinating this would be good.

I wonder if this would be best done with a new custom data type instead of an appearance or custom attribute. The "barcode" datatype does some replacement and the new one base64-encodes the result. Alternatively, maybe there is a link with this discussion https://github.com/opendatakit/xforms-spec/issues/79.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opendatakit/collect/issues/213#issuecomment-275475602, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2y7Qb0eFJvxNRYjHiECswmnyXj9iQIks5rWOqdgaJpZM4Ke2r8 .

--

Smap Consulting http://smap.com.au/| Mobile Data Collection Solutions Application Developer - neilpenman@gmail.com minqiang.huang@gmail.com Twitter: @dgmsot Skype: ianaf4you Phone: +61 402 975 959 Website: http://smap.com.au http://smap.com.au/blog

MartijnR commented 7 years ago

Sounds indeed that there is some scope creep perhaps. :) Why use an appearance at all as part of the fix? Can't we just make this the behaviour of type="barcode" (always)?

Best to move spec discussions to the spec repo but I strongly disagree with abusing appearance for non-UI purposes and with it being 'misnamed'. The attribute is on form-control elements for a reason. Wrt to impact of doing things properly syntax-wise: this is why I think it's important. If your only concern is ODK Collect there is no reason to worry about these things. [edited]

lognaturel commented 7 years ago

Thank you for that context @MartijnR and @nap2000. It seems like it would be really nice if, moving forward, appearance were only used for things that provide display hints for the client.

@nap2000 good point about keeping the focus on the issue at hand. Maybe we can start by handling the case where barcode and text fields as they currently exist end up with binary data. It's important that this doesn't block the form from being saved and this is true whether or not there is eventually a "binary barcode" type or equivalent. Either stripping the offending characters or replacing them with '.' seems fine to me. I agree it's not worth spending too much time on until we know it's needed. As @grzesiek2010 suggested, we should let the user know that this has happened so that they can make a feature request if they need the binary data. I added opendatakit/xforms-spec#80 so we can keep track of the idea but it shouldn't be a priority.

nap2000 commented 7 years ago

Cool. I have submitted a pull request addressing, hopefully, just issue 213.