getodk / briefcase

ODK Briefcase is a Java application for fetching and pushing forms and their contents. It helps make billions of data points from ODK portable. Contribute and make the world a better place! ✨💼✨
https://docs.getodk.org/briefcase-intro

Split select multiple column order is not stable when choice list is randomized #862

Closed chrissyhroberts closed 4 years ago

chrissyhroberts commented 4 years ago

Software versions

Briefcase v1.17.0 - v1.17.3

openjdk version "13.0.2" 2020-01-14
OpenJDK Runtime Environment (build 13.0.2+8)
OpenJDK 64-Bit Server VM (build 13.0.2+8, mixed mode, sharing)

OS X 10.15

Aggregate v2.0.3

Problem description

Using the -ssm option together with -smart-append produces unusable output.

Each time an export with -ssm splits a select-multiple question, the column order of the output changes.

I ran three exports:

Export 1

rumours.trust_rumours_source.internet rumours.trust_rumours_source.facebook_priv rumours.trust_rumours_source.other rumours.trust_rumours_source.friends rumours.trust_rumours_source.radio rumours.trust_rumours_source.twitter rumours.trust_rumours_source.family rumours.trust_rumours_source.facebook_pub rumours.trust_rumours_source.coworkers rumours.trust_rumours_source.church rumours.trust_rumours_source.tv_news rumours.trust_rumours_source.teachers rumours.trust_rumours_source.whatsapp rumours.trust_rumours_source.school rumours.trust_rumours_source.papers rumours.trust_rumours_source.congregation

Export 2

rumours-trust_rumours_source.twitter rumours-trust_rumours_source.whatsapp rumours-trust_rumours_source.congregation rumours-trust_rumours_source.internet rumours-trust_rumours_source.family rumours-trust_rumours_source.coworkers rumours-trust_rumours_source.church rumours-trust_rumours_source.radio rumours-trust_rumours_source.friends rumours-trust_rumours_source.papers rumours-trust_rumours_source.tv_news rumours-trust_rumours_source.other rumours-trust_rumours_source.teachers rumours-trust_rumours_source.facebook_priv rumours-trust_rumours_source.school rumours-trust_rumours_source.facebook_pub

Export 3

rumours-trust_rumours_source.facebook_pub rumours-trust_rumours_source.other rumours-trust_rumours_source.congregation rumours-trust_rumours_source.family rumours-trust_rumours_source.friends rumours-trust_rumours_source.papers rumours-trust_rumours_source.twitter rumours-trust_rumours_source.whatsapp rumours-trust_rumours_source.teachers rumours-trust_rumours_source.facebook_priv rumours-trust_rumours_source.church rumours-trust_rumours_source.coworkers rumours-trust_rumours_source.tv_news rumours-trust_rumours_source.radio rumours-trust_rumours_source.school rumours-trust_rumours_source.internet

I ended up with garbage data in the incrementally saved CSV. For now the workaround is to not use -ssm and -smart-append together.

Steps to reproduce the problem

Run an export with -ssm and -smart-append together.

This is possibly triggered by update to form definition with version update. I first saw the weird behaviour after I updated the form.

Expected behavior

The -ssm columns should keep the same order between exports. Instead, the data columns are reshuffled, possibly only after the form is updated.

lognaturel commented 4 years ago

I'm really glad you caught that, @chrissyhroberts. Thanks for the detailed report.

With @ggalmazor no longer focused on ODK we have less capacity on Briefcase but will try to get this resolved ASAP.

dcbriccetti commented 4 years ago

this is possibly triggered by update to form definition with version update.

Hi @chrissyhroberts. I’m trying to reproduce the problem. Is this a necessary step?

lognaturel commented 4 years ago

Thanks for looking into it, @dcbriccetti! I'm guessing not but it's something to verify. I would start with a form with just a single select question with a few choices. I'd make a few submissions, run an export, add a few more submissions, run an export and do that until it repros or up to ~5 times. If it's not reproing then I'd try to update the form definition in some way.

lognaturel commented 4 years ago

@dcbriccetti I'm also in Slack if you want to discuss. Another approach might be to try to write a test for the case without reproducing with a real setup. I have some time this afternoon, so let me know if you want to collaborate.

lognaturel commented 4 years ago

@chrissyhroberts We've tried a number of things to reproduce this or track it down with the information provided but haven't been able to yet. Can you please share the form? You could email it to me if it's sensitive. Alternatively, you could share a form with just that question, any groups or repeats it's in, and its choices.

We've tried a very simple select multiple, one with the choices you've provided, updating the form version, and using a choice filter. We've also audited the code. In our scenarios the choice order has been stable and has followed the order of the choices in the form definition.

I have confirmed that if you change the order of the choices in your form update, you will change the order in the export as well. Could this have happened? Were those three runs back to back with no pull in between?

lognaturel commented 4 years ago

Thanks for sending the form definition, @chrissyhroberts. I haven't reproduced but from looking at the definition, I think it's almost certainly because of choice order randomization. When Briefcase loads the form definition, it loads it the same way that a form filling client loads it and the choice order is randomized. The export order is based on the form definition and that order is no longer stable.

To support a stable export order for randomized choice lists, we'd need to either go back to the original form definition XML or intercept the choice order before JavaRosa randomizes it.
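The difference between the two orders can be illustrated with a minimal Java sketch. This is not Briefcase or JavaRosa code: `columnsFor` and `loadRandomized` are hypothetical helpers, and the shuffle only stands in for whatever the form loader does when a choice list is randomized. It assumes the export builds one column per choice in whatever order it receives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSelectColumns {
    // Hypothetical helper: builds one column name per choice, in the order given.
    static List<String> columnsFor(String field, List<String> choices) {
        List<String> columns = new ArrayList<>();
        for (String choice : choices) {
            columns.add(field + "." + choice);
        }
        return columns;
    }

    // Stand-in for a form-loading step that randomizes choices (assumption, not JavaRosa's API).
    static List<String> loadRandomized(List<String> definitionOrder, Random random) {
        List<String> shuffled = new ArrayList<>(definitionOrder);
        Collections.shuffle(shuffled, random);
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> definitionOrder = List.of("internet", "radio", "twitter", "family");

        // Unstable: headers derived after randomization can differ from run to run.
        System.out.println(columnsFor("trust_rumours_source",
                loadRandomized(definitionOrder, new Random())));

        // Stable: headers derived from the order captured before randomization.
        System.out.println(columnsFor("trust_rumours_source", definitionOrder));
    }
}
```

Both lines print the same set of columns; only the first can change order between runs, which is what matters once rows are appended to an existing file.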

lognaturel commented 4 years ago

I have reproduced using this simple form with randomized select choices. I sent in a few submissions where I selected only 'a' and did an export. Then I made another submission selecting only 'a' and did an export with -ssm and -sa, and the choice was marked in a different column.

java -jar /Users/ln/Downloads/ODK-Briefcase-v1.17.3.jar -U https://sandbox.aggregate.getodk.org -plla -id briefcase-ssm-order -sd /Users/ln/Documents/projects/odk

java -jar /Users/ln/Downloads/ODK-Briefcase-v1.17.3.jar -e -ed /Users/ln/Downloads -f rand-order.csv -id briefcase-ssm-order -sd /Users/ln/Documents/projects/odk -ssm -sa
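Why the appended row marks the wrong column can be sketched in a few lines of Java. This is an illustration only, not how Briefcase writes its CSV: the class, the helper names, and the choices 'b' and 'c' are assumptions (only 'a' comes from the reproduction above). The sketch assumes the header is written once by the first export and later rows are appended by position.

```java
import java.util.ArrayList;
import java.util.List;

public class AppendMisalignment {
    // Renders one submission as 1/0 flags, positioned by the given column order.
    static List<String> row(List<String> columnOrder, String selected) {
        List<String> cells = new ArrayList<>();
        for (String choice : columnOrder) {
            cells.add(choice.equals(selected) ? "1" : "0");
        }
        return cells;
    }

    // Reads a row back using the header that is already in the CSV file.
    static String selectedAccordingTo(List<String> header, List<String> cells) {
        for (int i = 0; i < header.size(); i++) {
            if (cells.get(i).equals("1")) {
                return header.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> firstExportOrder = List.of("a", "b", "c");  // header written by the first export
        List<String> secondExportOrder = List.of("c", "a", "b"); // order seen by a later run

        // The submission selected only 'a', but the row is laid out in the later order.
        List<String> appended = row(secondExportOrder, "a");

        // Read against the existing header, the selection appears under 'b'.
        System.out.println(selectedAccordingTo(firstExportOrder, appended));
    }
}
```

The data in each row is internally consistent; it is the mismatch between the stored header and the order used for the appended row that corrupts the file.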

yanokwa commented 4 years ago

@chrissyhroberts The fix is in the just-released Briefcase v1.17.4.