getodk / collect

ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨
https://docs.getodk.org/collect-intro
Other
718 stars 1.38k forks source link

Two different forms with the same ID and version are on device #4888

Closed yanokwa closed 2 years ago

yanokwa commented 2 years ago

Software and hardware versions

Collect v2021.2.4, Central v1.3.x

Problem description

An error has occurred You have downloaded two different forms with the same ID and version. You may have loaded them at different times or from different sources. You must delete one.

Steps to reproduce the problem

Unclear

Expected behavior

It should be impossible to have different forms with the same ID.

Other information

Apparently, already deleting and and re-downloaded Collect didn't fix it. Here's what fixed it... Main screen > Project icon > Settings > Project management > Reset > Blank forms.

lognaturel commented 2 years ago

This can happen if using the same Collect project for multiple drafts or a draft and published version. Was the person who experienced this involved with any drafts?

Are you absolutely sure they were using the latest Collect version the whole time? That they hadn’t connected to another server or pushed a form manually?

yanokwa commented 2 years ago

They were not using any server side drafts. They deleted and re-downloaded the app about a week ago, but had only downloaded it for the first time at most two weeks before that. They did not push a form manually.

yanokwa commented 2 years ago

This has happened again with a customer who is on v2021.2.4 (maybe recently upgraded?) and I'm pretty sure hasn't had drafts for months. I know that isn't the best repro, but I wanted to report it anyway.

lognaturel commented 2 years ago

@seadowg @grzesiek2010 I want to make sure you've seen that impossible things are happening. This is starting to feel more serious but I have no idea where to even start troubleshooting. Maybe it's a good opportunity to think about what I've previously mentioned and use the form hash to link instances to form defs? That does feel like a pretty big change, though.

seadowg commented 2 years ago

They did not push a form manually.

I'd missed this update. My assumption would have been that that's how they'd got into this situation. Could we get details on how their form management is set up? Are they using Match Exactly/Previously Downloaded etc?

yanokwa commented 2 years ago

Everyone is on the latest version of Central with managed QR codes.

seadowg commented 2 years ago

Great that at least eliminates a few other async things. I've pretty much forgotten why this should be impossible, so it'd be good to go through again with a fresh mind. I'll see if I can come up with any theories and if not I'll pass it over to @grzesiek2010 (unless he immediately jumps in with exactly why it's happening).

grzesiek2010 commented 2 years ago

@seadowg do you want me to take a look at it?

seadowg commented 2 years ago

@grzesiek2010 that'd be great I haven't had a chance yet.

grzesiek2010 commented 2 years ago

I was trying to reproduce the issue but to no avail. I also investigated the code and I can't see any bugs/mistakes...

Is it possible that it's a bug on Central? I'm asking because: According to the reports from firebase https://console.firebase.google.com/u/2/project/api-project-322300403941/analytics/app/android:org.odk.collect.android/events/~2Foverview%3Ft%3D1636450359111&fpn%3D322300403941&swu%3D1&sgu%3D1&sus%3Dupgraded&params%3D_r..layout.pageNumber%253D0%2526_u..pageSize%253D25&cs%3Dapp.m.events.overview&g%3D1 image

the DownloadSameFormidVersionDifferentHash event started occurring between June 6th-12th. Before that date we have no registered events despite the fact the event was added in v1.30.0-beta.0 (released on February 11th) That period (6th-12th) Central was released: v1.2.1 released on June 9th (the next version of Collect was released on June 18th so at least a week later) What a coincidence... If I'm wrong please correct me. If not I need to figure out what kind of bug that would need to be to mess up with Collect like that.

grzesiek2010 commented 2 years ago

In v2021.3.2 we fixed https://github.com/getodk/collect/pull/4906 handling null or empty form hashes. Thanks to firebase analytics now we know that it was a real case: image

I think these two issues are somehow related because after fixing the one I mentioned above the number of DownloadSameFormidVersionDifferentHash events seems to be much lower (v2021.3.1 vs v2021.3.2): image where now v2021.3.2 is more popular so the stats seem to be accurate: image

This issue still exist so there must be something else causing it but it looks like most of the cases were caused by wrong data received from a server (or maybe all the cases, we still don't know what the cause of the reported events we now see is).

I think we could have made that new event (NullOrEmptyFormHash) provide more info to know if that wrong data come from Central or maybe other servers but I'm pretty sure it is Central (too) taking into account what I described in my previous comment.

lognaturel commented 2 years ago

@yanokwa @matthew-white @ktuite something to take a quick look at. Maybe start by verifying that there are tests to confirm that the form lists always contain form hashes and versions?

matthew-white commented 2 years ago

I took a quick look at the Central backend code, but nothing stands out to me. The form list doesn't have much logic around the form hash and version:

https://github.com/getodk/central-backend/blob/57c6141bf3a04d5ff474afe9fc51d0c5d23b91e0/lib/formats/openrosa.js#L41-L55

version can be an empty string, but I don't see an obvious way for hash to be null or an empty string. I'm not sure that this is helpful, but note that Central always specifies <version> even when def.version is an empty string, and even if def.hash were an empty string, the form list would specify <hash>md5:</hash>.

The Central database enforces that form_defs.hash is NOT NULL. I took a look at the QA server, and form_defs.hash is consistently 32 characters.

Here's one test of the form list:

https://github.com/getodk/central-backend/blob/57c6141bf3a04d5ff474afe9fc51d0c5d23b91e0/test/integration/api/forms.js#L124-L151

That period (6th-12th) Central was released: v1.2.1 released on June 9th (the next version of Collect was released on June 18th so at least a week later)

v1.2.1 mostly involved changes to the frontend. There was one small backend change (getodk/central-backend#379), but nothing that I think would affect this. On the other hand, v1.2.0 involved a major refactor of the backend. It was released on May 18 though, so I'm not sure the dates quite line up.

@lognaturel, let me know if you have ideas about other things to look at in the backend!

ktuite commented 2 years ago

I'm reaching the same (non)conclusion that the version and hash should be there, and it should be impossible for two different forms to share id and version by design (according to constraints in the backend central database).

A possible thought on timing: I wonder if servers that cause this Collect error were blocked by that migration and couldn't upgrade in May until the June patch (which involved reading form xml defs and handling form ids ending in .xml/.xlsx differently)? And that change in the 1.2.1 patch is a clue? Even if it is a clue... i can't figure out what it points to.

What about the fact that xml form id and version can be arbitrary numbers of white spaces? Any way that %20 and %20%20 could be causing a collision?

yanokwa commented 2 years ago

@ktuite whitespaces is an inspired guess! I checked the one server that I know might have this issue and all the form ids are sane (followup_visit) and the versions are all integers, except for the initial versions which are blank.

grzesiek2010 commented 2 years ago

I think we should improve logging that event when form hash is null or empty (and maybe check some other properties like id or version too) and add info about a server (whether it's Central or not). My suspicion was that it's Central but I can't be sure without better analytics.

grzesiek2010 commented 2 years ago

I added a pull request with improved logging https://github.com/getodk/collect/pull/4921

grzesiek2010 commented 2 years ago

I think we can close this issue as https://github.com/getodk/collect/pull/5063 is merged