Closed yanokwa closed 2 years ago
This can happen if using the same Collect project for multiple drafts or a draft and published version. Was the person who experienced this involved with any drafts?
Are you absolutely sure they were using the latest Collect version the whole time? That they hadn’t connected to another server or pushed a form manually?
They were not using any server side drafts. They deleted and re-downloaded the app about a week ago, but had only downloaded it for the first time at most two weeks before that. They did not push a form manually.
This has happened again with a customer who is on v2021.2.4 (maybe recently upgraded?) and I'm pretty sure hasn't had drafts for months. I know that isn't the best repro, but I wanted to report it anyway.
@seadowg @grzesiek2010 I want to make sure you've seen that impossible things are happening. This is starting to feel more serious but I have no idea where to even start troubleshooting. Maybe it's a good opportunity to think about what I've previously mentioned and use the form hash to link instances to form defs? That does feel like a pretty big change, though.
They did not push a form manually.
I'd missed this update. My assumption would have been that that's how they'd got into this situation. Could we get details on how their form management is set up? Are they using Match Exactly/Previously Downloaded etc?
Everyone is on the latest version of Central with managed QR codes.
Great that at least eliminates a few other async things. I've pretty much forgotten why this should be impossible, so it'd be good to go through again with a fresh mind. I'll see if I can come up with any theories and if not I'll pass it over to @grzesiek2010 (unless he immediately jumps in with exactly why it's happening).
@seadowg do you want me to take a look at it?
@grzesiek2010 that'd be great I haven't had a chance yet.
I was trying to reproduce the issue but to no avail. I also investigated the code and I can't see any bugs/mistakes...
Is it possible that it's a bug on Central? I'm asking because: According to the reports from firebase https://console.firebase.google.com/u/2/project/api-project-322300403941/analytics/app/android:org.odk.collect.android/events/~2Foverview%3Ft%3D1636450359111&fpn%3D322300403941&swu%3D1&sgu%3D1&sus%3Dupgraded¶ms%3D_r..layout.pageNumber%253D0%2526_u..pageSize%253D25&cs%3Dapp.m.events.overview&g%3D1
the DownloadSameFormidVersionDifferentHash
event started occurring between June 6th-12th. Before that date we have no registered events despite the fact the event was added in v1.30.0-beta.0 (released on February 11th)
That period (6th-12th) Central was released: v1.2.1 released on June 9th (the next version of Collect was released on June 18th so at least a week later)
What a coincidence...
If I'm wrong please correct me. If not I need to figure out what kind of bug that would need to be to mess up with Collect like that.
In v2021.3.2 we fixed https://github.com/getodk/collect/pull/4906 handling null or empty form hashes. Thanks to firebase analytics now we know that it was a real case:
I think these two issues are somehow related because after fixing the one I mentioned above the number of DownloadSameFormidVersionDifferentHash
events seems to be much lower (v2021.3.1 vs v2021.3.2):
where now v2021.3.2 is more popular so the stats seem to be accurate:
This issue still exist so there must be something else causing it but it looks like most of the cases were caused by wrong data received from a server (or maybe all the cases, we still don't know what the cause of the reported events we now see is).
I think we could have made that new event (NullOrEmptyFormHash
) provide more info to know if that wrong data come from Central or maybe other servers but I'm pretty sure it is Central (too) taking into account what I described in my previous comment.
@yanokwa @matthew-white @ktuite something to take a quick look at. Maybe start by verifying that there are tests to confirm that the form lists always contain form hashes and versions?
I took a quick look at the Central backend code, but nothing stands out to me. The form list doesn't have much logic around the form hash and version:
version
can be an empty string, but I don't see an obvious way for hash
to be null
or an empty string. I'm not sure that this is helpful, but note that Central always specifies <version>
even when def.version
is an empty string, and even if def.hash
were an empty string, the form list would specify <hash>md5:</hash>
.
The Central database enforces that form_defs.hash
is NOT NULL
. I took a look at the QA server, and form_defs.hash
is consistently 32 characters.
Here's one test of the form list:
That period (6th-12th) Central was released: v1.2.1 released on June 9th (the next version of Collect was released on June 18th so at least a week later)
v1.2.1 mostly involved changes to the frontend. There was one small backend change (getodk/central-backend#379), but nothing that I think would affect this. On the other hand, v1.2.0 involved a major refactor of the backend. It was released on May 18 though, so I'm not sure the dates quite line up.
@lognaturel, let me know if you have ideas about other things to look at in the backend!
I'm reaching the same (non)conclusion that the version and hash should be there, and it should be impossible for two different forms to share id and version by design (according to constraints in the backend central database).
A possible thought on timing: I wonder if servers that cause this Collect error were blocked by that migration and couldn't upgrade in May until the June patch (which involved reading form xml defs and handling form ids ending in .xml/.xlsx differently)? And that change in the 1.2.1 patch is a clue? Even if it is a clue... i can't figure out what it points to.
What about the fact that xml form id and version can be arbitrary numbers of white spaces? Any way that %20
and %20%20
could be causing a collision?
@ktuite whitespaces is an inspired guess! I checked the one server that I know might have this issue and all the form ids are sane (followup_visit
) and the versions are all integers, except for the initial versions which are blank.
I think we should improve logging that event when form hash is null or empty (and maybe check some other properties like id or version too) and add info about a server (whether it's Central or not). My suspicion was that it's Central but I can't be sure without better analytics.
I added a pull request with improved logging https://github.com/getodk/collect/pull/4921
I think we can close this issue as https://github.com/getodk/collect/pull/5063 is merged
Software and hardware versions
Collect v2021.2.4, Central v1.3.x
Problem description
Steps to reproduce the problem
Unclear
Expected behavior
It should be impossible to have different forms with the same ID.
Other information
Apparently, already deleting and and re-downloaded Collect didn't fix it. Here's what fixed it... Main screen > Project icon > Settings > Project management > Reset > Blank forms.