Closed matthew-white closed 3 years ago
the issue with `INSERT ... ON CONFLICT` is that you have to ship all the bits whether it ends up creating the data or not. and the way OpenRosa works i feel like duplicate data is maybe likely. adding `FOR UPDATE` didn't do anything though :/
We’re looking at a likely fix for the next Collect release. Since Collect always renames media files we’ll check file hashes and use the actual same file with the same name in multiple places if the user selects the same contents.
However, this limitation is definitely not part of the spec and Enketo or other clients are likely to send the same file with different file names. Maybe the file name can be considered in the constraint?
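For illustration, the client-side idea described above (hash each file and reuse one name when contents match) might look roughly like the following. This is a hypothetical sketch in JavaScript, not Collect's code; `canonicalNames`, the `{ name, content }` attachment shape, and the choice of SHA-256 are all assumptions.

```javascript
const crypto = require('crypto');

// Hex digest of a Buffer's contents. SHA-256 is an arbitrary choice here.
const sha256 = (buffer) => crypto.createHash('sha256').update(buffer).digest('hex');

// Hypothetical helper: maps every attachment name to the name of the first
// attachment that has identical content, so duplicates share one file name.
const canonicalNames = (attachments) => {
  const firstByHash = new Map(); // content hash -> first name seen
  const result = {};
  for (const { name, content } of attachments) {
    const hash = sha256(content);
    if (!firstByHash.has(hash)) firstByHash.set(hash, name);
    result[name] = firstByHash.get(hash);
  }
  return result;
};
```

As the comment above notes, this only helps clients that choose to deduplicate; a client that sends identical content under different names would still need the server to cope.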
duplicate data is maybe likely.
I don’t really understand why the constraint is there in the first place, really. Would be good to talk through what this means.
no it’s just a bug
On Nov 16, 2020, at 20:39, Hélène Martin notifications@github.com wrote:
We’re looking at a likely fix for the next Collect release. Since Collect always renames media files we’ll check file hashes and use the actual same file in multiple places for that case.
However, this limitation is definitely not part of the spec and Enketo or other clients are likely to send the same file with different file names. Maybe the file name can be considered in the constraint?
This does not appear to be fixed in v1.1.1. I have reproduced with https://test.central.getodk.org/#/projects/269/forms/multiple_background/draft/testing and form instance Multiple background recordings_2021-02-08_12-57-27.zip
I think what I'm getting is a slightly different issue which @matthew-white mentioned in his original description. I'll open a separate issue instead.
To reproduce:
I am able to modify an existing test to generate a failing case:
Note that the issue seems to stem from files with identical content, not files with identical filenames (for which I think there is already a test).
I have a theory that the issue is related to `blobs.ensure()`: https://github.com/getodk/central-backend/blob/e9ffd2c0c3aa1a9475852e1397b8259e2b03165a/lib/model/query/blobs.js#L13-L18

`blobs.ensure()` checks whether a blob exists, then inserts it if it does not (`SELECT` followed by `INSERT`). However, when a submission is created, its attachments are inserted into the database in parallel. I think that results in a race condition whereby the checks may happen for two attachments with identical content before either blob is inserted (instead of `SELECT, INSERT, SELECT, INSERT`, the order is `SELECT, SELECT, INSERT, INSERT`). When I add logging to `blobs.ensure()`, I think I see that behavior when I run the test above.

Just an idea, maybe it'd be possible to solve this using an `INSERT ... ON CONFLICT` clause?

I think this issue is part of what was happening in this forum topic:
https://forum.getodk.org/t/in-odk-central-0-8-some-submissions-the-attachmentspresent-is-different-to-attachmentsexpected-how-avoid-this/26359
It also came up in this forum topic:
https://forum.getodk.org/t/testing-form-on-central-upload-error-a-resource-already-exists/30947
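
To make the suspected ordering concrete, here is a small self-contained simulation of the check-then-insert race described above, next to an atomic alternative. It is only a sketch: the in-memory `makeDb`, `ensureRacy`, and `ensureAtomic` are hypothetical stand-ins and not the actual `blobs.ensure()` code, and `upsert` merely imitates the effect of `INSERT ... ON CONFLICT`.

```javascript
// Yield to the event loop, roughly like waiting on a database round trip.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical in-memory "blobs table" with a unique constraint on the hash.
const makeDb = () => {
  const rows = new Map(); // sha -> id
  const log = [];
  let nextId = 1;
  return {
    log,
    select: async (sha) => {
      await tick();
      log.push('SELECT');
      return rows.get(sha) ?? null;
    },
    insert: async (sha) => {
      await tick();
      log.push('INSERT');
      if (rows.has(sha)) throw new Error('unique constraint violated');
      rows.set(sha, nextId);
      return nextId++;
    },
    // Single atomic step: insert if missing, then return the id either way.
    upsert: async (sha) => {
      await tick();
      log.push('UPSERT');
      if (!rows.has(sha)) rows.set(sha, nextId++);
      return rows.get(sha);
    },
  };
};

// Check-then-insert: two separate steps, so parallel callers can interleave.
const ensureRacy = async (db, sha) => (await db.select(sha)) ?? db.insert(sha);

// Atomic variant: no window between the check and the write.
const ensureAtomic = (db, sha) => db.upsert(sha);

// Run two "attachments" with identical content in parallel.
const run = async (ensure) => {
  const db = makeDb();
  const results = await Promise.allSettled([ensure(db, 'abc'), ensure(db, 'abc')]);
  return { log: db.log, statuses: results.map((r) => r.status) };
};

run(ensureRacy).then((r) => console.log('racy:', r.log.join(', '), r.statuses));
run(ensureAtomic).then((r) => console.log('atomic:', r.log.join(', '), r.statuses));
```

In the racy run both checks complete before either insert, so the second insert hits the unique constraint. One caveat if this were done in PostgreSQL: `INSERT ... ON CONFLICT DO NOTHING` with `RETURNING` yields no row when the conflict path is taken, so a follow-up `SELECT` would likely still be needed to fetch the existing blob's id.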