Open ktuite opened 1 month ago
Here's what I've learned so far:
We have these query module functions:
- Forms.clearUnneededDrafts (by form id or project id)
- Submissions.clearDraftSubmissions(formId)
- Submissions.clearDraftSubmissionsForProject(projectId)
These are used in these scenarios:
- Setting managed encryption on a project
- Abandon/delete a form draft
- Publish a form
- Update a form draft
Database things:
- forms deletion cascades to form_defs, submissions and other things
- submissions deletion cascades to submission_defs
- form_defs deletion also cascades to submission_defs
- If you deleted the form_def, you could get rid of the descendent submission_defs but still have the top level submission
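To make the last point concrete, here is a minimal in-memory sketch of the cascade rules listed above. It is not Central's schema or query code: the object shapes, ids, and helper names (makeCascadeDb, deleteFormDef, submissionsWithoutDefs) are hypothetical and only mirror the relationships described in this comment.

```javascript
// Hypothetical in-memory stand-in for the four tables discussed above.
const makeCascadeDb = () => ({
  forms: [{ id: 1 }],
  formDefs: [{ id: 10, formId: 1 }],
  submissions: [{ id: 100, formId: 1, draft: true }],
  submissionDefs: [{ id: 1000, submissionId: 100, formDefId: 10 }],
});

// Deleting a form def cascades to submission defs, but not to submissions.
const deleteFormDef = (db, formDefId) => {
  db.formDefs = db.formDefs.filter((fd) => fd.id !== formDefId);
  db.submissionDefs = db.submissionDefs.filter((sd) => sd.formDefId !== formDefId);
};

// Top-level submissions that no longer have any submission def.
const submissionsWithoutDefs = (db) =>
  db.submissions.filter(
    (s) => !db.submissionDefs.some((sd) => sd.submissionId === s.id)
  );

const cascadeDb = makeCascadeDb();
deleteFormDef(cascadeDb, 10);
console.log(submissionsWithoutDefs(cascadeDb).map((s) => s.id)); // [ 100 ]
```

In this model, submission 100 survives the form def deletion with no defs left underneath it, which is the state described in the last bullet.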
The code was originally set up to NOT delete these things, so we could NOT call clearDraftSubmissions and clearUnneededDrafts in these most problematic cases. But there would be no path to clearing up this stale data later.
How would we go about cleaning up this stale data later? The form def / draft submission hierarchy isn't set up to make this too easy.
idea?: We could possibly soft-delete the draft submissions themselves for a given form (and then the submission purge task would come and clean them up in thirty days) and we could purge any draft form_defs that 1) aren't linked to a form as its current draft and 2) don't have any remaining submissions? Instead of calling clearDraftSubmissions(formId) in certain places, we could call softDeleteDraftSubmissions(formId)? and remove the call to clearUnneededDrafts(formId).
A possible benefit (I think) is that if you did this, you could poke at the database (before the submissions got purged) to set an old draft def to be the active draft def, undelete the submissions, and see them again??
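A rough sketch of how the idea could fit together, using an in-memory model rather than real queries. Everything here is an assumption for illustration: the field names (draftDefId, publishedAt, deletedAt), the 30-day constant, and the helpers purgeSubmissions and purgeOrphanDraftDefs are hypothetical, and softDeleteDraftSubmissions is only the name proposed above, not an existing function.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const PURGE_AFTER_DAYS = 30; // assumed retention window

// Hypothetical data: def 10 is an old draft, def 11 is the current draft.
const makePurgeDb = () => ({
  forms: [{ id: 1, draftDefId: 11 }],
  formDefs: [
    { id: 10, formId: 1, publishedAt: null },
    { id: 11, formId: 1, publishedAt: null },
  ],
  submissions: [{ id: 100, formId: 1, draft: true, deletedAt: null }],
  submissionDefs: [{ id: 1000, submissionId: 100, formDefId: 10 }],
});

// Mark a form's draft submissions as deleted instead of removing them.
const softDeleteDraftSubmissions = (db, formId, now) => {
  for (const s of db.submissions)
    if (s.formId === formId && s.draft && s.deletedAt === null) s.deletedAt = now;
};

// Purge submissions (and their defs) soft-deleted at least 30 days ago.
const purgeSubmissions = (db, now) => {
  const cutoff = now.getTime() - PURGE_AFTER_DAYS * DAY_MS;
  const purged = new Set(
    db.submissions
      .filter((s) => s.deletedAt !== null && s.deletedAt.getTime() <= cutoff)
      .map((s) => s.id)
  );
  db.submissions = db.submissions.filter((s) => !purged.has(s.id));
  db.submissionDefs = db.submissionDefs.filter((sd) => !purged.has(sd.submissionId));
};

// Purge draft defs that 1) are not a form's current draft and
// 2) have no remaining submission defs.
const purgeOrphanDraftDefs = (db) => {
  db.formDefs = db.formDefs.filter((fd) => {
    const isDraft = fd.publishedAt === null;
    const isCurrentDraft = db.forms.some((f) => f.draftDefId === fd.id);
    const hasSubmissions = db.submissionDefs.some((sd) => sd.formDefId === fd.id);
    return !isDraft || isCurrentDraft || hasSubmissions;
  });
};

const purgeDb = makePurgeDb();
const day0 = new Date('2024-01-01T00:00:00Z');
softDeleteDraftSubmissions(purgeDb, 1, day0);

// Day 29: still inside the window, so the old def is kept alive.
purgeSubmissions(purgeDb, new Date(day0.getTime() + 29 * DAY_MS));
purgeOrphanDraftDefs(purgeDb);
console.log(purgeDb.formDefs.map((fd) => fd.id)); // [ 10, 11 ]

// Day 31: the submission is purged first, then the orphaned def goes.
purgeSubmissions(purgeDb, new Date(day0.getTime() + 31 * DAY_MS));
purgeOrphanDraftDefs(purgeDb);
console.log(purgeDb.formDefs.map((fd) => fd.id)); // [ 11 ]
```

Between day 0 and the purge, the submission row and the old def are both still present in this model, which is what would make the manual recovery described above possible.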
> idea?: We could possibly soft-delete the draft submissions themselves for a given form (and then the submission purge task would come and clean them up in thirty days) and we could purge any draft form_defs that 1) aren't linked to a form as its current draft and 2) don't have any remaining submissions?
We discussed this idea on a call. It makes a lot of sense to me.
Setting managed encryption on a project
- Purges all draft submissions in that project, purges all unattached draft form defs in the project
- I think this behavior doesn't need to change? If you're encrypting your project and your draft form and submissions get removed in the process, do you really need to be able to recover them? I.e. has anyone run into this flavor of the issue?
Abandon/delete a form draft
- Purges unattached draft form defs for that form and clears draft submissions for that form
- This seems kind of like the desired behavior, too? If you intentionally delete your draft, you want the submission to go away, too.
- Although, it could be ok to switch this to a soft delete?
As far as I know in these two cases, there's no technical reason why we need to purge the submissions immediately rather than soft-delete them. Instead, maybe things would be simpler as a whole if we discarded draft submissions in the same way in every case (i.e., via soft deletion). For example, if we continued to call Submissions.clearDraftSubmissions() in these two cases, that method would need to be modified to not delete draft submissions that are soft-deleted. (It would immediately purge a submission only if the submission isn't already soft-deleted.)
That said, if it's simpler to handle these cases separately, I think that would also work. @lognaturel, let us know if you've heard about users losing submissions in either of these two cases.
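If the two cases did keep an immediate purge, the modification mentioned above might look roughly like this. This is an in-memory sketch, not the real Submissions.clearDraftSubmissions query: the data shapes and the deletedAt field name are assumptions for illustration.

```javascript
// Hypothetical data: 100 is a live draft, 101 is a soft-deleted draft,
// 102 is a non-draft submission.
const makeClearDb = () => ({
  submissions: [
    { id: 100, formId: 1, draft: true, deletedAt: null },
    { id: 101, formId: 1, draft: true, deletedAt: new Date('2024-01-01T00:00:00Z') },
    { id: 102, formId: 1, draft: false, deletedAt: null },
  ],
  submissionDefs: [
    { id: 1000, submissionId: 100 },
    { id: 1001, submissionId: 101 },
    { id: 1002, submissionId: 102 },
  ],
});

// Immediately purge a form's draft submissions, but leave alone any that are
// already soft-deleted so the regular purge task can handle them later.
const clearDraftSubmissions = (db, formId) => {
  const purged = new Set(
    db.submissions
      .filter((s) => s.formId === formId && s.draft && s.deletedAt === null)
      .map((s) => s.id)
  );
  db.submissions = db.submissions.filter((s) => !purged.has(s.id));
  db.submissionDefs = db.submissionDefs.filter((sd) => !purged.has(sd.submissionId));
  return purged.size;
};

const clearDb = makeClearDb();
clearDraftSubmissions(clearDb, 1);
console.log(clearDb.submissions.map((s) => s.id)); // [ 101, 102 ]
```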
Publish a form
- This can leave an orphan draft def behind if the version changed...
Oh interesting. So there can be orphaned form defs even today? It'd definitely be nice to clean those up. One thing I like about your idea above is that each scenario doesn't need to have its own logic for deleting orphaned form defs. Instead, orphan form defs will get purged on a regular basis by the centralized purge mechanism.
> If you deleted the form_def, you could get rid of the descendent submission_defs but still have the top level submission
It looks like we previously encountered the case of a submission without a submission def as the root cause behind getodk/central-backend#911. It makes sense to me that your idea above will purge submissions first (including the logical submission) and only then go on to purge orphaned form defs that no longer have submissions.
> The code was originally set up to NOT delete these things, so we could NOT call clearDraftSubmissions and clearUnneededDrafts in these most problematic cases.
I think that's true at least of orphaned form defs: we used to allow orphaned form defs to persist in some (all?) cases. It sounds like you've identified a case even today where a form def can become orphaned. Given that, I bet things will continue working properly if we stop immediately purging orphaned form defs and allow them to persist for 30 days.
I'm less sure that there's ever been a time when we didn't immediately purge draft submissions (except by accident in #911). However, I also don't think there are many queries that have to do with draft submissions exclusively and not also non-draft submissions. Any query that can retrieve non-draft submissions should already know how to handle soft-deleted submissions. It might not be a bad idea to check queries that reference submissions.draft to make sure that they filter on submissions."deletedAt" as they should.
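One low-effort way to start that check could be a text scan over query strings. The helper below is a hypothetical heuristic, and the two sample queries are made up for illustration rather than taken from the codebase. It only catches the fully qualified spellings submissions.draft and submissions."deletedAt", so queries that use table aliases or unqualified column names would still need a manual look.

```javascript
// Flag SQL strings that mention submissions.draft but never mention
// submissions."deletedAt". Purely textual; it does not parse SQL.
const missingDeletedAtFilter = (queries) =>
  queries.filter(
    (sql) =>
      /\bsubmissions\.draft\b/.test(sql) &&
      !/\bsubmissions\."deletedAt"/.test(sql)
  );

// Made-up sample queries.
const sampleQueries = [
  'select count(*) from submissions where submissions.draft = true and submissions."deletedAt" is null',
  'select * from submissions where submissions."formId" = 1 and submissions.draft = true',
];

console.log(missingDeletedAtFilter(sampleQueries).length); // 1
```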
If a draft form gets shared with data collectors and collects real submissions, it can lead to a number of problems:
We are redesigning the draft form page and the draft QR code to help people avoid this situation, but is there anything else we can do on the backend to give these submissions a chance to be recovered?
Some ideas:
This issue is about investigating these ideas to see if there is a quick way to use existing deletion infrastructure to keep deleted draft submissions around for a little while.