getodk / central


Explore options for recovering deleted draft submissions #749

Open ktuite opened 1 month ago

ktuite commented 1 month ago

If a draft form gets shared with data collectors and collects real submissions, it can lead to a number of problems:

We are redesigning the draft form page and the draft QR code to help people avoid this situation, but is there anything else we can do on the backend to give these submissions a chance to be recovered?

Some ideas:

This issue is about investigating these ideas to see if there is a quick way to use existing deletion infrastructure to keep deleted draft submissions around for a little while.

ktuite commented 2 weeks ago

Here's what I've learned so far:

We have these query module functions

These are used in these scenarios:

  1. Setting managed encryption on a project

    • Purges all draft submissions in that project, purges all unattached draft form defs in the project
    • I think this behavior doesn't need to change? If you're encrypting your project and your draft form and submissions get removed in the process, do you really need to be able to recover them? I.e. has anyone run into this flavor of the issue?
  2. Abandon/delete a form draft

    • Purges unattached draft form defs for that form and clears draft submissions for that form
    • This seems kind of like the desired behavior, too? If you intentionally delete your draft, you want its submissions to go away, too.
    • Although, it could be ok to switch this to a soft delete?
  3. Publish a form

    • Possibly makes a new form_def (if the version needs to change)
    • Otherwise sets the currentDefId on the Form to the id of this def (and removes its draft token)
    • This can leave an orphan draft def behind if the version changed...
    • Clear the draft submissions for that form
    • Don't want to lose data here
  4. Update a form draft

    • Makes a new draft form def
    • Clears draft submissions for that form
    • Clears unattached/orphan draft form defs for that form
    • Definitely don't want to lose data here

Database things

The code was originally set up to NOT delete these things, so we could simply skip the calls to clearDraftSubmissions and clearUnneededDrafts in the most problematic cases (publishing a form and updating a form draft). But then there would be no path to cleaning up this stale data later.

How would we go about cleaning up this stale data later? The form def / draft submission hierarchy isn't set up to make this too easy.

Idea: we could soft-delete the draft submissions themselves for a given form, so that the submission purge task would come and clean them up after thirty days. We could then purge any draft form_defs that 1) aren't linked to a form as its current draft and 2) don't have any remaining submissions. Instead of calling clearDraftSubmissions(formId) in certain places, we could call something like softDeleteDraftSubmissions(formId) and remove the call to clearUnneededDrafts(formId).

A possible benefit (I think) is that, before the submissions got purged, you could poke at the database to set an old draft def as the active draft def, undelete the submissions, and see them again?
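To make the soft-delete idea concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the real database. The table layout (a `submissions` table with `formId`, `draft`, and `deletedAt` columns) is a simplified assumption, and `undelete_draft_submissions` is a hypothetical helper for the manual recovery step, not something that exists in the backend. Pointing the form back at an old draft def is left out.

```python
import sqlite3
from datetime import datetime, timezone


def make_db():
    """In-memory stand-in for the submissions table (simplified, assumed columns)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE submissions (
            id INTEGER PRIMARY KEY,
            formId INTEGER NOT NULL,
            draft INTEGER NOT NULL,  -- 1 for draft submissions, 0 otherwise
            deletedAt TEXT           -- NULL unless soft-deleted
        )
        """
    )
    return db


def soft_delete_draft_submissions(db, form_id):
    """Mark a form's live draft submissions as deleted instead of removing the rows."""
    now = datetime.now(timezone.utc).isoformat()
    cur = db.execute(
        "UPDATE submissions SET deletedAt = ? "
        "WHERE formId = ? AND draft = 1 AND deletedAt IS NULL",
        (now, form_id),
    )
    return cur.rowcount


def undelete_draft_submissions(db, form_id):
    """Hypothetical recovery step: clear deletedAt on a form's draft submissions."""
    cur = db.execute(
        "UPDATE submissions SET deletedAt = NULL "
        "WHERE formId = ? AND draft = 1 AND deletedAt IS NOT NULL",
        (form_id,),
    )
    return cur.rowcount


def visible_draft_count(db, form_id):
    """Count draft submissions the way a deletedAt-aware query would see them."""
    row = db.execute(
        "SELECT COUNT(*) FROM submissions "
        "WHERE formId = ? AND draft = 1 AND deletedAt IS NULL",
        (form_id,),
    ).fetchone()
    return row[0]
```

The point of the sketch is only that the rows stay in the table after the soft delete, so they remain recoverable until the purge task removes them.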

matthew-white commented 1 week ago

idea?: We could possibly soft-delete the draft submissions themselves for a given form (and then the submission purge task would come and clean them up in thirty days) and we could purge any draft form_defs that 1) aren't linked to a form as its current draft and 2) don't have any remaining submissions?

We discussed this idea on a call. It makes a lot of sense to me.

  1. Setting managed encryption on a project

    • Purges all draft submissions in that project, purges all unattached draft form defs in the project
    • I think this behavior doesn't need to change? If you're encrypting your project and your draft form and submissions get removed in the process, do you really need to be able to recover them? I.e. has anyone run into this flavor of the issue?
  2. Abandon/delete a form draft

    • Purges unattached draft form defs for that form and clears draft submissions for that form
    • This seems kind of like the desired behavior, too? If you intentionally delete your draft, you want the submission to go away, too.
    • Although, it could be ok to switch this to a soft delete?

As far as I know, there's no technical reason in these two cases why we need to purge the submissions immediately rather than soft-delete them. Instead, maybe things would be simpler overall if we discarded draft submissions the same way in every case (i.e., via soft deletion). For example, if we continued to call Submissions.clearDraftSubmissions() in these two cases, that method would need to be modified so that it doesn't delete draft submissions that are already soft-deleted. (It would immediately purge a submission only if the submission isn't already soft-deleted.)
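For illustration, the modified behavior could look roughly like the sketch below. It uses Python's sqlite3 and an assumed, simplified `submissions` table; it is not the actual Submissions.clearDraftSubmissions() implementation, only the filtering rule described above.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the submissions table (simplified, assumed columns)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE submissions ("
        " id INTEGER PRIMARY KEY,"
        " formId INTEGER NOT NULL,"
        " draft INTEGER NOT NULL,"
        " deletedAt TEXT)"
    )
    return db


def clear_draft_submissions(db, form_id):
    """Immediately purge a form's draft submissions, skipping soft-deleted ones.

    Soft-deleted rows (deletedAt IS NOT NULL) are left for the regular purge task.
    """
    cur = db.execute(
        "DELETE FROM submissions "
        "WHERE formId = ? AND draft = 1 AND deletedAt IS NULL",
        (form_id,),
    )
    return cur.rowcount
```

Having two deletion paths like this is the extra complexity that soft-deleting in every case would avoid.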

That said, if it's simpler to handle these cases separately, I think that would also work. @lognaturel, let us know if you've heard about users losing submissions in either of these two cases.

  3. Publish a form

    • This can leave an orphan draft def behind if the version changed...

Oh interesting. So there can be orphaned form defs even today? It'd definitely be nice to clean those up. One thing I like about your idea above is that each scenario doesn't need to have its own logic for deleting orphaned form defs. Instead, orphan form defs will get purged on a regular basis by the centralized purge mechanism.

If you deleted the form_def, you could get rid of the descendent submission_defs but still have the top level submission

It looks like we previously encountered the case of a submission without a submission def as the root cause behind getodk/central-backend#911. It makes sense to me that your idea above will purge submissions first (including the logical submission) and only then go on to purge orphaned form defs that no longer have submissions.
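A rough sketch of that ordering, again in Python with sqlite3 standing in for the real database. The schema is a simplified assumption (the column names `currentDefId`, `draftDefId`, `publishedAt`, and `formDefId` are used for illustration), and `purge` is a hypothetical function, not the existing purge task.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simplified, assumed schema for illustration only.
SCHEMA = """
CREATE TABLE forms (id INTEGER PRIMARY KEY, currentDefId INTEGER, draftDefId INTEGER);
CREATE TABLE form_defs (id INTEGER PRIMARY KEY, formId INTEGER NOT NULL, publishedAt TEXT);
CREATE TABLE submissions (id INTEGER PRIMARY KEY, formId INTEGER NOT NULL,
                          draft INTEGER NOT NULL, deletedAt TEXT);
CREATE TABLE submission_defs (id INTEGER PRIMARY KEY, submissionId INTEGER NOT NULL,
                              formDefId INTEGER NOT NULL);
"""


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def purge(db, now, retention_days=30):
    """Purge old soft-deleted submissions first, then orphaned draft form defs.

    Returns (submissions_purged, form_defs_purged).
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")

    # Step 1: remove submission defs together with their logical submission,
    # so that no submission is left behind without a def.
    db.execute(
        "DELETE FROM submission_defs WHERE submissionId IN ("
        " SELECT id FROM submissions"
        " WHERE deletedAt IS NOT NULL AND deletedAt <= ?)",
        (cutoff,),
    )
    subs = db.execute(
        "DELETE FROM submissions WHERE deletedAt IS NOT NULL AND deletedAt <= ?",
        (cutoff,),
    ).rowcount

    # Step 2: only now remove unpublished defs that no form points to
    # and that no remaining submission def references.
    defs = db.execute(
        """
        DELETE FROM form_defs
        WHERE publishedAt IS NULL
          AND id NOT IN (SELECT draftDefId FROM forms WHERE draftDefId IS NOT NULL)
          AND id NOT IN (SELECT currentDefId FROM forms WHERE currentDefId IS NOT NULL)
          AND id NOT IN (SELECT formDefId FROM submission_defs)
        """
    ).rowcount
    return subs, defs
```

Because step 2 skips any def that still has submission defs, an orphaned draft def survives for as long as its soft-deleted submissions do, which is what keeps recovery possible during the retention window.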

The code was originally set up to NOT delete these things, so we could NOT call clearDraftSubmissions and clearUnneededDrafts in these most problematic cases.

I think that's true at least of orphaned form defs: we used to allow orphaned form defs to persist in some (all?) cases. It sounds like you've identified a case even today where a form def can become orphaned. Given that, I bet things will continue working properly if we stop immediately purging orphaned form defs and allow them to persist for 30 days.

I'm less sure that there's ever been a time when we didn't immediately purge draft submissions (except by accident in #911). However, I also don't think there are many queries that have to do with draft submissions exclusively and not also non-draft submissions. Any query that can retrieve non-draft submissions should already know how to handle soft-deleted submissions. It might not be a bad idea to check queries that reference submissions.draft to make sure that they filter on submissions."deletedAt" as they should.
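One crude way to start that check would be a text-level scan like the sketch below. It is a heuristic under stated assumptions: the query names and SQL strings in the usage example are made up, and a real audit would still mean reading each query, since a string match can't tell whether the filter applies to the right table.

```python
import re


def queries_missing_deleted_filter(queries):
    """Return names of queries that mention a `draft` column but never "deletedAt".

    `queries` maps a query name to its SQL text. This is a heuristic only.
    """
    flagged = []
    for name, sql in queries.items():
        # \bdraft\b matches `submissions.draft` but not `draftDefId`.
        mentions_draft = re.search(r"\bdraft\b", sql) is not None
        filters_deleted = '"deletedAt"' in sql
        if mentions_draft and not filters_deleted:
            flagged.append(name)
    return flagged
```

For example, passing `{"draft_count": "SELECT count(*) FROM submissions WHERE submissions.draft = true"}` would flag `draft_count`, since that hypothetical query would also count soft-deleted draft submissions.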