getodk / central

ODK Central is a server that is easy to use, very fast, and stuffed with features that make data collection easier. Contribute and make the world a better place! ✨🗄✨
https://docs.getodk.org/central-intro/
Apache License 2.0
126 stars 155 forks source link

Add a task to process offline entity submissions that have been held for a certain amount of time #682

Closed ktuite closed 2 months ago

ktuite commented 4 months ago

If submissions about offline entity updates come in out of order, they will be put in a holding queue until the preceding submission comes in. It is possible for some submissions to get stuck indefinitely if an earlier submission in a branch gets lost. Add a mechanism where these submissions can be processed anyway.

If an entity update is applied from the queue, but the entity has not been created yet, then Central should create the entity. That can happen if the entity was created as part of the offline run, but the first change from the run has not been processed yet. If the entity update does not specify a label, then either the UUID or a fixed label (e.g., "Label unknown") should be used.

If dataset is configured for approvalRequired of submissions, then an entity update that is force applied should never result in a new entity. Applying an update-as-create should not happen in this case.

Time frame? 1 day, 3 days, 7 days? Make configurable (for QA specifically).

ktuite commented 4 months ago

My plan is to make a task in tasks/offline-entities.js called applyHeldSubmissions. This will sort the held submissions by branchId and baseVersion and apply actions from the same branch in the right order.

This will only operate on updates because submissions that create entities never get held for later. But the update could be applied on an entity that doesn't yet exist. We've talked about creating a new entity in this case.

Do we need to account for create submission that come in way later than the "update-applied-as-create after it was held for a long time" submissions? Run a create-as-update? I think this gets more complicated with our create logic and constraints. I would rather have a super-delayed create submission just get marked as a conflict error.

matthew-white commented 4 months ago

Do we need to account for create submission that come in way later than the "update-applied-as-create after it was held for a long time" submissions? Run a create-as-update?

My instinct would be to run a create-as-update. For example, imagine an offline branch where one form creates an entity, specifying a label, then a second form updates the entity, specifying some other properties. Ideally, we'd get the label from the first submission even if it comes out of order.

I would rather have a super-delayed create submission just get marked as a conflict error.

Just to make sure I understand, the idea is to surface this as an entity processing error on the submission, right? It wouldn't be shown on the entity detail page?

Maybe the right answer depends on how common missing submissions will be. Not very common hopefully, but common enough that we're trying to account for them? I can't think of a reason why an entity-creating submission would be less likely to be missed than an entity-updating one.

matthew-white commented 4 months ago

I guess one question is how likely is it that the entity-creating submission is missing for long enough that the entity-updating submission is processed from the holding queue, but then the entity-creating submission does come in afterwards? Hopefully all submissions do come in eventually, right? I'd be interested in how likely @lognaturel thinks this is.

matthew-white commented 4 months ago

One restriction I think we should set up is that we should only apply an update as a create if the form def specifies the create attribute in the <entity> block (if dataset_form_defs.actions includes create). Normally, the user is only allowed to create an entity via submission if the form def specifies create: the user is restricted to the actions listed in the form def. I think that should continue to be the case for offline updates. Otherwise, a user would be able to circumvent the restriction on creating entities by submitting an offline entity update to an entity that doesn't exist.

If it's easiest, we could check for this even before adding the submission to the backlog. If a submission specifies an offline update to an entity that doesn't exist, and the form def doesn't allow create, then just fail the submission without adding it to the backlog, even if earlier updates in the offline branch have not been received.