getodk / collect

ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨
https://docs.getodk.org/collect-intro
Other
719 stars 1.37k forks source link

Delete local entities that have been deleted/archived/unassigned from server #6011

Closed seadowg closed 5 months ago

seadowg commented 7 months ago

Blocked by #6029

User stories

As a project administrator If I delete, archive or re-assign an entity, I expect it to be removed from offline clients' devices when they next update So that the entities they see reflect the current state of the project

As a data collector If I create and/or update an entity while offline, I expect it to remain available on my device until it's explicitly deleted, archived or re-assigned away from me

Context

When an entity is deleted on the server, it is not included in subsequent entity list updates. Currently, if an entity is created locally and then deleted on the server, that entity remains available in forms.

There are a few related scenarios:

  1. The entity was created locally, came back to the device in an entity list update, was deleted on server: Entity is created/updated locally and then the relevant submissions are sent to the server, the device syncs and receives the new entity list (with the local entity duplicated), the entity is deleted on the server and then the device syncs and receives the new entity list again (without the local entity)
  2. The entity was created locally and deleted on the server before it was included in an entity list update: The entity is created/updated locally and then the relevant submissions are sent to the server, the entity is then deleted on the server and then the device syncs and receives the new entity list (without the local entity)
  3. The entity was created locally and an entity list update came in before that new entity was processed/approved: The entity is created/updated locally and then the relevant submissions are sent to the server. The server doesn't process the entity creation before the client requests a new entity list so the response doesn't include the local entity.

Scenarios 2 and 3 look identical to clients: they have a local entity that is not in the entity list update.

For scenario 2, we want the local entity deleted because it was deleted on the server.

For scenario 3, we want the local entity kept because it hasn't yet existed on the server.

This issue is for addressing scenario 1 only: if the last update to an entity was from the server, not local-only, and an entity list update does not include that entity, then the entity should be removed from the local working copy.

Acceptance

Questions

  1. Is rejected the same as deleted?
    • Yes! Archive and unassign will also be treated the same when those are implemented
  2. Are updates to deleted entities rejected by the server or will it "recreate" the entity?
    • It does not recreate the entity. It logs a failed attempt to update a non-existing entity.
  3. Is there any reason that we need to know that a secondary instance is an entity list before we create/update locally? I'm not sure we actually need the server to indicate this to solve anything here.

Notes

Brainstorming notes for related/upcoming cases

We'll need new API interactions with the server so that we can handle deletion automatically without needing a manual intervention for clean up ("And I can manually delete the entity locally somehow"), but this will be discussed elsewhere. Implementing this as-is will help us understand the needs there and also help us understand better what kind of storage we want to use for entities on device.

seadowg commented 7 months ago

@lognaturel we'll need to answer question 2 (see the section in the issue description) here before I can write up acceptance for the update scenarios. I think that'll dictate if we need the client to know something's an entity list or not. I'm assuming the answer to question 1 is "yes" (from the clients perspective), but thought it would be good to call out.

lognaturel commented 7 months ago

Both questions now answered!

I've also reworked the issue a little bit to make it clearly about the first scenario only, hopefully this matches what you had in mind. I agree with you that what this points to is making the distinction between entities whose latest version came from the server and entities whose latest version is local-only ("dirty").

Introducing this concept of "known by the server" vs. "local-only" would give us the ability to simply delete entities "known by the server" that are not in the latest entity list update. Another way to think about this would be to keep track of entities that are in a "dirty" state vs. a match for the central source of truth.

An alternative would be to address all cases at once by e.g. querying the server for all entities that are on device but not in the entity list. The advantage of always asking about all entities on device not in the entity list would be that there'd only be one way to handle deletion. Some disadvantages might be that we can imagine scenarios in which a lot of entities might be deleted in one go (especially without assignment/segmenting/tasking) and that we would have to implement the full case at once (whereas with this "dirty" concept we can probably do an initial release that doesn't clean up "dirty" entities and only deals with the easy scenario 1 case).

@seadowg did you explicitly consider handling all cases the same way?

And I can manually delete the entity locally somehow

We won't want local delete in the long term. Are you thinking that without it it will be too confusing to use the entity browser? I think we could consider something more like a blanket "clear all local entities" instead.

I'll give some other thoughts with the assumption we're moving forward with a notion of "dirty" entities.

Given I've updated an entity locally When the entity is deleted on the server And the entity list is then updated Then ?

We have two options:

  1. keep track of the fact that the entity was previously seen on the server and only the update is local in which case we can delete the local copy
  2. only keep track of the fact that the current version of the entity is "dirty" in which case we keep the local copy (the version hasn't "graduated" to being online)

I think the second option is more aligned with a narrower view of this issue. This case feels similar to scenario 2 -- it's not likely to be hugely common.

Given I've updated an entity locally that was not in the entity list When the entity was deleted on the server previously And the entity list is then updated Then ?

This is something like I scan a barcode that has an entityid in it (uuid from Central) in a form that applies an update to the entity with that entityid but I don't have that entity on my device? Ideally, Collect's local entity list is not updated in that case. I think that would be most consistent with the intent of "update" in the case of a non-existent entity. The submission would still be saved and sent to the server where an error would be logged when trying to apply the update to a non-existent entity.

I do think it would be ok for now if it is in the local entity representation to consider the entity "dirty" and treat it the same way (keep the local version).

lognaturel commented 7 months ago

We know we want to move to requesting progressive updates rather than requesting the whole entity list. In that context, we won’t be able to use the “if it’s not in the returned list, delete it” concept. I think the best we’ll be able to do is query the server about all local-only entities to see whether any need to be deleted.

That suggests to me that we should have a single approach for all the cases.

seadowg commented 7 months ago

An alternative would be to address all cases at once by e.g. querying the server for all entities that are on device but not in the entity list. The advantage of always asking about all entities on device not in the entity list would be that there'd only be one way to handle deletion. Some disadvantages might be that we can imagine scenarios in which a lot of entities might be deleted in one go (especially without assignment/segmenting/tasking) and that we would have to implement the full case at once (whereas with this "dirty" concept we can probably do an initial release that doesn't clean up "dirty" entities and only deals with the easy scenario 1 case).

I'm thinking that this issue cheats and just handles the "dirty but never appears in a list case" with a manual deletion and that we then introduce an API to handle that "cheat". I'd want to hold a full release until the former is implemented though.

We won't want local delete in the long term. Are you thinking that without it it will be too confusing to use the entity browser? I think we could consider something more like a blanket "clear all local entities" instead.

Totally agree! I've actually already added that over in https://github.com/getodk/collect/pull/5982 as I figured I might want it during the research trip. I think that satisfies the acceptance and keeps it very obvious that we need an API to solve it properly.

We have two options:

  1. keep track of the fact that the entity was previously seen on the server and only the update is local in which case we can delete the local copy
  2. only keep track of the fact that the current version of the entity is "dirty" in which case we keep the local copy (the version hasn't "graduated" to being online)

I think the second option is more aligned with a narrower view of this issue. This case feels similar to scenario 2 -- it's not likely to be hugely common.

I'm actually thinking that the last line would be Then the entity does not appear in follow up forms. As you say, we can track that the entity was "online" before, so we know it's now been deleted.

This is something like I scan a barcode that has an entityid in it (uuid from Central) in a form that applies an update to the entity with that entityid but I don't have that entity on my device? Ideally, Collect's local entity list is not updated in that case. I think that would be most consistent with the intent of "update" in the case of a non-existent entity. The submission would still be saved and sent to the server where an error would be logged when trying to apply the update to a non-existent entity.

Right. I'll capture another issue for handling the "update to an entity not in the list" case.

I do think it would be ok for now if it is in the local entity representation to consider the entity "dirty" and treat it the same way (keep the local version).

Yeah we could split out another issue for the rarer cases. I quite like having this all as one thing though as I'm pretty sure a lot of how we handle all this will dictate the storage structure, so it's nice to deal with as one block (or feels that way in my head anyway).

lognaturel commented 7 months ago

I'd want to hold a full release until the former is implemented though.

You mean the latter, right? Until the API to handle the more complex cases is added? That sounds right to me.

I've actually already added that over in https://github.com/getodk/collect/pull/5982

🎉

we can track that the entity was "online" before, so we know it's now been deleted.

Somewhat related to https://github.com/getodk/collect/issues/6012#issuecomment-2018515911. For the requirements here so far we'd only need to know that an entity had been previously seen on the server which could conceptually be binary state. But it could also use a concept like keeping a representation of both like we're driving out in #6012.

I quite like having this all as one thing though as I'm pretty sure a lot of how we handle all this will dictate the storage structure, so it's nice to deal with as one block

I see what you mean. Where it gets confusing is that there are criteria like the "And I can manually delete the entity locally somehow" that are for a shorter-term version. I almost want a parent issue with the description of all the needs, a child issue for the short term solution, and a sibling for the long term solution. We can probably worry about that once we start implementation.

I set the title to "Delete local entities that have been deleted/archived/unassigned from server" which I think focuses on the general problem. Please update if it's not matching what you think it's about!

seadowg commented 7 months ago

You mean the latter, right? Until the API to handle the more complex cases is added? That sounds right to me.

Ha yes sorry.

Where it gets confusing is that there are criteria like the "And I can manually delete the entity locally somehow" that are for a shorter-term version. I almost want a parent issue with the description of all the needs, a child issue for the short term solution, and a sibling for the long term solution. We can probably worry about that once we start implementation.

Right. I think we're approaching from slightly different angles here though. I'm seeing these issues from a "I can't change the server" perspective rather than thinking of the work here as "full stack" (and then separating out a pitch for new APIs). I originally a note about that in the issue description, but I think it got lost when we copied in the extra stuff from our docs. I'll look through the edits and add it back in.