getodk / central

ODK Central is a server that is easy to use, very fast, and stuffed with features that make data collection easier. Contribute and make the world a better place! ✨🗄✨
https://docs.getodk.org/central-intro/
Apache License 2.0
126 stars 155 forks source link

Update conflict logic for offline entities #698

Closed matthew-white closed 3 weeks ago

matthew-white commented 2 months ago

Problem description

In #669, I thought that offline entities wouldn't requires us to change much (if anything) about the conflict property. Specifically, I thought:

When Central processes an entity update with a run index > 1 [branchBaseVersion > trunkVersion], then the conflict type of the resulting entity version is null if the current server version of the entity is the prior local version.

However, it's looking like that isn't the ideal behavior after all. Even if the current server version is the previous version from the branch, the update being processed may conflict with a separate update between the trunk version and the previous version. For example:

As things are currently implemented, v3 will be marked as a soft conflict because there was an update between it and its base version. However, v4 is not marked as a conflict because it immediately follows its base version (the previous version from the branch). There are a few issues that stem from the fact that v4 is not marked as a conflict:

Expected behavior

As I wrote on Slack:

[A]n offline update should be considered a conflict whenever it’s not contiguous with the trunk version, even if it immediately follows its base version.

In the example above, v3 is not contiguous with the trunk version (v1), because v2 is between the two. The same is true of v4. This rule would cause v4 to be marked as a conflict.

I think this is a good rule, though I am slightly concerned about the ramifications. I would guess that there are places in Backend and/or Frontend where we consider something a conflict if and only if it does not immediately follow its base version (version !== baseVersion + 1). If we implement the proposed rule, that logic won't be true anymore. If anything needs to change in this area about Frontend, I think I could do so quickly.

I also wrote the following on Slack, but I no longer think that it's always true:

[I]f one offline update in a branch is a conflict, then all updates that follow it in the branch should also be considered in conflict, as they too were made in parallel with an update from another source.

I do think this is true if the updates are processed in order. However, if an update is force-processed out of order, it may be a conflict even if an update that follows it is not a conflict. For example:

In this example, even though v3 and v4 are from the same branch, and v3 is a conflict, we do not expect v4 to be a conflict. v2, v3, and v4 are all contiguous with the trunk version (v1): there's no conflict with some interweaving update from another source, as in the first example above. To summarize: if the versions from an offline branch were force-processed out of order, then even if all the versions are contiguous with their trunk version, it may be the case that an earlier version from the branch is a conflict while a later version is not.

So when evaluating whether an offline update being processed is a conflict, I think the question should be, "Are there any versions after the trunk version that are from outside the branch?" The question should not be, "Was the previous version in the branch a conflict?"

In terms of figuring out whether the conflict is soft or hard (which properties are conflicting, if any), I'm not so sure about the logic. We can't just compare the offline update being processed against its trunk version, because if two updates from the same branch change the same property and are processed in the correct order, there should not be a conflict: in that case, everything happened in sequence, not in parallel. It's only if there was an update from outside the branch that changed the same property as an update from the branch that there should be a hard conflict, as in the first example above.

matthew-white commented 2 months ago

In terms of figuring out whether the conflict is soft or hard (which properties are conflicting, if any), I'm not so sure about the logic.

Even if we can't figure out this part, I personally think it'd be an improvement from the status quo to mark these cases as soft conflicts, not even worrying about calculating the conflicting properties. As long as we mark an offline update that is not contiguous with its trunk version as a conflict, the properties it specified will appear in the conflict summary table. The update will appear as a conflict in the feed. All in all, the user will have a better understanding of what they need to review.

matthew-white commented 2 months ago

I think this is a good rule, though I am slightly concerned about the ramifications. I would guess that there are places in Backend and/or Frontend where we consider something a conflict if and only if it does not immediately follow its base version (version !== baseVersion + 1). If we implement the proposed rule, that logic won't be true anymore. If anything needs to change in this area about Frontend, I think I could do so quickly.

I glanced through Frontend, and I didn't see anything that obviously needed to change. Mostly the code just looks at the conflict property and doesn't compare a version number to the base version. I searched the codebase for the following strings to try to find logic that needed to change:

baseVersion
ersion + 1
ersion - 1
conflict

I may have missed something, but if so, hopefully we'll spot it while testing this issue.

matthew-white commented 2 weeks ago

@getodk/testers, we still need to do some follow-up work related to this issue, so it's not quite ready for testing. I can let you know once that follow-up work is done.