carekit-apple / CareKit

CareKit is an open source software framework for creating apps that help people better understand and manage their health.
https://www.researchandcare.org

Continuation of #558 aka Outcomes cause data corruption #567

Open · nizar opened 3 years ago

nizar commented 3 years ago

Hello,

I am still seeing a number of odd things when working with store synchronization and toggling the state of outcomes on a single task/event. This is similar to what was happening with #558. As a test to isolate the issue, I wired up my app to use the SimulatedRemote and SimulatedServer to see if it was my remote sync code. The weird behavior is also happening with the simulated remote/server. The one change I made to the simulated remote was to set automaticallySynchronizes to true, which is the use case for my app.

Before I dive into trying to create a standalone reproducible sample, it would be really helpful if someone could validate that the logic in the SimulatedRemote and SimulatedServer used in the OCKStore+Sync tests is correct. Thanks in advance for your help.

https://github.com/carekit-apple/CareKit/blob/716e47f741495abb08099ac7456c4b21c5066a66/CareKitStore/CareKitStoreTests/OCKStore/TestStore%2BSync.swift#L159-L250

erik-apple commented 3 years ago

The SimulatedServer and SimulatedRemote are correct for the purpose of the unit tests they're used in, but may not be fully compliant. That is to say, I can't guarantee that they work as expected when hooked up to an actual app.

Is the issue that you're seeing the same as before? That toggling the state of an outcome on a single task fails? Does it happen only when a remote is set up, or does it also happen when no remote is used?

nizar commented 3 years ago

Thanks for the quick response @erik-apple. I just pushed a branch on my fork with the changes outlined below to recreate the issue:

  1. Add the SimulatedRemote and corresponding SimulatedServer
  2. Set the automaticallySynchronizes field on SimulatedRemote to true (see the sketch after this list). The issue does not occur unless this flag is set to true.
  3. Build & Run
  4. Toggle the completion status on the Kegel task 3-4 times.
  5. After the second toggle, any additional changes are "reversed" by the store, and additional revisions end up with multiple previous/next UUIDs. There is a brief flash of the toggled state changing, but then it changes right back.
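
For reference, the wiring for steps 1-3 looks roughly like this. This is a sketch: the SimulatedRemote/SimulatedServer initializers shown here are assumptions based on the shapes in TestStore+Sync.swift, while OCKStore.init(name:type:remote:) is the real CareKitStore API for attaching a remote.

    import CareKitStore

    // Types copied out of CareKitStoreTests; the initializer shapes here
    // are assumptions based on the test code.
    let server = SimulatedServer()
    let remote = SimulatedRemote(name: "simulated", server: server)

    // Step 2: the flag that triggers the issue. In the test code this may
    // be a constant, so it might need to be edited into a var.
    remote.automaticallySynchronizes = true

    // Attach the remote to the store.
    let store = OCKStore(name: "SampleAppStore", type: .onDisk, remote: remote)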

Let me know if I can provide any additional details.

erik-apple commented 3 years ago

Thanks for providing a branch. I'm able to reproduce it on our end, so I can begin looking into the problem.

cbaker6 commented 3 years ago

This might be related to isSynchronizing being removed from CareKit in #553, so there’s nothing in CareKit to stop duplicates being sent to the remote if multiple syncs are requested. One way to account for this in the remote is in pullRevisions and pushRevisions: https://github.com/netreconlab/ParseCareKit/blob/e893ce0830239b06986fffce9ee97aa2c5d83e3a/Sources/ParseCareKit/ParseRemote.swift#L157
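
For illustration, a minimal sketch of that kind of guard (the SyncGate type below is illustrative, not ParseCareKit's actual implementation; the linked ParseRemote achieves the same effect):

    import Foundation

    // A stand-in for the removed isSynchronizing behavior: a thread-safe
    // flag the remote checks before starting a pull and clears when the
    // matching push completes, so overlapping sync requests are dropped.
    final class SyncGate {
        private let lock = NSLock()
        private var isSynchronizing = false

        // Returns true if the caller acquired the gate and may begin a sync.
        func tryBegin() -> Bool {
            lock.lock()
            defer { lock.unlock() }
            if isSynchronizing { return false }
            isSynchronizing = true
            return true
        }

        func end() {
            lock.lock()
            isSynchronizing = false
            lock.unlock()
        }
    }

Inside pullRevisions the remote would bail out early (completing with no error) when tryBegin() returns false, and call end() from the completion of the matching pushRevisions.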

erik-apple commented 3 years ago

Yeah, multiple copies of the same data being sent is an edge case that is possible now.

Whether or not you deduplicate or prevent multiple syncs at a time is left up to the implementor. You could do something like Corey demonstrated on the iOS side, or you could handle it on the server, or you can choose not to deal with it at all, which is what the simulated server does.

This can result in multiple copies of the same data being delivered back to clients, but that is fine as long as CareKit is holding up its promise to act as a CRDT. It's supposed to have the properties of being Commutative, Idempotent, and Associative. If there is duplicate data, the Idempotent property should guarantee it isn't added into the database more than once.
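
Spelled out, those are the standard CRDT merge laws (with merge as the store's revision-merge operation):

    % Standard CRDT merge laws
    \begin{aligned}
    \mathrm{merge}(a, b)                    &= \mathrm{merge}(b, a)                    && \text{(commutative)} \\
    \mathrm{merge}(a, a)                    &= a                                       && \text{(idempotent)} \\
    \mathrm{merge}(\mathrm{merge}(a, b), c) &= \mathrm{merge}(a, \mathrm{merge}(b, c)) && \text{(associative)}
    \end{aligned}

Idempotence is the property in play here: merging the same revision record twice must leave the database exactly as merging it once did.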

I haven't had time to dig into this yet, but I suspect it'll probably be something related to the version graph or the effective date property on outcomes!

nizar commented 3 years ago

FWIW, in my project I put a guard around any calls to store.sync() to ensure that there were not multiple concurrent syncs triggered from the app. This didn't make a difference. If I had to guess, I would say that a sync is being triggered due to the automaticallySynchronizes flag in the midst of a mergeRevision which effectively stacks changes. In my half-hearted attempt at debugging this, I noticed that the call sequence was 3 consecutive pullRevisions calls on the remote followed by 3 consecutive pushRevisions. Not sure this helps, but that's as far as I've gotten on this particular issue.
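
The guard looked roughly like this (a sketch: SyncCoordinator, isSyncing, and syncIfIdle are illustrative names, while synchronize(completion:) is OCKStore's manual sync entry point):

    import CareKitStore

    final class SyncCoordinator {
        private let store: OCKStore
        private var isSyncing = false

        init(store: OCKStore) {
            self.store = store
        }

        // Drops the request if a sync is already in flight.
        func syncIfIdle() {
            guard !isSyncing else { return }
            isSyncing = true
            store.synchronize { [weak self] error in
                self?.isSyncing = false
                if let error = error {
                    print("Sync failed: \(error)")
                }
            }
        }
    }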

cbaker6 commented 3 years ago

FWIW, in my project I put a guard around any calls to store.sync() to ensure that there were not multiple concurrent syncs triggered from the app.

This would only prevent your app from making multiple requests, but wouldn't have any influence over CareKit telling your remote to synchronize multiple times.

If I had to guess, I would say that a sync is being triggered due to the automaticallySynchronizes flag in the midst of a mergeRevision which effectively stacks changes. In my half-hearted attempt at debugging this, I noticed that the call sequence was 3 consecutive pullRevisions calls on the remote followed by 3 consecutive pushRevisions.

If you have something like I mentioned, it would prevent this from creating duplicates as well. You could also handle it server-side like Erik mentioned.

nizar commented 3 years ago

FWIW, in my project I put a guard around any calls to store.sync() to ensure that there were not multiple concurrent syncs triggered from the app.

This would only prevent your app from making multiple requests, but wouldn't have any influence over CareKit telling your remote to synchronize multiple times.

Understood, and this is where, if CareKitStore is a CRDT, the app should not need to do this. Additionally, adding a guard directly in the remote requires knowing implementation details (that a pull always happens before a push), which would be difficult to maintain over time.

If I had to guess, I would say that a sync is being triggered due to the automaticallySynchronizes flag in the midst of a mergeRevision which effectively stacks changes. In my half-hearted attempt at debugging this, I noticed that the call sequence was 3 consecutive pullRevisions calls on the remote followed by 3 consecutive pushRevisions.

If you have something like I mentioned, it would prevent this from creating duplicates as well. You could also handle it server-side like Erik mentioned.

In my use-case, I'm building this for a serverless environment and that would add a level of complexity that is probably not needed.

erik-apple commented 3 years ago

Understood, and this is where, if CareKitStore is a CRDT, the app should not need to do this. Additionally, adding a guard directly in the remote requires knowing implementation details (that a pull always happens before a push), which would be difficult to maintain over time.

Yes, this is correct. The intention is that you don't need to concern yourself with duplicates because CareKit handles them gracefully for you.

In my use-case, I'm building this for a serverless environment and that would add a level of complexity that is probably not needed.

Separately from this bug, I'm curious how you're handling pulls in a serverless environment. Are you able to do a targeted fetch of revisions based on the knowledge vector?

nizar commented 3 years ago

In my use-case, I'm building this for a serverless environment and that would add a level of complexity that is probably not needed.

Separately from this bug, I'm curious how you're handling pulls in a serverless environment. Are you able to do a targeted fetch of revisions based on the knowledge vector?

I'm currently using Firebase and have not fully migrated to the updated sync from #553 yet. The way I am currently handling it in the pre-553 model is that the cloud has its own (UUID: Int) clock that I store as a destructured record. On every pullRevisions, I do a query to fetch all entities where entity.cloudClock >= deviceKnowledge.clock(for: cloudClock.uuid). I return that with the cloudClock merged into the deviceKnowledge. On every pushRevisions, the cloud clock is incremented and entities are stamped with that clock. Because the stamp on each entity is just an Int, the query is pretty efficient.
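
The pull side of that scheme, sketched in code (cloudUUID, cloudTime, and the query are placeholders for the Firebase pieces; clock(for:) and merge(with:) are real OCKRevisionRecord.KnowledgeVector API):

    import CareKitStore

    func pullSide(deviceKnowledge: OCKRevisionRecord.KnowledgeVector,
                  cloudUUID: UUID,
                  cloudTime: Int) -> OCKRevisionRecord.KnowledgeVector {

        // Fetch everything the device hasn't seen yet: entities stamped with
        // a cloud clock value at or above the device's record of that clock.
        let lastSeen = deviceKnowledge.clock(for: cloudUUID)
        print("query: entity.cloudClock >= \(lastSeen)") // Firebase query goes here

        // Return the device's vector with the cloud clock merged in, so the
        // device "knows" the cloud's time after the pull.
        var merged = deviceKnowledge
        merged.merge(with: .init([cloudUUID: cloudTime]))
        return merged
    }

The push side is the mirror image: increment the cloud's counter and stamp every uploaded entity with the new value.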

I will likely migrate to something like AWS Amplify, which provides a GraphQL interface in a serverless environment. Ideally, I'd like to simply have a GQL OCKStore similar to the Core Data store. That's a bit out of scope at the moment, but if there are other folks interested in something like that, I think it would be fun.

marco-theraforge commented 3 years ago

I will likely migrate to something like AWS Amplify, which provides a GraphQL interface in a serverless environment. Ideally, I'd like to simply have a GQL OCKStore similar to the Core Data store. That's a bit out of scope at the moment, but if there are other folks interested in something like that, I think it would be fun.

Hi Nizar,

We are working on a cloud-connected OCKStore in a serverless environment running on AWS. It has a full-fledged sync protocol.

If you'd like to have a chat to see if we can help, my contact info is marco@hippocratestech.com.

Marco

erik-apple commented 3 years ago

I've pinned down what's going on here.

After each merge, CareKit attempts to detect and resolve logical inconsistencies that may have arisen as part of the merge. Namely, if there is more than one entry in an object version chain that appears to be the newest version (the tip), we consider that a conflict and ask the remote which is truly supposed to be the tip.

If you add an object, then delete the object, then add another object with the same id, the result is 2 chains containing 3 versions and 2 tips (one deleted and one not).

ChainA: V1 -> V2 (deleted)

ChainB: V3

At the conclusion of the subsequent sync, when we attempt to detect conflicts, V2 and V3 appear to be conflicts since both have no next version (are tips) and the same knowledge vector. Since we created both locally on the same device, it’s clear that in this situation V3 should be the true tip and V2 should not be counted as a conflict, but had ChainA and ChainB come from different devices, we would likely prefer it be treated as a conflict.
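
In store calls, the sequence that produces those two chains is just the toggle-on/toggle-off/toggle-on from the repro steps (a sketch: the task UUID is a placeholder, and the calls are asynchronous but shown sequentially, as each toggle completes before the next):

    import CareKitStore

    let store = OCKStore(name: "SampleAppStore")
    let kegelTaskUUID = UUID() // placeholder for the real task's UUID

    let outcome = OCKOutcome(
        taskUUID: kegelTaskUUID,
        taskOccurrenceIndex: 0,
        values: [])

    store.addOutcome(outcome)    // V1 starts ChainA
    store.deleteOutcome(outcome) // V2, the deleted tip of ChainA
    store.addOutcome(outcome)    // V3 starts ChainB: two tips now exist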

While I'm working on a root solution, you can put a workaround in your implementation of chooseConflictResolution(conflicts:completion:). The workaround is to check if the conflicts are a delete and an add for an outcome, and to choose the add over the delete.

    public func chooseConflictResolution(
        conflicts: [OCKEntity], completion: @escaping OCKResultClosure<OCKEntity>) {

        // https://github.com/carekit-apple/CareKit/issues/567
        // Workaround to handle deleted and re-added outcomes.
        // Always prefer updates over deletes.
        let outcomes = conflicts.compactMap { conflict -> OCKOutcome? in
            if case let .outcome(outcome) = conflict {
                return outcome
            } else {
                return nil
            }
        }

        if outcomes.count == 2,
           outcomes.contains(where: { $0.deletedDate != nil }),
           let added = outcomes.first(where: { $0.deletedDate == nil }) {

            completion(.success(.outcome(added)))
            return
        }

        // Logic for other scenarios...
    }

nizar commented 3 years ago

Thanks @erik-apple. I'll give this a try over the weekend. I appreciate the quick review and response.

erik-apple commented 3 years ago

We appreciate you reporting it, and especially that you included a way for us to reproduce it!