Implementation flaw - Githubissues

apple / sample-cloudkit-sync-engine

MIT License

176 stars 13 forks source link

Implementation flaw #5

Open jmjauer opened 1 year ago

jmjauer commented 1 year ago

Hi there,

I think there is an implementation error in the sample project - more specifically, a flaw in the design of CKSyncEngine. In the SyncedDatabase file, line 350:

self.syncEngine.state.add(pendingRecordZoneChanges: pendingSaves)

This tells the syncEngine what to do when the time is right. Since this piece of information is not persisted in this call (I think its persisted in .stateUpdate(let event)), there is a chance that this information is lost if the app crashes between .add(pendingRecordZoneChanges: pendingSaves) and .stateUpdate(let event).

I know this is just a sample project, but I really think this is a flaw in the API design of CKSyncEngine since I do not see an easy way to guarantee correct behavior. A simple fix would be to make .add(pendingRecordZoneChanges: pendingSaves) (and similar methods) async and only complete after .stateUpdate(let event) was called.

What do you think?

lukaskubanek commented 1 year ago

Isn’t this the same as with the synchronized data you should be saving on your own? When you fail to save the received changes, there’s a risk of losing them.

I approach it in a way where I persist a representation of the pending changes alongside the synced data. Whenever a save succeeds, as indicated by the CKSyncEngine.Event.SentRecordZoneChanges event, I remove the related pending changes from the persisted data. This means that when the app terminates while there are pending changes registered, they will be loaded and re-added upon the next launch.

By the way, I’m only referring to the design of CKSyncEngine, not this sample project, which, as far as I understand, is only for demonstrative purposes and doesn’t handle all edge cases.

marcomasser commented 1 year ago

I agree: Talking about CKSyncEngine, not this sample project, I understand the intended use for CKSyncEngine is to sync your data and that’s it. You have to do persistence yourself anyway and adding a bit of information on whether something has been synced should be part of that persistence layer. So if you know you need to sync something, you mark it as such in the persistence layer, hand it off to CKSyncEngine and when that reports that it has synced that piece of information, then you mark it as such in the persistence layer. If the app crashes before that last step happens, your persistence layer should give that piece of information to CKSyncEngine again the next time.

In other words: I wouldn’t rely on CKSyncEngine.State alone for persisting the sync state of my data model.

jmjauer commented 1 year ago

Isn’t this the same as with the synchronized data you should be saving on your own? When you fail to save the received changes, there’s a risk of losing them.

Do you mean the delegate callback when CKSynEngine fetched changes?

I approach it in a way where I persist a representation of the pending changes alongside the synced data. Whenever a save succeeds, as indicated by the CKSyncEngine.Event.SentRecordZoneChanges event, I remove the related pending changes from the persisted data. This means that when the app terminates while there are pending changes registered, they will be loaded and re-added upon the next launch.

I have thought about this, but it is very easy to get it wrong: An unfortunate sequence of events could result in your representation of your pending changes removing changes that were triggered by other pending changes. There must be a way to associate the CKSyncEngine.Event.SentRecordZoneChanges callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.

By the way, I’m only referring to the design of CKSyncEngine, not this sample project, which, as far as I understand, is only for demonstrative purposes and doesn’t handle all edge cases.

jmjauer commented 1 year ago

I agree: Talking about CKSyncEngine, not this sample project, I understand the intended use for CKSyncEngine is to sync your data and that’s it. You have to do persistence yourself anyway and adding a bit of information on whether something has been synced should be part of that persistence layer.

But CKSyncEngine already manages state for this very reason - and we need to persist that state. I think this is a great design, but the way it is implemented makes it very hard to implement correct behaviour. I know it's just a sample app, but I think it would be much better to use the sample app to show how to use CKSynEngine for these cases. I'm pretty sure most developers just assume that when they call .add(pendingRecordZoneChanges: pendingSaves) the sync engine will do the rest - but it doesn't (for edge cases).

So if you know you need to sync something, you mark it as such in the persistence layer, hand it off to CKSyncEngine and when that reports that it has synced that piece of information, then you mark it as such in the persistence layer. If the app crashes before that last step happens, your persistence layer should give that piece of information to CKSyncEngine again the next time. In other words: I wouldn’t rely on CKSyncEngine.State alone for persisting the sync state of my data model.

This is exactly what confuses me. There is an api for this, but it does not handle these edge cases. This type of crash is definitely very unlikely, but if it happens the sync is broken. Also, I don't see an easy way to associate the delegate callback with the changes you've persisted - it's tricky to get that right too. A simple flag won't work in certain event sequences.

marcomasser commented 1 year ago

But CKSyncEngine already manages state for this very reason - and we need to persist that state. […]

I would be cautious to assume anything about the CKSyncEngine.State that you persist. It’s pretty clear that it will contain all the record and zone IDs that it needs to sync, but that doesn’t mean it’s all that you will need.

Anyway, I suspect that there are multiple interpretations of how to use CKSyncEngine floating around in this issue here. It might be worth it to file a documentation feedback with Apple for clarification.

lukaskubanek commented 1 year ago

Do you mean the delegate callback when CKSynEngine fetched changes?

Exactly. If you fail to integrate these into your persistence layer, you’ll lose them.

An unfortunate sequence of events could result in your representation of your pending changes removing changes that were triggered by other pending changes.

I don’t really understand what you mean by this. Can you elaborate?

There must be a way to associate the CKSyncEngine.Event.SentRecordZoneChanges callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.

Currently, in my usage of CKSyncEngine, I’m relying on the kind (save or deletion) and the associated record ID for the associations and I work with the latest versions of the changed records.

However, after thinking about it for a moment, there indeed might be a problematic sequence:

You make a change to a record and store the updated record and the pending change in your persistence layer.
You signal the pending change to the sync engine.
The sync engine starts sending changes (CKSyncEngine.Event.WillSendChanges).
The sync engine loads the record for the pending change (CKSyncEngineDelegate.nextRecordZoneChangeBatch(_:syncEngine:)).
You make another change to the same record and store the updated record and the pending change in your persistence layer.
The sync engine reports sent changes (CKSyncEngine.Event.SentRecordZoneChanges).
You remove the reported pending change from your persistence layer.
The sync engine finishes sending changes (CKSyncEngine.Event.DidSendChanges).

In this case, the second change made in step 5 would be ignored. I guess the safest way would probably be to store pending changes into a secondary bucket after receiving the sync engine starts sending (step 3). Then, when it finishes (step 8), you’d move them to the primary bucket and signal changes to the CKSyncEngine again.

By the way, I also wanted to point out that you don’t have to use CKSyncEngine.State’s add(pendingDatabaseChanges:) and remove(pendingDatabaseChanges:) mechanism for managing the pending changes. CKSyncEngine.State also has the hasPendingUntrackedChanges property for signaling that there are pending changes. You can then provide them right from your persistence layer in the appropriate delegate method. This is the path I chose, as you’d have to keep them in sync otherwise. And that would only ask for troubles…

@jmjauer Is this the sequence you had in mind?

marcomasser commented 1 year ago

I’d say there’s one step missing in your sequence of events: After step 5, you’d have to signal the sync engine again that there’s a change to sync. It would then decide on when to do that by itself. This also means your persistence layer’s representation of a pending change would need to be rich enough to handle the case that there might be multiple pending sync operations for one single record. So not just a boolean but something kind of like a semaphore that counts up for pending changes and down for completed syncs.

jmjauer commented 1 year ago

Do you mean the delegate callback when CKSynEngine fetched changes?

Exactly. If you fail to integrate these into your persistence layer, you’ll lose them.

Yes, this is another problem I don't know how to fix. Probably delete the sync state and restart the sync.

An unfortunate sequence of events could result in your representation of your pending changes removing changes that were triggered by other pending changes.

I don’t really understand what you mean by this. Can you elaborate?

There must be a way to associate the CKSyncEngine.Event.SentRecordZoneChanges callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.

Currently, in my usage of CKSyncEngine, I’m relying on the kind (save or deletion) and the associated record ID for the associations and I work with the latest versions of the changed records.

However, after thinking about it for a moment, there indeed might be a problematic sequence:

You make a change to a record and store the updated record and the pending change in your persistence layer.

You signal the pending change to the sync engine.

The sync engine starts sending changes (CKSyncEngine.Event.WillSendChanges).

The sync engine loads the record for the pending change (CKSyncEngineDelegate.nextRecordZoneChangeBatch(_:syncEngine:)).

You make another change to the same record and store the updated record and the pending change in your persistence layer.

The sync engine reports sent changes (CKSyncEngine.Event.SentRecordZoneChanges).

You remove the reported pending change from your persistence layer.

The sync engine finishes sending changes (CKSyncEngine.Event.DidSendChanges).

In this case, the second change made in step 5 would be ignored. I guess the safest way would probably be to store pending changes into a secondary bucket after receiving the sync engine starts sending (step 3). Then, when it finishes (step 8), you’d move them to the primary bucket and signal changes to the CKSyncEngine again.

By the way, I also wanted to point out that you don’t have to use CKSyncEngine.State’s add(pendingDatabaseChanges:) and remove(pendingDatabaseChanges:) mechanism for managing the pending changes. CKSyncEngine.State also has the hasPendingUntrackedChanges property for signaling that there are pending changes. You can then provide them right from your persistence layer in the appropriate delegate method. This is the path I chose, as you’d have to keep them in sync otherwise. And that would only ask for troubles…

Can you explain that a little bit more?

@jmjauer Is this the sequence you had in mind?

Yes, indeed

jmjauer commented 1 year ago

I’d say there’s one step missing in your sequence of events: After step 5, you’d have to signal the sync engine again that there’s a change to sync. It would then decide on when to do that by itself. This also means your persistence layer’s representation of a pending change would need to be rich enough to handle the case that there might be multiple pending sync operations for one single record. So not just a boolean but something kind of like a semaphore that counts up for pending changes and down for completed syncs.

I also thought about this, but I think there are no guarantees for the number of callbacks. And if the number of callbacks is less than the number of addPending calls, this will not work.

timimahoney commented 10 months ago

Great discussion here!

You're right that there's a potential problem if the process crashes in between a call to add(pending...) and the StateUpdate event. CKSyncEngine coalesces state update events under the hood in order to prevent unnecessary extra persistence. Unfortunately, that coalescing results in this issue you're describing here. Fortunately, this delay is around 100 milliseconds, so the chance of this happening is low.

That said, this is definitely a real issue. We can track this internally and figure out a resolution, but it would hold more weight if we could get some of your opinions and perspectives as well. The more people we see who care about the issue, the more urgently we can prioritize it. Would you mind submitting a feedback about this?

To address some specific points:

There must be a way to associate the CKSyncEngine.Event.SentRecordZoneChanges callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.

Associating the data in SentRecordZoneChanges with your own local persistence layer is possible, but it might be tricky. There's always a chance that the user modifies some data while the previous version of that data is being sent to the server. It might be easy enough if you keep some sort of version/timestamp in your local persistence that you can compare to the saved record in SentRecordZoneChanges. I'm not sure if there's a way CKSyncEngine can provide something like this in a generic way, but we're always open to suggestions.

Exactly. If you fail to integrate these into your persistence layer, you’ll lose them.

Fortunately, the fetching side of this doesn't have the same issue. CKSyncEngine won't persist the updated server change tokens in its state until you've already handled the FetchedRecordZoneChanges event. As a result, if you crash in between fetching changes and persisting the state, you should be fine the next time you launch. CKSyncEngine will sync from the previous change token and re-deliver all the records that weren't persisted.

You make another change to the same record and store the updated record and the pending change in your persistence layer. ... In this case, the second change made in step 5 would be ignored.

If you're using add(pendingRecordZoneChanges:), then CKSyncEngine already accounts for this possibility. Internally, it tracks the concept of an "in-flight" change. If you add a pending change that matches an in-flight change, then CKSyncEngine will make sure it keeps that pending change even after the previous one finished. For example:

You add a pending save for your record.
CKSyncEngine asks you for the record, starts to save it, and marks it as in-flight.
While the save is in flight, you add a pending save for the same record.
CKSyncEngine finishes saving the record and gives you a SendRecordZoneChanges event.

Since you told the sync engine that you have a new pending save in step 3 while the previous change was in flight, it will keep that pending change even after step 4.

If you're using hasPendingUntrackedChanges, CKSyncEngine also has a notion of "in-flight untracked changes" and will do the same thing for the overall state of "untracked" changes. However, since you're not tracking the individual records in the state, you'd be in charge of tracking the in-flight status of these individual records.

handle the case that there might be multiple pending sync operations for one single record

CKSyncEngine only saves one batch of changes at a time, so thankfully this wouldn't be a problem (unless I'm misunderstanding what you meant).

Hopefully that clears some stuff up. Again, this is a great discussion, and we're happy to keep it going until we get to the bottom of everything. Also, sorry for the delay on responding to this!

jmjauer commented 10 months ago

Hey Tim, thanks for joining the discussion!

That said, this is definitely a real issue. We can track this internally and figure out a resolution, but it would hold more weight if we could get some of your opinions and perspectives as well. The more people we see who care about the issue, the more urgently we can prioritize it. Would you mind submitting a feedback about this?

I will submit feedback on this issue. I think a simple solution would be to add async versions of add(pending...) that return after the state is persisted.

Associating the data in SentRecordZoneChanges with your own local persistence layer is possible, but it might be tricky. There's always a chance that the user modifies some data while the previous version of that data is being sent to the server. It might be easy enough if you keep some sort of version/timestamp in your local persistence that you can compare to the saved record in SentRecordZoneChanges. I'm not sure if there's a way CKSyncEngine can provide something like this in a generic way, but we're always open to suggestions.

Maybe I'm missing something, but wouldn't it be relatively easy for CKSyncEngine to provide some kind of change token when changes are added, and a change token in SentRecordZoneChanges? For example, this change token could be provided by CKSyncEngine.SendChangesContext in nextRecordZoneChangeBatch when CKSyncEngines reads the records and in SentRecordZoneChanges when the sync is done.

If you're using hasPendingUntrackedChanges, CKSyncEngine also has a notion of "in-flight untracked changes" and will do the same thing for the overall state of "untracked" changes. However, since you're not tracking the individual records in the state, you'd be in charge of tracking the in-flight status of these individual records.

This is really interesting - can you explain how CKSyncEngine tracks this in-flight status for each record when using add(pendingRecordZoneChanges:)?

Overall, I think CKSyncEngine is a great addition to CloudKit, but it is hard to understand the edge cases since they are not expressible via the api. Adding more examples and documentation would help a lot here. It would be great if you could add another sample project that shows how to handle sync state in your own persistence layer (using hasPendingUntrackedChanges).

timimahoney commented 10 months ago

I will submit feedback on this issue. I think a simple solution would be to add async versions of add(pending...) that return after the state is persisted.

❤️ Thanks!

Maybe I'm missing something, but wouldn't it be relatively easy for CKSyncEngine to provide some kind of change token when changes are added, and a change token in SentRecordZoneChanges? For example, this change token could be provided by CKSyncEngine.SendChangesContext in nextRecordZoneChangeBatch when CKSyncEngines reads the records and in SentRecordZoneChanges when the sync is done.

Something like that could potentially work. Or we could even let you put some sort of arbitrary object on RecordZoneChangeBatch (similar to userInfo on NSError), and you could use that to track whatever you want. Would you mind submitting a feedback requesting something like this as well? We'd have to think about this a bit, but it's always worth submitting a suggestion.

This is really interesting - can you explain how CKSyncEngine tracks this in-flight status for each record when using add(pendingRecordZoneChanges:)?

The general flow looks like this, although the exact implementation is subject to change:

Right after receiving the next batch from nextRecordZoneChangeBatch, CKSyncEngine removes the change from the "pending" list and marks it as "in-flight."
CKSyncEngine starts the network request to send the changes to the server.
If you call add(pending...) for a change that is currently in flight, then that change will be both "in-flight" and "pending."
When the server request completes, CKSyncEngine marks the change as no longer in-flight.
If the request failed due to a retryable error (e.g. network failure), it adds the change back as "pending."

The key piece here is steps 3 and 4. CKSyncEngine keeps track of two separate states, "in-flight" and "pending." Even if it marks the change as no longer in flight in step 4, it'll still keep the pending change you added in step 3.

If you're using hasPendingUntrackedChanges, you could simulate this behavior by marking the changes as in-flight inside your implementation of nextRecordZoneChangeBatch, then un-marking as in-flight when handling the SentRecordZoneChangeBatch event.

You could also achieve the same result by tracking some sort of explicit "version" each object. You'd have to know the current version locally as well as the last known version on the server. If the server version is less than the local version, then you know you still have changes to send to the server. This is less "in-flight" tracking and more just tracking whether or not you've saved everything to the server. This version could be as simple as an integer or as complex as a vector clock.

Overall, I think CKSyncEngine is a great addition to CloudKit, but it is hard to understand the edge cases since they are not expressible via the api. Adding more examples and documentation would help a lot here. It would be great if you could add another sample project that shows how to handle sync state in your own persistence layer (using hasPendingUntrackedChanges).

I agree it would be interesting to see a sample project that uses hasPendingUntrackedChanges. For now, we've started small and simple, but perhaps one day we can build a slightly more complex sample app that uses some of the more advanced features like hasPendingUntrackedChanges and nextFetchChangesOptions.

jmjauer commented 10 months ago

❤️ Thanks!

Your welcome! The feedback ids are FB13469853 and FB13469895.

If you're using hasPendingUntrackedChanges, you could simulate this behavior by marking the changes as in-flight inside your implementation of nextRecordZoneChangeBatch, then un-marking as in-flight when handling the SentRecordZoneChangeBatch event.

I will try to implement this for my use case and get back to you with my findings and maybe suggestions for improvement :)

jmjauer commented 9 months ago

@timimahoney

I am currently in the process of implementing sync using hasPendingUntrackedChanges and have encountered the following problem:

hasPendingUntrackedChanges triggers nextRecordZoneChangeBatch(_:syncEngine:), but there is no nextDatabaseChangeBatch(_:syncEngine:). Is it possible to manually track database changes (e.g. creating a zone)?