Open jmjauer opened 1 year ago
Isn’t this the same as with the synchronized data you should be saving on your own? When you fail to save the received changes, there’s a risk of losing them.
I approach it in a way where I persist a representation of the pending changes alongside the synced data. Whenever a save succeeds, as indicated by the CKSyncEngine.Event.SentRecordZoneChanges
event, I remove the related pending changes from the persisted data. This means that when the app terminates while there are pending changes registered, they will be loaded and re-added upon the next launch.
By the way, I’m only referring to the design of CKSyncEngine
, not this sample project, which, as far as I understand, is only for demonstrative purposes and doesn’t handle all edge cases.
I agree: Talking about CKSyncEngine
, not this sample project, I understand the intended use for CKSyncEngine
is to sync your data and that’s it. You have to do persistence yourself anyway and adding a bit of information on whether something has been synced should be part of that persistence layer. So if you know you need to sync something, you mark it as such in the persistence layer, hand it off to CKSyncEngine
and when that reports that it has synced that piece of information, then you mark it as such in the persistence layer. If the app crashes before that last step happens, your persistence layer should give that piece of information to CKSyncEngine
again the next time.
In other words: I wouldn’t rely on CKSyncEngine.State
alone for persisting the sync state of my data model.
Isn’t this the same as with the synchronized data you should be saving on your own? When you fail to save the received changes, there’s a risk of losing them.
Do you mean the delegate callback when CKSynEngine fetched changes?
I approach it in a way where I persist a representation of the pending changes alongside the synced data. Whenever a save succeeds, as indicated by the
CKSyncEngine.Event.SentRecordZoneChanges
event, I remove the related pending changes from the persisted data. This means that when the app terminates while there are pending changes registered, they will be loaded and re-added upon the next launch.
I have thought about this, but it is very easy to get it wrong: An unfortunate sequence of events could result in your representation of your pending changes removing changes that were triggered by other pending changes. There must be a way to associate the CKSyncEngine.Event.SentRecordZoneChanges callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.
By the way, I’m only referring to the design of
CKSyncEngine
, not this sample project, which, as far as I understand, is only for demonstrative purposes and doesn’t handle all edge cases.
I agree: Talking about
CKSyncEngine
, not this sample project, I understand the intended use forCKSyncEngine
is to sync your data and that’s it. You have to do persistence yourself anyway and adding a bit of information on whether something has been synced should be part of that persistence layer.
But CKSyncEngine already manages state for this very reason - and we need to persist that state. I think this is a great design, but the way it is implemented makes it very hard to implement correct behaviour. I know it's just a sample app, but I think it would be much better to use the sample app to show how to use CKSynEngine for these cases. I'm pretty sure most developers just assume that when they call .add(pendingRecordZoneChanges: pendingSaves) the sync engine will do the rest - but it doesn't (for edge cases).
So if you know you need to sync something, you mark it as such in the persistence layer, hand it off to
CKSyncEngine
and when that reports that it has synced that piece of information, then you mark it as such in the persistence layer. If the app crashes before that last step happens, your persistence layer should give that piece of information toCKSyncEngine
again the next time. In other words: I wouldn’t rely onCKSyncEngine.State
alone for persisting the sync state of my data model.
This is exactly what confuses me. There is an api for this, but it does not handle these edge cases. This type of crash is definitely very unlikely, but if it happens the sync is broken. Also, I don't see an easy way to associate the delegate callback with the changes you've persisted - it's tricky to get that right too. A simple flag won't work in certain event sequences.
But CKSyncEngine already manages state for this very reason - and we need to persist that state. […]
I would be cautious to assume anything about the CKSyncEngine.State
that you persist. It’s pretty clear that it will contain all the record and zone IDs that it needs to sync, but that doesn’t mean it’s all that you will need.
Anyway, I suspect that there are multiple interpretations of how to use CKSyncEngine
floating around in this issue here. It might be worth it to file a documentation feedback with Apple for clarification.
Do you mean the delegate callback when CKSynEngine fetched changes?
Exactly. If you fail to integrate these into your persistence layer, you’ll lose them.
An unfortunate sequence of events could result in your representation of your pending changes removing changes that were triggered by other pending changes.
I don’t really understand what you mean by this. Can you elaborate?
There must be a way to associate the
CKSyncEngine.Event.SentRecordZoneChanges
callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.
Currently, in my usage of CKSyncEngine
, I’m relying on the kind (save or deletion) and the associated record ID for the associations and I work with the latest versions of the changed records.
However, after thinking about it for a moment, there indeed might be a problematic sequence:
CKSyncEngine.Event.WillSendChanges
).CKSyncEngineDelegate.nextRecordZoneChangeBatch(_:syncEngine:)
).CKSyncEngine.Event.SentRecordZoneChanges
).CKSyncEngine.Event.DidSendChanges
).In this case, the second change made in step 5 would be ignored. I guess the safest way would probably be to store pending changes into a secondary bucket after receiving the sync engine starts sending (step 3). Then, when it finishes (step 8), you’d move them to the primary bucket and signal changes to the CKSyncEngine
again.
By the way, I also wanted to point out that you don’t have to use CKSyncEngine.State
’s add(pendingDatabaseChanges:)
and remove(pendingDatabaseChanges:)
mechanism for managing the pending changes. CKSyncEngine.State
also has the hasPendingUntrackedChanges
property for signaling that there are pending changes. You can then provide them right from your persistence layer in the appropriate delegate method. This is the path I chose, as you’d have to keep them in sync otherwise. And that would only ask for troubles…
@jmjauer Is this the sequence you had in mind?
I’d say there’s one step missing in your sequence of events: After step 5, you’d have to signal the sync engine again that there’s a change to sync. It would then decide on when to do that by itself. This also means your persistence layer’s representation of a pending change would need to be rich enough to handle the case that there might be multiple pending sync operations for one single record. So not just a boolean but something kind of like a semaphore that counts up for pending changes and down for completed syncs.
Do you mean the delegate callback when CKSynEngine fetched changes?
Exactly. If you fail to integrate these into your persistence layer, you’ll lose them.
Yes, this is another problem I don't know how to fix. Probably delete the sync state and restart the sync.
An unfortunate sequence of events could result in your representation of your pending changes removing changes that were triggered by other pending changes.
I don’t really understand what you mean by this. Can you elaborate?
There must be a way to associate the
CKSyncEngine.Event.SentRecordZoneChanges
callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.Currently, in my usage of
CKSyncEngine
, I’m relying on the kind (save or deletion) and the associated record ID for the associations and I work with the latest versions of the changed records.However, after thinking about it for a moment, there indeed might be a problematic sequence:
- You make a change to a record and store the updated record and the pending change in your persistence layer.
- You signal the pending change to the sync engine.
- The sync engine starts sending changes (
CKSyncEngine.Event.WillSendChanges
).- The sync engine loads the record for the pending change (
CKSyncEngineDelegate.nextRecordZoneChangeBatch(_:syncEngine:)
).- You make another change to the same record and store the updated record and the pending change in your persistence layer.
- The sync engine reports sent changes (
CKSyncEngine.Event.SentRecordZoneChanges
).- You remove the reported pending change from your persistence layer.
- The sync engine finishes sending changes (
CKSyncEngine.Event.DidSendChanges
).In this case, the second change made in step 5 would be ignored. I guess the safest way would probably be to store pending changes into a secondary bucket after receiving the sync engine starts sending (step 3). Then, when it finishes (step 8), you’d move them to the primary bucket and signal changes to the
CKSyncEngine
again.By the way, I also wanted to point out that you don’t have to use
CKSyncEngine.State
’sadd(pendingDatabaseChanges:)
andremove(pendingDatabaseChanges:)
mechanism for managing the pending changes.CKSyncEngine.State
also has thehasPendingUntrackedChanges
property for signaling that there are pending changes. You can then provide them right from your persistence layer in the appropriate delegate method. This is the path I chose, as you’d have to keep them in sync otherwise. And that would only ask for troubles…
Can you explain that a little bit more?
@jmjauer Is this the sequence you had in mind?
Yes, indeed
I’d say there’s one step missing in your sequence of events: After step 5, you’d have to signal the sync engine again that there’s a change to sync. It would then decide on when to do that by itself. This also means your persistence layer’s representation of a pending change would need to be rich enough to handle the case that there might be multiple pending sync operations for one single record. So not just a boolean but something kind of like a semaphore that counts up for pending changes and down for completed syncs.
I also thought about this, but I think there are no guarantees for the number of callbacks. And if the number of callbacks is less than the number of addPending calls, this will not work.
Great discussion here!
You're right that there's a potential problem if the process crashes in between a call to add(pending...)
and the StateUpdate
event. CKSyncEngine
coalesces state update events under the hood in order to prevent unnecessary extra persistence. Unfortunately, that coalescing results in this issue you're describing here. Fortunately, this delay is around 100 milliseconds, so the chance of this happening is low.
That said, this is definitely a real issue. We can track this internally and figure out a resolution, but it would hold more weight if we could get some of your opinions and perspectives as well. The more people we see who care about the issue, the more urgently we can prioritize it. Would you mind submitting a feedback about this?
To address some specific points:
There must be a way to associate the CKSyncEngine.Event.SentRecordZoneChanges callback with the changes you want to remove from your own persistence layer, but I don't see any way of doing this.
Associating the data in SentRecordZoneChanges
with your own local persistence layer is possible, but it might be tricky. There's always a chance that the user modifies some data while the previous version of that data is being sent to the server. It might be easy enough if you keep some sort of version/timestamp in your local persistence that you can compare to the saved record in SentRecordZoneChanges
. I'm not sure if there's a way CKSyncEngine
can provide something like this in a generic way, but we're always open to suggestions.
Exactly. If you fail to integrate these into your persistence layer, you’ll lose them.
Fortunately, the fetching side of this doesn't have the same issue. CKSyncEngine
won't persist the updated server change tokens in its state until you've already handled the FetchedRecordZoneChanges
event. As a result, if you crash in between fetching changes and persisting the state, you should be fine the next time you launch. CKSyncEngine
will sync from the previous change token and re-deliver all the records that weren't persisted.
- You make another change to the same record and store the updated record and the pending change in your persistence layer. ... In this case, the second change made in step 5 would be ignored.
If you're using add(pendingRecordZoneChanges:)
, then CKSyncEngine
already accounts for this possibility. Internally, it tracks the concept of an "in-flight" change. If you add a pending change that matches an in-flight change, then CKSyncEngine
will make sure it keeps that pending change even after the previous one finished. For example:
CKSyncEngine
asks you for the record, starts to save it, and marks it as in-flight.CKSyncEngine
finishes saving the record and gives you a SendRecordZoneChanges
event.Since you told the sync engine that you have a new pending save in step 3 while the previous change was in flight, it will keep that pending change even after step 4.
If you're using hasPendingUntrackedChanges
, CKSyncEngine
also has a notion of "in-flight untracked changes" and will do the same thing for the overall state of "untracked" changes. However, since you're not tracking the individual records in the state, you'd be in charge of tracking the in-flight status of these individual records.
handle the case that there might be multiple pending sync operations for one single record
CKSyncEngine
only saves one batch of changes at a time, so thankfully this wouldn't be a problem (unless I'm misunderstanding what you meant).
Hopefully that clears some stuff up. Again, this is a great discussion, and we're happy to keep it going until we get to the bottom of everything. Also, sorry for the delay on responding to this!
Hey Tim, thanks for joining the discussion!
That said, this is definitely a real issue. We can track this internally and figure out a resolution, but it would hold more weight if we could get some of your opinions and perspectives as well. The more people we see who care about the issue, the more urgently we can prioritize it. Would you mind submitting a feedback about this?
I will submit feedback on this issue. I think a simple solution would be to add async versions of add(pending...)
that return after the state is persisted.
Associating the data in SentRecordZoneChanges with your own local persistence layer is possible, but it might be tricky. There's always a chance that the user modifies some data while the previous version of that data is being sent to the server. It might be easy enough if you keep some sort of version/timestamp in your local persistence that you can compare to the saved record in SentRecordZoneChanges. I'm not sure if there's a way CKSyncEngine can provide something like this in a generic way, but we're always open to suggestions.
Maybe I'm missing something, but wouldn't it be relatively easy for CKSyncEngine
to provide some kind of change token when changes are added, and a change token in SentRecordZoneChanges
? For example, this change token could be provided by
CKSyncEngine.SendChangesContext
in nextRecordZoneChangeBatch
when CKSyncEngines
reads the records and in SentRecordZoneChanges
when the sync is done.
If you're using hasPendingUntrackedChanges, CKSyncEngine also has a notion of "in-flight untracked changes" and will do the same thing for the overall state of "untracked" changes. However, since you're not tracking the individual records in the state, you'd be in charge of tracking the in-flight status of these individual records.
This is really interesting - can you explain how CKSyncEngine tracks this in-flight status for each record when using add(pendingRecordZoneChanges:)
?
Overall, I think CKSyncEngine
is a great addition to CloudKit, but it is hard to understand the edge cases since they are not expressible via the api. Adding more examples and documentation would help a lot here. It would be great if you could add another sample project that shows how to handle sync state in your own persistence layer (using hasPendingUntrackedChanges
).
I will submit feedback on this issue. I think a simple solution would be to add async versions of
add(pending...)
that return after the state is persisted.
❤️ Thanks!
Maybe I'm missing something, but wouldn't it be relatively easy for
CKSyncEngine
to provide some kind of change token when changes are added, and a change token inSentRecordZoneChanges
? For example, this change token could be provided byCKSyncEngine.SendChangesContext
innextRecordZoneChangeBatch
whenCKSyncEngines
reads the records and inSentRecordZoneChanges
when the sync is done.
Something like that could potentially work. Or we could even let you put some sort of arbitrary object on RecordZoneChangeBatch
(similar to userInfo
on NSError
), and you could use that to track whatever you want. Would you mind submitting a feedback requesting something like this as well? We'd have to think about this a bit, but it's always worth submitting a suggestion.
This is really interesting - can you explain how CKSyncEngine tracks this in-flight status for each record when using
add(pendingRecordZoneChanges:)
?
The general flow looks like this, although the exact implementation is subject to change:
nextRecordZoneChangeBatch
, CKSyncEngine
removes the change from the "pending" list and marks it as "in-flight."CKSyncEngine
starts the network request to send the changes to the server.add(pending...)
for a change that is currently in flight, then that change will be both "in-flight" and "pending."CKSyncEngine
marks the change as no longer in-flight. The key piece here is steps 3 and 4. CKSyncEngine
keeps track of two separate states, "in-flight" and "pending." Even if it marks the change as no longer in flight in step 4, it'll still keep the pending change you added in step 3.
If you're using hasPendingUntrackedChanges
, you could simulate this behavior by marking the changes as in-flight inside your implementation of nextRecordZoneChangeBatch
, then un-marking as in-flight when handling the SentRecordZoneChangeBatch
event.
You could also achieve the same result by tracking some sort of explicit "version" each object. You'd have to know the current version locally as well as the last known version on the server. If the server version is less than the local version, then you know you still have changes to send to the server. This is less "in-flight" tracking and more just tracking whether or not you've saved everything to the server. This version could be as simple as an integer or as complex as a vector clock.
Overall, I think
CKSyncEngine
is a great addition to CloudKit, but it is hard to understand the edge cases since they are not expressible via the api. Adding more examples and documentation would help a lot here. It would be great if you could add another sample project that shows how to handle sync state in your own persistence layer (usinghasPendingUntrackedChanges
).
I agree it would be interesting to see a sample project that uses hasPendingUntrackedChanges
. For now, we've started small and simple, but perhaps one day we can build a slightly more complex sample app that uses some of the more advanced features like hasPendingUntrackedChanges
and nextFetchChangesOptions
.
❤️ Thanks!
Your welcome! The feedback ids are FB13469853 and FB13469895.
If you're using hasPendingUntrackedChanges, you could simulate this behavior by marking the changes as in-flight inside your implementation of nextRecordZoneChangeBatch, then un-marking as in-flight when handling the SentRecordZoneChangeBatch event.
I will try to implement this for my use case and get back to you with my findings and maybe suggestions for improvement :)
@timimahoney
I am currently in the process of implementing sync using hasPendingUntrackedChanges
and have encountered the following problem:
hasPendingUntrackedChanges
triggers nextRecordZoneChangeBatch(_:syncEngine:)
, but there is no nextDatabaseChangeBatch(_:syncEngine:)
. Is it possible to manually track database changes (e.g. creating a zone)?
Hi there,
I think there is an implementation error in the sample project - more specifically, a flaw in the design of CKSyncEngine. In the SyncedDatabase file, line 350:
self.syncEngine.state.add(pendingRecordZoneChanges: pendingSaves)
This tells the syncEngine what to do when the time is right. Since this piece of information is not persisted in this call (I think its persisted in .stateUpdate(let event)), there is a chance that this information is lost if the app crashes between .add(pendingRecordZoneChanges: pendingSaves) and .stateUpdate(let event).
I know this is just a sample project, but I really think this is a flaw in the API design of CKSyncEngine since I do not see an easy way to guarantee correct behavior. A simple fix would be to make .add(pendingRecordZoneChanges: pendingSaves) (and similar methods) async and only complete after .stateUpdate(let event) was called.
What do you think?