Closed testflyjets closed 3 years ago
Since the Lambda resolvers are invoked by AppSync, I'm not sure what is implicating the Amplify iOS library code here. Are you able to turn on CloudWatch logging for your AppSync API to confirm that your custom resolver is in fact being invoked?
Whether the custom resolver runs or not doesn't seem to be the issue here. The issue is that the behavior of AppSync in terms of how changes made offline get synched to the cloud is different when we run the app in the iOS simulator versus when we run it on the device.
What method do you suggest to simulate loss of network connectivity in the simulator that would be similar to turning off WiFi and cellular data on a device?
I was not able to replicate the issue in a sample app. Here are the steps followed:
Todo
modelLogic used to update multiple times
queryAndUpdate()
func queryAndUpdate(_ num: Int = 0) {
Amplify.DataStore.query(Todo.self, where: Todo.keys.name == "Todo 3") { result in
switch result {
case .success(let todos):
print("Todo fetched successfully! \(todos)")
var todo = todos[0]
todo.description = "Description \(num)"
update(todo: todo)
case .failure(let error):
print("Error fetching todo \(error)")
}
}
}
func update(todo: Todo) {
Amplify.DataStore.save(todo) { result in
switch result {
case .success:
print("Todo saved successfully!")
self.num = self.num + 1
if (self.num < 30) {
self.queryAndUpdate(self.num)
}
case .failure(let error):
print("Error saving todo \(error)")
}
}
}
Few questions to help us debug further:
Hi, @royjit, thanks for doing these tests. If you are able to share this ToDo app with me I will run it locally and see if I can replicate your results.
As for seeing errors in Xcode, I don't see any exceptions per se. However, I do see some odd behavioral differences between running the app on the device and in the iOS simulator. I added logging of many of the DataStore events to try to understand precisely what is happening in the app around the synchronization of changes to AWS.
For example, in the simulator, when I turn WiFi off on my laptop, the DataStore networkStatus
event indicates the loss of connectivity. However, when the network connection is restored the networkStatus
event indicating restoration of connectivity is not fired. On the device the networkStatus
event is generated and logged immediately on loss and regain of network connectivity. My concern is that Amplify behaves differently if it thinks the device is connected to the network, and the "correct" behavior we're seeing in the simulator is actually a side-effect from a bug!
Also, when connectivity is restored in the simulator, the outboxMutationProcessed
event fires for the first update, but the outboxStatus
event shows the outbox is NOT empty, presumably because the subsequent changes need to be processed. I then see an additional outboxMutationEnqueued
event fire that contains the additional changes that I made.
On the device, after the outboxMutationProcessed
event the outboxStatus
shows the DataStore outbox is empty. Therefore, the additional outboxMutationEnqueued
event never fires, and if the additional changes are never queued for synch back to Dynamo, the data doesn't get updated. Again, this is my interpretation of what I think I'm seeing.
The Lambda resolver only executes when the second set of changes are sent to AWS via the iOS simulator, and there are no errors present. Since the app running on the device doesn't trigger a "second" update there are no conflicts to resolve (my assumption here).
I did update to amplify-ios 1.12.0 to see if this problem went away. Sadly, it did not. I saw the same datastore changes and was hopeful but alas, no such luck.
Please let me know if there's anything else I can do to help you move forward.
I've been able to reproduce this locally and I think I have an idea what's going on. Will update as soon as I have more information.
Awesome! Great news, and we look forward to your next update. I'm in the process of detailing the sequence of events and log messages I see in the two cases (iOS sim vs. device). I was going to attach that to the open AWS Support case, and am happy to provide it here as well. Let me know if that would be helpful.
I'll see it either way, so whichever is most convenient for you. :)
I'm still investigating this, but here are my findings so far:
I have a test that reliably reproduces this: https://github.com/aws-amplify/amplify-ios/blob/palpatim.fix.data-store-lost-mutation-1324/AmplifyPlugins/DataStore/AWSDataStoreCategoryPluginTests/Sync/MutationQueue/OutgoingMutationQueueNetworkTests.swift
The bug manifests when:
Discussion
The Sync Engine has a cleanup routine it runs during initial startup that sets locally persisted MutationEvents to have an inProcess
status of false
. This is to recover from cases where the app was terminated while the mutation queue was processing an event. Upon restart, the OutgoingMutationQueue picks up the first not-in-process item in the queue and sends it to the backend.
This is all working as expected and designed.
However, we use the same cleanup routine in no-network situations, and there is a problem with some of our assumptions about the system state at that point. The behavior I'm seeing is:
inProcess=true
inProcess=false
What's not yet clear is the fix. It could be that we shouldn't clean up in-process events for no-network situations; or that we should change our disposition logic to simply enqueue all mutation events regardless of whether there is any previous event; or some other fix. We are discussing the right way forward and will update when we have more info.
Impressive -- thanks for the deep dive, and for the test you wrote for this. Your detailed observations track precisely with the behavior we have observed. We are standing by for a fix or a workaround.
We believe the right thing to do here is to cancel the in-process, in-memory operation when the subscription receives a network error. We're working on implementing that in a safe way. Updates as we get them.
Awesome, thanks!
We attempted to remedy this with a targeted fix, but it doesn't cover edge cases the way we want. We're now trying to introduce a new "pausing" state that is distinct from the previous one, that safely removes in-process sync operations from the OutgoingMutationQueue. That's a larger change, and we're still in process on it.
We have something working locally, which you can view on our working branch if you'd like to preview it. Unit tests pass, and we're running through integration tests now to ensure everything works as expected.
@palpatim I am trying to use the Swift Package Manager to pull in your hotfix branch. It seems to pull it in correctly when I point it at your branch but I'm seeing an error when I build the code with that version of Amplify. It's complaining about something that appears related logging in the Cognito Auth plugin. Is this perhaps some leftover debug logging?
Also, the package manager doesn't give me the option to select the plugins per the Amplify documentation here. I'm relatively new to using SPM for managing dependencies -- is there something else I need to do here to try this fix?
I commented out the two lines that Xcode was complaining about and the code runs now, FYI.
Thanks--the latest push to that branch fixes that problem (I made the change in a different target than was active, so didn't see the compile error).
No worries. Good news! The fix definitely works on the device based on our testing just now.
Quick question: should Amplify Storage work the same way, in that a file upload requested while offline will resume once connectivity is reestablished?
We are calling Storage like this:
let storageOperation = Amplify.Storage.uploadData(key: key, data: imageData, options: options)
We aren't seeing the upload retried once the app is reconnected currently, but it's not clear from the documentation if it should work that way. We are hoping it does, and I figure you'd know.
I think that Storage is supposed to allow you to queue an upload while offline, and then fulfill it when network resumes, but I'm not 100% sure on that. @brennanMKE, can you clarify expected behavior for Amplify.Storage?
Regarding the original DataStore issue: the team is reviewing the PR. Assuming all looks good, the fix for this will go out in a patch release.
@testflyjets
You can skip to the recommendation in the last section if you like. This first section offers some context for the AWS SDK and Amplify SDK.
In the underlying implementation you can see it allows for passing in values for allowsCellularAccess and timeoutIntervalForResource. This configuration is used for all network activity run through Transfer Utility. You can see the default for allowsCellularAccess
is set in AWSNetworking to allow by default. Without it all of these network tasks would not be allowed unless WiFi is connected. It could be set in amplifyconfiguration.json
or awsconfiguration.json
.
From the Amplify Storage side you can see the upload options do not include these properties so it is not possible to specify unique behavior per request.
There is a property called waitsForConnectivity which will either fail immediately if there is no access or queue the upload and let it start when a viable network path is available. All requests run by Transfer Utility are created with a background session and so the last line is relevant here.
This property is ignored by background sessions, which always wait for connectivity.
You can take control of this behavior by directly starting or canceling network operations based on connectivity status.
If your Deployment Target is iOS 12.0 or later you could use Apple's Network framework to monitor the network path and start or cancel operations in response to changes. If you need to support earlier versions you could use a package that I prepared called Connectivity which falls back to using the Reachability library. It does not work as well as Apple's modern framework designed for mobile scenarios which were not common when Reachability was first created, but your app metrics likely show a small percentage of your users are using a version of iOS before 12. It will be sufficient for those users while giving the majority of your users the optimal behavior.
That package includes a demo app that you can use to observe changes in your device's network path. You can put the app on an iPhone, disable the cellular radio and walk out of range of your WiFi network to see it change, or just turn on Airplane Mode. One feature you'll likely want to use to benefit your users is isExpensive. It is designed to be intelligent about the current network path. If you are on regular WiFi the value will be false. If you are a cellular connection, which likely involves data rates, it will be true. And if you happen to be using a WiFi hotspot which connects via cellular the value will also be true. This is one feature of the Network framework which is missing from Reachability. This package sets this property based on the network status alone with the Reachability implementation. You can avoid large uploads when your users are not on WiFi and save them both battery life and potentially higher data rates.
When the network path changes from .unsatisfied
to .satisfied
your network operations can run while if the opposite happens your can choose to cancel those network activities and start them again when a viable path is available. In my experience these interruptions are often momentary, like going through a tunnel or riding in an elevator. And when I get off the elevator it also means I will switch from cellular to WiFi or vice versa. You can make your code behavior exactly how you choose with this framework.
@brennanMKE Thanks for the context, it's helpful. So as not to pollute the original DataStore issue with this Storage issue, I'll open a new issue here so we can continue this conversation separately. We are not seeing Storage retry a failed upload when connectivity is restored, and if I'm understanding what you wrote above, it should.
We've merged the fix for this issue into main
, but are still working through some sporadic test failures and fixing a data race that we found as well. If you'd like to test the main
branch, it should have the outgoing mutation queue fix applied, but not the data race fix.
We fixed the issues we found and things seem solid now. We'll get this released to prod as soon as possible.
We released this to prod in 1.13.1. I'm optimistically marking this as "closing soon", but please let us know if you have any other issues!
Great, thanks, we'll update to that production release. We very much appreciate all the work you and your team did on this issue.
On Fri, Aug 6, 2021 at 8:11 AM Tim Schmelter @.***> wrote:
We released this to prod in 1.13.1 https://github.com/aws-amplify/amplify-ios/releases/tag/v1.13.1. Please let us know if you have any other issues.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aws-amplify/amplify-ios/issues/1324#issuecomment-894327728, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABHGKHJMSOMIR6EI2SHMITT3P3SBANCNFSM5AP5ZA3A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
@testflyjets I'll optimistically close this, but please feel free to reopen if v1.13.1 didn't fully solve your issue
Describe the bug
We were using AutoMerge as our conflict resolution strategy. When the app was offline, if multiple different changes were made to a model and saved to the datastore, it appeared that only the first change was actually synched back to Dynamo once the connection was reestablished. The behavior seemed to indicated that the first change was accepted by the resolver, which caused the version number to increment and the current data (what was in Dynamo + the one new change) to be synched back to the device, overwriting all the other changes that were made offline.
We switched to using a simple custom Lambda for conflict resolution, one that essentially just RESOLVES all changes:
With this strategy in place, running the app in the iOS simulator it works exactly as we'd hoped -- all of the changes that were made while offline all get synched to Dynamo.
However, running the exact same code on an actual device, we are seeing the original behavior, in which only the first change that is made while offline is synched to Dynamo when connectivity is restored.
We also noted what seemed to be odd behavior depending on how "offline" was established on the device. Using Airplane Mode instead of turning Cellular Data off directly seemed to cause the app to show somewhat random synching behavior while just turning off Wifi and Cell Data directly resulted in repeatable behavior.
Steps To Reproduce
Expected behavior
We expect the application to behave the same in both the iOS simulator and the device.
Amplify Framework Version
1.9.1
Amplify Categories
DataStore
Dependency manager
Swift PM
Swift version
5.3.2
CLI version
4.47.1
Xcode version
12.4 (12D4e)
Relevant log output
Is this a regression? (i.e. was this working before a version upgrade)
No.
Device
iPhone 12 simulator and iPhone 12 device
iOS Version
14.6
Specific to simulators
No response
Additional context
No response