The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response.
This PR is looking to address the following crash that customers are experiencing
DataStoreException{message=Wanted 1 metadata for item with id = e7eb4237-c0d4-4ce6-8800-3b9d5fb603b8, but had 0., cause=null, recoverySuggestion=This is likely a bug. please report to AWS.}
at com.amplifyframework.datastore.syncengine.VersionRepository.extractVersion(VersionRepository.java:91)
at com.amplifyframework.datastore.syncengine.VersionRepository.lambda$findModelVersion$0$com-amplifyframework-datastore-syncengine-VersionRepository(VersionRepository.java:64)
at com.amplifyframework.datastore.syncengine.VersionRepository$$ExternalSyntheticLambda0.accept(Unknown Source:8)
at com.amplifyframework.datastore.storage.sqlite.SQLiteStorageAdapter.lambda$query$4$com-amplifyframework-datastore-storage-sqlite-SQLiteStorageAdapter(SQLiteStorageAdapter.java:407)
at com.amplifyframework.datastore.storage.sqlite.SQLiteStorageAdapter$$ExternalSyntheticLambda6.run(Unknown Source:10)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:463)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
at java.lang.Thread.run(Thread.java:1012)
The message
Wanted 1 metadata for item with id = e7eb4237-c0d4-4ce6-8800-3b9d5fb603b8, but had 0.
The unexpected behavior observed by the customer is that DataStore is deciding to crash when publishing mutations requires a version for updates and deletes, but it could not find it.
DataStore.save(model) is called for a new model. This inserts into the local database and queues up a pending create mutation.
DataStore.save(model) is called for the existing model, performing some updates to the fields. This updates the existing record in the local database, and queues up a pending update mutation. We assume that the MutationProcessor performs (1) process the create mutation, such that it is marked in flight, and the update mutation is not merged into the create mutation when it is inserted into the mutation database, and has to be sent on its own by the MutationProcessor.
MutationProcessor (running in parallel):
Process create mutation by publishing it to the network. The network is down. The request times out and fails. MutationProcessor can handle this in two ways: retry indefinitely or drop the mutation event.
Retrying indefinitely will only occur when it accurately classifies the response as a transient network failure.
If the response is classified as a non-retryable error, the error is returned to the customer on the errorHandler and the mutation will be dropped. Subsequently there will not be any reconciliation, and no local version will be persisted.
Process update mutation. Query the local metadata table for the model’s version, cannot find it, throw exception which crashes the app.
The cause of this issue is the create mutation was dropped, which could happen for a variety of reasons decided by the MutationProcessor’s logic in handling the response of the request, and the subsequent update mutation did not have a version to send with the mutation.
This exception is thrown when the response is successful with a GraphQL response payload containing errors. If any of the GraphQLError’s do not contain the AppSync “conflictUnhandled” type then throw this.
This means the conditions for classification of non-retryable is as follows:
A successful GraphQL response containing errors
The error is not ConflictUnhandled, such as
ConditionalCheckFailedException
OperationDisabled
Unauthorized
Since we are discussing the scenario with bad network conditions, it’s unlikely that it was classified as non-retyable due to this condition since (1) A successful GraphQL response would not have occurred.
ApiException.NonRetryableException
This exception is thrown when the response is 400 to 499.
private static final int START_OF_CLIENT_ERROR_CODE = 400;
private static final int END_OF_CLIENT_ERROR_CODE = 499;
/// ...
if (response.code() >= START_OF_CLIENT_ERROR_CODE && response.code() <= END_OF_CLIENT_ERROR_CODE) {
onFailure.accept(new ApiException
.NonRetryableException("OkHttp client request failed.", "Irrecoverable error")
);
return;
}
This includes 408 Request Timeout. To be verified by manual testing, but this could be the response being handled as non-retryable even though it was a transient timeout error (bad network conditions).
In summary, there is an existing scenario to be verified: During bad network conditions, DataStore’s MutationProcessor should handle failures as retryable.
Consider the scenario in which Unauthorized is treated as non-retryable (as seen above in DataStoreException.GraphQLResponseException). Unauthorized is very common when the user is signed out, or their token or session is expired. The mutation is then dropped, and update mutation will attempt to get the version and crash the app.
This PR looks to address this scenario while another PR independent from this will look into verifying MutationProcessor's classification of transient network errors as retryable. The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response. The response will be handled by DataStore by sending it back to the customer through the errorHandler and then dropping it, moving on to the next mutation in the queue.
What kind of response are we expecting?
The conflict resolution enabled backend does not enforce version to be a required field. If we cannot get a version locally, we can send the update/delete mutation without a version, AppSync will respond successfully with a GraphQL response. This is inline with Swift DataStore's implementation, as it will also send update/delete mutations without a version if it cannot retrieve it at the time it's processing the mutation event.
In the above example, version is updated, and the name is not. The conflict resolution strategy is enabled is auto-merge. DataStore will then reconcile this response data into the local database as the latest version of this model.
Example 2: Update mutation on model not yet created in AppSync
The example above closely resembles the scenario where we continue to send the update mutation without a version. When the create mutation fails, the blog id does not exist in the AppSync. When the id is used for update mutations w/ Cognito User Pool auth, and without version, the response will be unauthorized.
According to the MutationProcessor logic, this will be categorized as a non retryable error, and the error will be returned to the customer through the errorHandler, before the update mutation is dropped. Based on their use cases, they can decide to reconcile this themselves or not, however, new devices or cleared devices will not sync this data back down to the app since it does not exist in the remote store.
How did you test these changes?
(Please add a line here how the changes were tested)
Documentation update required?
[x] No
[ ] Yes (Please include a PR link for the documentation update)
General Checklist
[x] Added Unit Tests
[ ] Added Integration Tests
[ ] Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Issue #, if available: https://github.com/aws-amplify/amplify-android/issues/2151#issuecomment-2135346064
Description of changes:
The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response.
This PR is looking to address the following crash that customers are experiencing
The message
comes from https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/VersionRepository.kt#L148
Code paths
The unexpected behavior observed by the customer is that DataStore is deciding to crash when publishing mutations requires a version for updates and deletes, but it could not find it.
According to this customer https://github.com/aws-amplify/amplify-android/issues/2151#issuecomment-2135346064 This happens in bad network scenarios during saving and updating a model.
The cause of this issue:
DataStore.save(model)
is called for a new model. This inserts into the local database and queues up a pending create mutation.DataStore.save(model)
is called for the existing model, performing some updates to the fields. This updates the existing record in the local database, and queues up a pending update mutation. We assume that the MutationProcessor performs (1) process the create mutation, such that it is marked in flight, and the update mutation is not merged into the create mutation when it is inserted into the mutation database, and has to be sent on its own by the MutationProcessor.MutationProcessor (running in parallel):
errorHandler
and the mutation will be dropped. Subsequently there will not be any reconciliation, and no local version will be persisted.The cause of this issue is the create mutation was dropped, which could happen for a variety of reasons decided by the MutationProcessor’s logic in handling the response of the request, and the subsequent update mutation did not have a version to send with the mutation.
MutationProcessor error classification
https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/MutationProcessor.java#L353-L355
DataStoreException.GraphQLResponseException
https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/MutationProcessor.java#L387
This exception is thrown when the response is successful with a GraphQL response payload containing errors. If any of the GraphQLError’s do not contain the AppSync “conflictUnhandled” type then throw this.
This means the conditions for classification of non-retryable is as follows:
Since we are discussing the scenario with bad network conditions, it’s unlikely that it was classified as non-retyable due to this condition since (1) A successful GraphQL response would not have occurred.
ApiException.NonRetryableException
This exception is thrown when the response is 400 to 499.This includes 408 Request Timeout. To be verified by manual testing, but this could be the response being handled as non-retryable even though it was a transient timeout error (bad network conditions).
In summary, there is an existing scenario to be verified: During bad network conditions, DataStore’s MutationProcessor should handle failures as retryable.
Consider the scenario in which Unauthorized is treated as non-retryable (as seen above in DataStoreException.GraphQLResponseException). Unauthorized is very common when the user is signed out, or their token or session is expired. The mutation is then dropped, and update mutation will attempt to get the version and crash the app.
This PR looks to address this scenario while another PR independent from this will look into verifying MutationProcessor's classification of transient network errors as retryable. The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response. The response will be handled by DataStore by sending it back to the customer through the
errorHandler
and then dropping it, moving on to the next mutation in the queue.What kind of response are we expecting?
The conflict resolution enabled backend does not enforce version to be a required field. If we cannot get a version locally, we can send the update/delete mutation without a version, AppSync will respond successfully with a GraphQL response. This is inline with Swift DataStore's implementation, as it will also send update/delete mutations without a version if it cannot retrieve it at the time it's processing the mutation event.
Example 1. Update Mutation on existing model
In the above example, version is updated, and the name is not. The conflict resolution strategy is enabled is auto-merge. DataStore will then reconcile this response data into the local database as the latest version of this model.
Example 2: Update mutation on model not yet created in AppSync
The example above closely resembles the scenario where we continue to send the update mutation without a version. When the create mutation fails, the blog id does not exist in the AppSync. When the id is used for update mutations w/ Cognito User Pool auth, and without version, the response will be unauthorized.
According to the MutationProcessor logic, this will be categorized as a non retryable error, and the error will be returned to the customer through the
errorHandler
, before the update mutation is dropped. Based on their use cases, they can decide to reconcile this themselves or not, however, new devices or cleared devices will not sync this data back down to the app since it does not exist in the remote store.How did you test these changes? (Please add a line here how the changes were tested)
Documentation update required?
General Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.