aws-amplify / amplify-android

The fastest and easiest way to use AWS from your Android app.
https://docs.amplify.aws/lib/q/platform/android/
Apache License 2.0
249 stars 117 forks source link

fix(datastore): Add syncExpression field to LastSyncMetadata #2936

Open edisooon opened 1 month ago

edisooon commented 1 month ago

Issue #, if available:

Description of changes: This is the first out of two PRs in an effort to address a current issue customers had reported in our Datastore category.

The issue is that the "changing sync expression in Runtime" doesn't work as expected as described in our doc.

For example, when we initialize the DataStorePlugin with a sync expression for Student model to only sync down all the students who are >= 20-year-old, even if we change this sync expression to sync down students who are >= 17-year-old in run-time followed by Amplify.DataStore.stop({},{}) and Amplify.DataStore.start({}, {}): the new synced expression will only be applied to the newly-created/updated students, i.e. existing instances of Students whose ages are in between 17 and 20 in remote database won't get synced, which is not as expected.

The bug is originated from the hydrate() method in SyncProcessor.java, which is being called by the startApiSync() method in Orchestrator.java, a component responsible for "Synchronizing changed data between the LocalStorageAdapter and AppSync". To build a Sync Request (syncModel(..) method in SyncProcessor.java), we need to pass in

In order to achieve this, our implementation persists the last_sync_timestamp of models in LastSyncMetadata table whenever the code initiate a sync request, and uses the persisted last_sync_timestamp from LastSyncMetadata to initiate the next sync. (createHydrationTasks(..) in SyncProcessor.java)

This would allow us to initiate either a Delta sync or Base sync based on the lastSyncTime parameter we pass in, which is defined by the nature of AppSync's Sync API.

To further explain the cause of this bug, let's keep using the previous example: After we initiate the first sync with sync expression (age>=20), we would add a new row in LastSyncMetadata, which might look like [Student(model name), 3(last sync timestamp)], assuming the current timestamp is 3. After we change the sync expression in runtime (age>=17), the implementation would:

This will lead to the bug behavior described above, because: AppSync will use BOTH _updated_syncexpression(age>=17) and _last_synctimestamp(3) to initiate a delta sync (_base sync will be performed only when _last_synctimestamp is 0). And for delta sync, under the hood, this last_sync_timestamp will be used to compare with the metadata _lastChangedAt in each rows of Student table in remote database.

So previous students who haven't been updated since last_sync_timestamp(3), i.e. _lastChangedAt<3, won't get synced down, even though they meet the updated sync expression, e.g. a row in Student table like [student1(name), 18(age), 2(_lastChangedAt)].

But students who are added/updated later will get synced, e.g., a row in Student table like [student2(name), 17(age), 5(_lastChangedAt)].

The essence of this problem is that: the last_sync_timestamp was only associated with the model, but should be associated with the model and the last_sync_expression being used.

To address this, we need to:

The solution has been broke down into two steps, which would be implemented in two PRs:

  1. DB migration for LastSyncMetadata table
  2. logic changes to make use of the new field

How did you test these changes? (Please add a line here how the changes were tested) [TODO]

Documentation update required?

General Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.