[X] PR title and description conform to Pull Request guidelines.
Issue #, if available:
Description of changes:
This is the first out of two PRs in an effort to address a current issue customers had reported in our Datastore category.
The issue is that the "changing sync expression in Runtime" doesn't work as expected as described in our doc.
For example, when we initialize the DataStorePlugin with a sync expression for Student model to only sync down all the students who are >= 20-year-old, even if we change this sync expression to sync down students who are >= 17-year-old in run-time followed by Amplify.DataStore.stop({},{}) and Amplify.DataStore.start({}, {}): the new synced expression will only be applied to the newly-created/updated students, i.e. existing instances of Students whose ages are in between 17 and 20 in remote database won't get synced, which is not as expected.
The bug is originated from the hydrate() method in SyncProcessor.java, which is being called by the startApiSync() method in Orchestrator.java, a component responsible for "Synchronizing changed data between the LocalStorageAdapter and AppSync".
To build a Sync Request (syncModel(..) method in SyncProcessor.java), we need to pass in
a model schema of the Model we want to sync
last_sync_timestamp of this model
predicate / sync expression
In order to achieve this, our implementation persists the last_sync_timestamp of models in LastSyncMetadata table whenever the code initiate a sync request, and uses the persisted last_sync_timestamp from LastSyncMetadata to initiate the next sync. (createHydrationTasks(..) in SyncProcessor.java)
This would allow us to initiate either a Delta sync or Base sync based on the lastSyncTime parameter we pass in, which is defined by the nature of AppSync's Sync API.
To further explain the cause of this bug, let's keep using the previous example:
After we initiate the first sync with sync expression (age>=20), we would add a new row in LastSyncMetadata, which might look like [Student(model name), 3(last sync timestamp)], assuming the current timestamp is 3.
After we change the sync expression in runtime (age>=17), the implementation would:
retrieve the Student model's last sync time (3) from the database
use 3 as last sync time and the updated sync expression (age>=17) to build a new sync request for Student model
update the row in LastSyncMetadata table, which might look like [Student(model name), 4(last sync timestamp)]
This will lead to the bug behavior described above, because:
AppSync will use BOTH _updated_syncexpression(age>=17) and _last_synctimestamp(3) to initiate a delta sync (_base sync will be performed only when _last_synctimestamp is 0). And for delta sync, under the hood, this last_sync_timestamp will be used to compare with the metadata _lastChangedAt in each rows of Student table in remote database.
So previous students who haven't been updated since last_sync_timestamp(3), i.e. _lastChangedAt<3, won't get synced down, even though they meet the updated sync expression, e.g. a row in Student table like [student1(name), 18(age), 2(_lastChangedAt)].
But students who are added/updated later will get synced, e.g., a row in Student table like [student2(name), 17(age), 5(_lastChangedAt)].
The essence of this problem is that: the last_sync_timestamp was only associated with the model, but should be associated with the model and the last_sync_expression being used.
To address this, we need to:
add a new column in LastSyncMetadata to store the last_sync_expression being used
store the last_sync_expression after we request a sync, and use last_sync_expression to adjust the retrieved last_sync_timestamp when we try to build a new sync request. To be specific, if the current_sync_expression != last_sync_expression, we return 0 as adjusted last_sync_timestamp to initiate a full sync.
The solution has been broke down into two steps, which would be implemented in two PRs:
DB migration for LastSyncMetadata table
logic changes to make use of the new field
How did you test these changes?
(Please add a line here how the changes were tested)
[TODO]
Documentation update required?
[x] No
[ ] Yes (Please include a PR link for the documentation update)
General Checklist
[ ] Added Unit Tests
[ ] Added Integration Tests
[x] Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Issue #, if available:
Description of changes: This is the first out of two PRs in an effort to address a current issue customers had reported in our Datastore category.
The issue is that the "changing sync expression in Runtime" doesn't work as expected as described in our doc.
For example, when we initialize the DataStorePlugin with a sync expression for Student model to only sync down all the students who are >= 20-year-old, even if we change this sync expression to sync down students who are >= 17-year-old in run-time followed by
Amplify.DataStore.stop({},{})
andAmplify.DataStore.start({}, {})
: the new synced expression will only be applied to the newly-created/updated students, i.e. existing instances of Students whose ages are in between 17 and 20 in remote database won't get synced, which is not as expected.The bug is originated from the
hydrate() method in SyncProcessor.java
, which is being called by thestartApiSync() method in Orchestrator.java
, a component responsible for "Synchronizing changed data between the LocalStorageAdapter and AppSync". To build a Sync Request (syncModel(..) method in SyncProcessor.java
), we need to pass inIn order to achieve this, our implementation persists the last_sync_timestamp of models in
LastSyncMetadata table
whenever the code initiate a sync request, and uses the persisted last_sync_timestamp from LastSyncMetadata to initiate the next sync. (createHydrationTasks(..) in SyncProcessor.java
)This would allow us to initiate either a Delta sync or Base sync based on the
lastSyncTime
parameter we pass in, which is defined by the nature of AppSync's Sync API.To further explain the cause of this bug, let's keep using the previous example: After we initiate the first sync with sync expression (age>=20), we would add a new row in LastSyncMetadata, which might look like
[Student(model name), 3(last sync timestamp)]
, assuming the current timestamp is 3. After we change the sync expression in runtime (age>=17), the implementation would:[Student(model name), 4(last sync timestamp)]
This will lead to the bug behavior described above, because: AppSync will use BOTH _updated_syncexpression(age>=17) and _last_synctimestamp(3) to initiate a delta sync (_base sync will be performed only when _last_synctimestamp is 0). And for delta sync, under the hood, this
last_sync_timestamp
will be used to compare with the metadata_lastChangedAt
in each rows of Student table in remote database.So previous students who haven't been updated since last_sync_timestamp(3), i.e. _lastChangedAt<3, won't get synced down, even though they meet the updated sync expression, e.g. a row in Student table like
[student1(name), 18(age), 2(_lastChangedAt)]
.But students who are added/updated later will get synced, e.g., a row in Student table like
[student2(name), 17(age), 5(_lastChangedAt)]
.The essence of this problem is that: the last_sync_timestamp was only associated with the model, but should be associated with the model and the last_sync_expression being used.
To address this, we need to:
last_sync_expression
being usedThe solution has been broke down into two steps, which would be implemented in two PRs:
How did you test these changes? (Please add a line here how the changes were tested) [TODO]
Documentation update required?
General Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.