Currently there are on average about 10K sync records per user. Each record can correspond to a new bookmark, a delete, or a bookmark modification (including re-ordering). Because the records are encrypted it is impossible to tell for certain, but I believe there are many more records than actual bookmarks, perhaps by two orders of magnitude. Having this many records can make it very difficult to add new devices to the sync chain, since a new device has to pull down thousands of records instead of just hundreds. Therefore, I believe we should compact these records where possible to reduce the number we have to deal with.
Because of how the records are encrypted, we are unable to compact them server side, so sync compaction has to be implemented client side. After talking with @bridiver, we have a proposed algorithm:
- Every 7 days:
  - Scan for all records in the S3 folder
  - Collect all records for a given object id and their associated timestamps in S3
  - For each object id:
    - For all S3 records whose timestamps are earlier than the latest one:
      - Delete the record in S3
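
The selection step of the proposed algorithm could be sketched roughly as follows. This is only an illustration, not the actual client code: `RecordMeta`, `keys_to_delete`, and the field names are hypothetical, and it assumes that each record's object id and timestamp can be read from the S3 listing without decrypting the record body. The S3 listing and delete calls themselves are left out.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class RecordMeta(NamedTuple):
    """Hypothetical view of one sync record as seen in an S3 listing."""
    key: str        # S3 object key of the record (assumed name)
    object_id: str  # id of the bookmark/object the record refers to
    timestamp: int  # record timestamp, e.g. milliseconds since epoch


def keys_to_delete(records: Iterable[RecordMeta]) -> List[str]:
    """Return the keys of all records that are older than the latest
    record for the same object id. The latest record is always kept."""
    # Collect all records for a given object id.
    by_object = defaultdict(list)
    for rec in records:
        by_object[rec.object_id].append(rec)

    stale: List[str] = []
    for recs in by_object.values():
        latest = max(r.timestamp for r in recs)
        # Only strictly earlier records are stale; records that tie
        # with the latest timestamp are kept.
        stale.extend(r.key for r in recs if r.timestamp < latest)
    return stale
```

A client would run this over the full listing once per compaction interval (every 7 days in the proposal) and then delete the returned keys from S3.
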
Once this is enabled, we should see a significant drop in the number of records. @mrose17 has provided a sync chain that we can evaluate this against when we are ready.