Implement sync record compaction

jsecretan commented 4 years ago

Description

Currently there are about 10K sync records per user on average. Each record can correspond to a new bookmark, a delete, or a bookmark modification, including re-ordering. Although the records are encrypted and it is impossible to tell for certain, I believe there are many more records than actual bookmarks, perhaps by two orders of magnitude. I believe this number of records can make it very difficult to add new devices to the sync chain, since a new device has to pull down thousands and thousands of records instead of just hundreds. Therefore, I believe we should compact these records where possible to reduce the number we have to deal with.

Because of how the records are encrypted, we are unable to compact them server-side. Therefore, we need to implement sync compaction client-side. After talking with @bridiver, we have a proposed algorithm:

Every 7 days
        Scan for all records in the S3 folder
        Collect all records for each object id, along with their timestamps in S3
        For each object id
              For all records in S3 whose timestamps are earlier than the latest one
                    Delete the record in S3
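
The selection step of the proposal above could be sketched roughly as follows. This is only an illustration, not the actual client code: `SyncRecord`, its fields, and `keys_to_delete` are hypothetical names, and the sketch assumes the client has already listed the records and can read each record's object id and timestamp. The S3 listing and delete calls are left out on purpose.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class SyncRecord(NamedTuple):
    """Hypothetical view of one sync record as seen by the client."""
    key: str        # assumed S3 object key of the record
    object_id: str  # id of the object (e.g. bookmark) the record refers to
    timestamp: int  # record timestamp, e.g. milliseconds since epoch


def keys_to_delete(records: Iterable[SyncRecord]) -> List[str]:
    """Return the keys of every record that is older than the latest
    record for the same object id."""
    # Group all records by the object they refer to.
    by_object: Dict[str, List[SyncRecord]] = defaultdict(list)
    for record in records:
        by_object[record.object_id].append(record)

    stale: List[str] = []
    for group in by_object.values():
        latest_ts = max(r.timestamp for r in group)
        # Only strictly earlier records are dropped, so records that
        # share the latest timestamp are all kept.
        stale.extend(r.key for r in group if r.timestamp < latest_ts)
    return stale
```

The returned keys would then be passed to whatever delete call the client already uses. An object id with a single record yields nothing to delete, so running this repeatedly (for example every 7 days) should be harmless.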

Once this is enabled, we should notice a significant drop in the number of records. @mrose17 has provided a sync chain that we can evaluate this against when we are ready.

mrose17 commented 4 years ago

@jsecretan - i think we may need to make this somewhat more "aggressive". the client needs to "self-heal" inconsistencies in addition to compacting...

jsecretan commented 4 years ago

We will only do this on desktop in the first version.

brave / sync

Implement sync record compaction #341
