aecreations / sync-clippings

Sync Clippings Helper
Mozilla Public License 2.0

Compress outgoing sync data in ZIP format #18

Open aecreations opened 8 months ago

aecreations commented 8 months ago

A forum poster suggested compressing the synced clippings data into ZIP format so that more data can be synced within the 1 MiB limit that the native messaging API[1] imposes on messages sent from the native app. Forum post: https://groups.io/g/aecreations-help/message/48

This only needs to be done for sync data being sent from Sync Clippings Helper to the extension. Incoming data to the native app has a more generous limit of 4 GiB.
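As a rough illustration of where the limit bites, the sketch below frames an outgoing native message (UTF-8 JSON preceded by a 32-bit length in native byte order, as described in the MDN page cited below) and checks the body against the 1 MiB cap. The function names are hypothetical and are not taken from the Sync Clippings Helper source. Whether the cap counts the 4-byte length prefix is not stated here, so the check covers the JSON body only.

```python
import json
import struct

# Cap on a single message sent from the native app to the extension (1 MiB).
MAX_OUTGOING_BYTES = 1024 * 1024


def encode_native_message(payload):
    """Serialize payload as UTF-8 JSON with a 4-byte native-order length prefix."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("=I", len(body)) + body


def fits_outgoing_limit(payload):
    """Return True if the JSON body of the message stays within the cap."""
    body = json.dumps(payload).encode("utf-8")
    return len(body) <= MAX_OUTGOING_BYTES
```

A helper like `fits_outgoing_limit` could be used to decide when uncompressed sync data is too large to send as is.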

-- [1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_messaging

aecreations commented 8 months ago

There is a possibility that users on an older version of Clippings (which can't handle sync data in ZIP format) may update to the new version of Sync Clippings Helper with data compression.

To allow for backward compatibility, the compatible version of Clippings should send a new native message get-compressed-synced-clippings to Sync Clippings Helper. Older versions can continue to send the get-synced-clippings message, which should cause Sync Clippings Helper to respond with the sync data in the normal, uncompressed format.
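A minimal sketch of that dispatch, assuming the helper receives the message name as a string. The function `handle_message` and its two callable parameters are hypothetical placeholders, not the helper's actual code.

```python
def handle_message(name, load_sync_data, compress):
    """Pick the reply format based on which native message was received.

    load_sync_data: callable returning the sync data (hypothetical).
    compress: callable turning sync data into a compressed reply (hypothetical).
    """
    if name == "get-synced-clippings":
        # Older Clippings versions: reply in the normal, uncompressed format.
        return load_sync_data()
    if name == "get-compressed-synced-clippings":
        # Newer Clippings versions: reply with compressed data.
        return compress(load_sync_data())
    raise ValueError(f"Unknown native message: {name}")
```

Because the old message name keeps its old behaviour, an updated helper stays usable with an extension that has not been updated yet.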

aecreations commented 8 months ago

Compressing data into gzip format using Python:

import gzip
s = 'Hello world!'
b = s.encode('UTF-8')  # Convert Unicode string to bytes
z = gzip.compress(b)
print(z)  # Output: b'\x1f\x8b\x08\x00\xfc\xb3\x93e\x02\xff\xf3H\xcd\xc9\xc9W(\xcf/\xcaIQ\x04\x00\x95\x19\x85\x1b\x0c\x00\x00\x00'

Source: https://docs.python.org/3/library/gzip.html
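Note that the gzip header embeds a modification time, so the printed bytes differ from run to run. The following sketch, which assumes Python 3.8 or later for the `mtime` keyword, pins the timestamp to get repeatable output and confirms that the data survives a round trip.

```python
import gzip

s = 'Hello world!'
z = gzip.compress(s.encode('utf-8'), mtime=0)  # Fixed timestamp for repeatable output

# Decompressing restores the original string.
restored = gzip.decompress(z).decode('utf-8')
print(restored == s)
```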

aecreations commented 5 months ago

The message data from the native app needs to be sent to the extension in JSON format, so the compressed data needs to be wrapped in a JSON object.

Also, raw bytes cannot be stored in a JSON object, so the zipped data needs to be base64-encoded. This reduces the effectiveness of the compression, because base64 output is about one third larger than its input.
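The wrapping step could look something like the sketch below. The `"data"` key and both function names are assumptions made for illustration; the actual message layout is not specified in this issue. The unwrap function is shown in Python only to demonstrate the round trip, since the real receiving side is the extension.

```python
import base64
import gzip
import json


def wrap_compressed(sync_data):
    """Gzip the JSON-encoded sync data and wrap it in a JSON-safe object."""
    raw = json.dumps(sync_data).encode("utf-8")
    zipped = gzip.compress(raw)
    encoded = base64.b64encode(zipped).decode("ascii")
    return {"data": encoded}  # "data" is a hypothetical key name


def unwrap_compressed(message):
    """Reverse wrap_compressed: base64-decode, gunzip, then parse the JSON."""
    zipped = base64.b64decode(message["data"])
    return json.loads(gzip.decompress(zipped).decode("utf-8"))
```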

Some stats using the attached test sync file as an example:

Size of the sync data, encoded in UTF-8 (bytes): 9538
Size of zipped data (bytes): 3427
Size of base64-encoded string containing the zipped data (chars): 4572

Base64 output is plain ASCII, so each character takes 1 byte in the base64-encoded string. Compressing the Sync Clippings data therefore reduces the sync data size by 52% (from 9538 to 4572 bytes).
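The figures can be checked with a few lines of arithmetic. The sketch uses only the sizes reported above and the fact that base64 emits 4 characters for every 3 input bytes, with padding.

```python
import math

raw_size = 9538     # UTF-8 sync data, in bytes
zipped_size = 3427  # gzip output, in bytes

# Base64 maps each group of 3 bytes (padded) to 4 ASCII characters.
base64_size = 4 * math.ceil(zipped_size / 3)

reduction_zip = 1 - zipped_size / raw_size  # saving before base64 encoding
reduction_b64 = 1 - base64_size / raw_size  # saving after base64 encoding

print(f"{base64_size} {reduction_zip:.0%} {reduction_b64:.0%}")
# Prints: 4572 64% 52%
```

So the base64 step costs roughly 12 percentage points of the saving on this test file.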

clippings-sync.json