GothenburgBitFactory / taskchampion

Personal task-tracking library
MIT License

Sync to AWS #368

Open djmitche opened 8 months ago

djmitche commented 8 months ago

Similar to the GCP sync implemented in GothenburgBitFactory/taskwarrior#3185, we should be able to sync replicas to AWS's object storage.

The tricky bit here is that, unlike GCS and Azure Blob, S3 does not provide a compare-and-swap operation.

Some reading suggests that the easiest way to accomplish this would be to use DynamoDB as a lock over the "latest" object in the S3 bucket.
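To make the idea concrete, here is a minimal sketch of how a DynamoDB item could serve as a compare-and-swap pointer to the "latest" object. It is not a design anyone has settled on in this issue. `PointerTable` and `Bucket` are hypothetical in-memory stand-ins: in a real implementation the pointer update would presumably be a DynamoDB conditional write and the bucket would be S3. The key naming (`v-<uuid>`) is also just an assumption for illustration.

```python
import threading
import uuid


class ConditionFailed(Exception):
    """Raised when the stored pointer does not match the expected value."""


class PointerTable:
    """In-memory stand-in for a single DynamoDB item updated conditionally."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None  # key of the current "latest" object, or None

    def get_latest(self):
        with self._lock:
            return self._latest

    def compare_and_swap(self, expected, new):
        # Atomic: succeed only if nobody moved the pointer since we read it.
        with self._lock:
            if self._latest != expected:
                raise ConditionFailed(
                    f"expected {expected!r}, found {self._latest!r}"
                )
            self._latest = new


class Bucket:
    """In-memory stand-in for an S3 bucket: plain put/get/delete, no CAS."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)


def add_version(bucket, table, parent_key, data):
    """Upload data under a fresh key, then publish it only if parent_key
    is still the latest. Returns the new key, or None if the race was lost."""
    new_key = f"v-{uuid.uuid4()}"
    bucket.put(new_key, data)  # not visible to readers until the pointer moves
    try:
        table.compare_and_swap(parent_key, new_key)
    except ConditionFailed:
        bucket.delete(new_key)  # clean up the orphaned upload
        return None
    return new_key


bucket, table = Bucket(), PointerTable()
first = add_version(bucket, table, None, b"ops-1")
stale = add_version(bucket, table, None, b"ops-2")  # built on an outdated parent
print(first is not None, stale is None)
```

The point of the sketch is that the object upload itself never needs to be conditional: only the small pointer update does, and a replica that loses the race re-reads the pointer and retries.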

dathanb commented 8 months ago

For the sake of simplicity, would it make more sense to use DynamoDB as the only store? Unless we're going to run up against DynamoDB's 400KB-per-item size limit, it seems compelling to only have to configure a single cloud resource instead of two.

djmitche commented 8 months ago

I think that limit would be a problem, yes.

dathanb commented 8 months ago

OK, got it. I haven't checked out the actual objects that get synced. I'll look at your PR for GCP to better understand what all gets sent.

djmitche commented 8 months ago

Great!

You can find more info on the sync protocol here. Ignore the HTTP bits for the cloud-storage case, but the rest still applies.

There's really no size limit on versions -- if a user is putting big chunks of text into their tasks, such as annotations, they might get quite large. Similarly, if they do not sync very often, an accumulation of small changes might get large. The former case doesn't really permit any technical solution -- nothing prevents a user from putting a 500KB annotation on a task, and that would need to be a single operation and thus included in a single version. Snapshots, too, have no size limit, and are proportional to the total number of tasks (of all statuses) a user has. Probably most users have relatively small task sets, but I'm sure there are people out there with thousands of tasks or even more.

So, I think we need to take advantage of S3's more-or-less unlimited size. The alternative would be to store versions and snapshots in multiple DynamoDB items, but that seems difficult and would introduce some new failure modes.
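For illustration only, this is roughly what the multi-item alternative would involve, and where one of the new failure modes comes from. The `ITEM_LIMIT` value and the tuple layout are assumptions made for this sketch, not anything from the sync protocol. The real DynamoDB limit also counts attribute names, so the usable payload per item would be somewhat smaller.

```python
ITEM_LIMIT = 400_000  # assumed usable bytes per item (hypothetical round number)


def split_into_items(version_id, payload, limit=ITEM_LIMIT):
    """Split payload into (version_id, index, total, chunk) tuples."""
    chunks = [payload[i:i + limit] for i in range(0, len(payload), limit)] or [b""]
    return [(version_id, i, len(chunks), c) for i, c in enumerate(chunks)]


def reassemble(items):
    """Rebuild the payload, or raise if any chunk is missing."""
    if not items:
        raise ValueError("no items")
    total = items[0][2]
    by_index = {index: chunk for _, index, _, chunk in items}
    missing = [i for i in range(total) if i not in by_index]
    if missing:
        raise ValueError(f"incomplete version, missing chunks {missing}")
    return b"".join(by_index[i] for i in range(total))


# A single 500KB annotation no longer fits in one item.
payload = b"x" * 500_000
items = split_into_items("v1", payload)
print(len(items))  # 2

# If the writer dies after the first item, readers see a partial version.
try:
    reassemble(items[:1])
except ValueError as err:
    print(err)
```

Every reader would have to detect and handle the partial-write case, and something would have to clean up the leftover chunks. A single S3 object avoids both problems.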