TheExGenesis / community-archive

An open tweet database and API anyone can build on.
https://www.community-archive.org
MIT License

decouple db insertion from client using edge functions #169

Open · TheExGenesis opened this issue 2 weeks ago

TheExGenesis commented 2 weeks ago

Current state: we upload archives into storage (which is fast) and then, from the client, insert the data into the db in batches (which is slow).

Solution: I think we can simply listen for inserts into the storage.objects table and trigger an edge function that inserts the archive into the db in batches, as the client does now, but in the background.
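A minimal sketch of that flow in TypeScript (the language Supabase Edge Functions run), assuming a Database Webhook on `storage.objects` that POSTs the standard `{ type, table, schema, record }` payload to the function. The bucket name `archives` and the helpers `archivePathFromPayload`, `chunk`, and `insertInBatches` are hypothetical, and the actual db write is injected as a callback so the batching loop does not depend on any particular client.

```typescript
// Fields of a storage.objects row that this sketch reads.
interface StorageObjectRecord {
  bucket_id: string;
  name: string; // object path inside the bucket
}

// Assumed Database Webhook payload, reduced to the fields used here.
interface WebhookPayload {
  type: "INSERT" | "UPDATE" | "DELETE";
  schema: string;
  table: string;
  record: StorageObjectRecord | null;
}

// Hypothetical bucket name; replace with the project's real archive bucket.
const ARCHIVE_BUCKET = "archives";

// Returns the object path to process, or null if the event is not a new archive upload.
function archivePathFromPayload(p: WebhookPayload): string | null {
  if (p.type !== "INSERT" || p.schema !== "storage" || p.table !== "objects") return null;
  if (!p.record || p.record.bucket_id !== ARCHIVE_BUCKET) return null;
  return p.record.name;
}

// Split rows into fixed-size batches so each db insert stays small.
function chunk<T>(rows: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("batch size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) out.push(rows.slice(i, i + size));
  return out;
}

// Insert rows one batch at a time; `insertBatch` is injected (e.g. a wrapper
// around supabase.from(table).upsert(batch)). Returns the number of rows sent.
async function insertInBatches<T>(
  rows: T[],
  size: number,
  insertBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  let inserted = 0;
  for (const batch of chunk(rows, size)) {
    await insertBatch(batch); // sequential, so only one batch is in flight at a time
    inserted += batch.length;
  }
  return inserted;
}

// Hypothetical wiring inside the edge function (not executed here):
// Deno.serve(async (req) => {
//   const path = archivePathFromPayload(await req.json());
//   if (!path) return new Response("ignored");
//   // download and parse the archive at `path`, then for each table:
//   // await insertInBatches(rows, 1000, async (b) => {
//   //   const { error } = await supabase.from("tweets").upsert(b);
//   //   if (error) throw error;
//   // });
//   return new Response("ok");
// });
```

Keeping the webhook filter and the batching loop as plain functions means the same loop the client runs today could be moved into the function with little change.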

This is a tangent, but it would be good to give the user a browser notification when their archive has finished uploading, by listening for archive_upload.upload_phase=complete.
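A hedged sketch of the notification logic, assuming the client subscribes to UPDATE events on `archive_upload` through Supabase Realtime `postgres_changes` (the `public` schema, the `id` filter, `uploadId`, and `showNotification` are assumptions). The function names are hypothetical. It fires only on the transition into `complete`, and the notifier is injected so the decision logic runs outside a browser.

```typescript
// Only the column this sketch reads.
interface ArchiveUploadRow {
  upload_phase?: string;
}

// Realtime UPDATE payload, reduced to the fields used here.
interface ArchiveUploadChange {
  new: ArchiveUploadRow;
  old: ArchiveUploadRow;
}

// True only on the transition into "complete". `old` carries previous column
// values only if the table uses REPLICA IDENTITY FULL; otherwise
// old.upload_phase is undefined and any update with new.upload_phase ===
// "complete" returns true.
function shouldNotify(change: ArchiveUploadChange): boolean {
  return change.new.upload_phase === "complete" && change.old.upload_phase !== "complete";
}

// `notify` is injected; in the browser it could be
//   (msg) => new Notification("Community Archive", { body: msg })
// once Notification.requestPermission() has resolved to "granted".
function handleArchiveUploadChange(
  change: ArchiveUploadChange,
  notify: (msg: string) => void,
): boolean {
  if (!shouldNotify(change)) return false;
  notify("Your archive has finished uploading.");
  return true;
}

// Hypothetical subscription with supabase-js v2 (not executed here):
// supabase
//   .channel("archive-upload")
//   .on("postgres_changes",
//       { event: "UPDATE", schema: "public", table: "archive_upload", filter: `id=eq.${uploadId}` },
//       (payload) => handleArchiveUploadChange(payload as unknown as ArchiveUploadChange, showNotification))
//   .subscribe();
```

The table also has to be part of the Realtime publication for these events to reach the client.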

Potential limitations: Edge functions have a 400s wall clock time limit, which should be enough for any archive.

They also have a 256MB memory limit, so they may fail on really big archives. The solution here would be to break archives up into separate files (maybe even just uploading them as they exist in the user's archive, split into tweets, likes, followers, etc.).
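To make the splitting idea concrete, a minimal sketch assuming the client has already parsed the archive into one array per section (e.g. `tweets`, `like`, `follower`). The `ParsedArchive` type, the `<username>/<section>/part-N.json` path layout, and `maxRowsPerPart` are hypothetical choices, not the project's actual format. Each part could then be uploaded on its own (for example with supabase-js `storage.from(bucket).upload(path, body)`), so the edge function triggered for that object only loads one part into memory.

```typescript
// Hypothetical in-memory shape of a parsed archive: one array per section.
type ParsedArchive = Record<string, unknown[]>;

interface ArchivePart {
  path: string; // storage object path for this part (hypothetical layout)
  body: string; // serialized JSON for just these rows
}

// Split each section into parts of at most `maxRowsPerPart` rows, so even a
// large section such as tweets never has to be loaded in one piece.
function splitArchive(
  username: string,
  archive: ParsedArchive,
  maxRowsPerPart: number,
): ArchivePart[] {
  if (maxRowsPerPart <= 0) throw new RangeError("maxRowsPerPart must be positive");
  const parts: ArchivePart[] = [];
  for (const [section, rows] of Object.entries(archive)) {
    for (let i = 0, n = 0; i < rows.length; i += maxRowsPerPart, n++) {
      parts.push({
        path: `${username}/${section}/part-${n}.json`,
        body: JSON.stringify(rows.slice(i, i + maxRowsPerPart)),
      });
    }
  }
  return parts; // empty sections produce no parts
}
```

Capping by row count is only a rough proxy for size, since tweets vary in length; a cap on the serialized size of each part would track the memory limit more directly.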

ri72miieop commented 2 weeks ago

Thoughts on using a https://trigger.dev instance (self-hosted or not)? We would get better observability and fewer limits on what code we can run. It would add a new dependency, though, which we may want to avoid...

TheExGenesis commented 2 weeks ago

Yeah, I'm trying to avoid adding services; breaking the archive into files isn't that big of a deal, really.