PostHog / plugin-repository

Plugins for PostHog
MIT License

Plugin request: S3 #12

Closed: timgl closed this issue 3 years ago

timgl commented 3 years ago

I want to dump all my persons and events into an S3 bucket for analysis/backup.

mariusandra commented 3 years ago

Thinking out loud on how to make this.

I assume at the end of the day we'd be happy with a long YYYYMMDD-HHMM-DIGEST.json file that contains a bunch of collected events? One event JSON per line: { event: '', properties: {} }.

To make that happen, we need to either:

1) Have a periodic task (runEveryMinute) that calls the PostHog Events API to get all events since $lastrun (could be a huge number?) and then creates one or more .json or .txt files that it uploads to S3. This script could be set to run from $first_event to back up all previously ingested events to S3.

2) Alternatively, every time an event runs through the S3 plugin, store it in a list in Redis. Then have a runEveryMinute task that reads the entire list from Redis and sends it to S3.

Which seems like a better approach?
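Option 2 can be sketched roughly like this, with a plain in-memory array standing in for the Redis list and the upload injected as a callback. The per-event hook name (processEvent) and the uploader callback are assumptions; in a real plugin the buffer would be an RPUSH to Redis and the flush would drain the list before uploading:

```javascript
// In-memory stand-in for the Redis list (assumption for this sketch).
const buffer = []

// Assumed per-event hook: push each event onto the buffer and pass it through.
function processEvent(event) {
  buffer.push(JSON.stringify(event))
  return event
}

// runEveryMinute task: drain everything buffered so far and upload one batch.
function runEveryMinute(upload) {
  if (buffer.length === 0) return
  const batch = buffer.splice(0, buffer.length) // take and clear in one step
  upload(batch.join('\n')) // one JSON event per line
}
```

One design note: draining the whole list each run keeps uploads batched into a single object per minute instead of one object per event, which is much cheaper against S3.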

mariusandra commented 3 years ago

Also, this probably requires us to add and export a few extra vendor libraries so plugins can talk to S3 via the AWS SDK. Segment exposes the entire aws-sdk package simply as AWS.
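If the runtime did expose the SDK as a global AWS the way Segment does, an upload might look like the sketch below. The bucket name, the key, and the AWS global itself are all assumptions here, not confirmed plugin APIs; only the params object is exercised directly:

```javascript
// Build the params object for an S3 PutObject/upload call.
// Bucket and key values are hypothetical examples.
function buildUploadParams(bucket, key, body) {
  return {
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentType: 'application/x-ndjson', // JSON-lines payload
  }
}

// With the SDK available as a global `AWS`, the call would be roughly:
//   const s3 = new AWS.S3()
//   await s3.upload(buildUploadParams('my-backup-bucket', key, body)).promise()
```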

mariusandra commented 3 years ago

And it's done! 🎉

https://github.com/PostHog/s3-export-plugin

mariusandra commented 3 years ago

I'll add it to the repo once 1.24 is out.