Thinking out loud on how to make this.

I assume at the end of the day we'd be happy with a long `YYYYMMDD-HHMM-DIGEST.csv`/`.json` file that contains a bunch of collected events? One event JSON per line: `{ event: '', properties: {} }`.
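For illustration, the file contents would then look something like this (the event names and properties below are made up):

```json
{"event": "$pageview", "properties": {"distinct_id": "user-123", "$current_url": "https://example.com/pricing"}}
{"event": "signup button clicked", "properties": {"distinct_id": "user-456", "plan": "free"}}
```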
To make that happen, we need to either:
1) Have a periodic task (`runEveryMinute`) that calls the PostHog Events API to get all events since `$lastrun` (could be a huge number?) and then creates one or more `.csv`/`.json` or `.txt` files that it uploads to S3. This script could be set to run from `$first_event` to back up all ingested events to S3.
2) Alternatively, every time an event is run through the S3 plugin, store it in a list in Redis. Then have a `runEveryMinute` task that reads the entire list from Redis and sends it to S3 (sketched below).
Which seems like a better approach?
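For concreteness, here is a minimal sketch of option 2, assuming plain `ioredis` and `aws-sdk` clients. The buffer key, function names, and file naming are illustrative, not the actual plugin-server API:

```ts
// Sketch of option 2: buffer each incoming event in a Redis list, then flush
// the whole list to S3 once a minute. All names here are placeholders.
import Redis from 'ioredis'
import { S3 } from 'aws-sdk'

const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379')
const s3 = new S3() // credentials picked up from the environment
const BUFFER_KEY = 's3-export-buffer' // hypothetical Redis key

// Step 1: called for every event that runs through the plugin.
export async function bufferEvent(event: { event: string; properties: Record<string, any> }): Promise<void> {
    await redis.rpush(BUFFER_KEY, JSON.stringify(event))
}

// Step 2: the runEveryMinute-style task — read and clear the list atomically,
// then write one JSON object per line to a timestamped object in S3.
export async function flushToS3(bucket: string): Promise<void> {
    const results = await redis.multi().lrange(BUFFER_KEY, 0, -1).del(BUFFER_KEY).exec()
    const rows = (results?.[0]?.[1] as string[]) ?? []
    if (rows.length === 0) return

    const stamp = new Date().toISOString().replace(/[-:T]/g, '').slice(0, 12) // YYYYMMDDHHMM
    await s3
        .putObject({
            Bucket: bucket,
            Key: `${stamp}-${rows.length}.json`, // e.g. 202103151030-42.json
            Body: rows.join('\n'),
        })
        .promise()
}
```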
Also, this probably requires us to add and export a few extra vendor libraries to talk to S3 via the AWS SDK. Segment exposes the entire `aws-sdk` package just as `AWS`.
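For illustration, if we did expose the whole `aws-sdk` package as a global `AWS` the way Segment does, plugin code could construct clients directly. A sketch; the bucket name and object key are placeholders:

```ts
// Sketch assuming the aws-sdk v2 package is exposed to plugin code as `AWS`.
declare const AWS: typeof import('aws-sdk')

async function uploadBatch(lines: string[]): Promise<void> {
    const s3 = new AWS.S3()
    await s3
        .upload({
            Bucket: 'my-posthog-backup',      // placeholder bucket
            Key: `${Date.now()}-events.json`, // placeholder object key
            Body: lines.join('\n'),
        })
        .promise()
}
```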
And it's done! 🎉
I'll add it to the repo once 1.24 is out.
I want to dump all my persons and events into an S3 bucket for analysis/backup.