PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Event CSV Export needs to be more stable #8373

Open pauldambra opened 2 years ago

pauldambra commented 2 years ago

Is your feature request related to a problem?

We receive frequent reports that users cannot download the events CSV, or cannot download enough events when the export does work.

Describe the solution you'd like

When a user clicks to export a CSV, we start a background task to generate the file. If the task starts successfully, we return success to the user and present the URL where the file will be available.

The page can then poll until the file is available. We could also email the user the link and/or make their download URLs available in the UI.
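A minimal sketch of the start-then-poll flow described above, using only the standard library. Everything here is an assumption for illustration: a plain thread stands in for whatever task runner would actually be used, and the names `start_export`, `poll_export`, `EXPORTS` and the `/exports/<id>` URL shape are hypothetical, not PostHog code.

```python
import csv
import tempfile
import threading
import uuid
from pathlib import Path

# Hypothetical in-memory job registry: job_id -> {"status": ..., "path": ...}.
# A real deployment would need shared state (e.g. a database row) instead.
EXPORTS = {}
EXPORT_DIR = Path(tempfile.mkdtemp(prefix="csv_exports_"))


def _write_csv(job_id, rows, fieldnames):
    """Background worker: write the rows to a CSV file, then mark the job ready."""
    path = EXPORT_DIR / f"{job_id}.csv"
    try:
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        EXPORTS[job_id].update(status="ready", path=str(path))
    except Exception as exc:  # report the failure to pollers instead of hanging
        EXPORTS[job_id].update(status="failed", error=str(exc))


def start_export(rows, fieldnames):
    """Register the job, start the worker, and return immediately."""
    job_id = uuid.uuid4().hex
    EXPORTS[job_id] = {"status": "pending"}
    threading.Thread(
        target=_write_csv, args=(job_id, rows, fieldnames), daemon=True
    ).start()
    # Hypothetical URL the page would poll.
    return {"job_id": job_id, "status_url": f"/exports/{job_id}"}


def poll_export(job_id):
    """Return a snapshot of the job state: pending, ready, or failed."""
    return dict(EXPORTS[job_id])
```

The page would call something like `poll_export` on an interval and show the download link once the status is `ready`.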

We MUST have a mechanism for deleting files after a TTL so we don't exhaust disk space.

We SHOULD have a mechanism for storing files in S3 on cloud.
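For the TTL requirement, a periodic cleanup job could be as small as the sketch below. It assumes exports live as plain files in one directory and uses the file modification time as the creation time. The function name and the directory layout are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_expired_exports(
    export_dir: Path, ttl_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Delete files in export_dir older than ttl_seconds; return their names."""
    now = time.time() if now is None else now
    deleted = []
    for path in export_dir.iterdir():
        # mtime is used as a stand-in for "when the export was generated".
        if path.is_file() and now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

Running this on a schedule (for example once an hour with a TTL of a day) would bound disk usage. On S3, a bucket lifecycle rule could play the same role.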

Describe alternatives you've considered

  1. Removing the feature - team discussion ruled this out
  2. Not fixing it and continuing to provide ad-hoc workarounds to users

Additional context

On cloud, ClickHouse refuses to generate more than 3 or 4 thousand events at a time when generating synchronously.
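One way a background job might work around a per-request ceiling like this is to fetch events in fixed-size batches and append each batch to the CSV. This is only a sketch of that idea, not something the issue specifies: `fetch_page(offset, limit)` is a hypothetical stand-in for the real query, and the default batch size of 3000 simply echoes the figure above.

```python
import csv
from typing import Callable, Dict, Iterator, List, Sequence, TextIO

# Hypothetical signature: fetch_page(offset, limit) -> list of event dicts.
FetchPage = Callable[[int, int], List[Dict]]


def iter_batches(fetch_page: FetchPage, batch_size: int) -> Iterator[List[Dict]]:
    """Yield pages of events until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        if len(page) < batch_size:
            return
        offset += len(page)


def export_in_batches(
    fetch_page: FetchPage,
    out: TextIO,
    fieldnames: Sequence[str],
    batch_size: int = 3000,
) -> int:
    """Write all events to `out` as CSV, one batch at a time; return the row count."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    total = 0
    for page in iter_batches(fetch_page, batch_size):
        writer.writerows(page)
        total += len(page)
    return total
```

Offset-based paging is used here only because it is simple to show. Paging on a timestamp or another sort key may suit a large events table better.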


pauldambra commented 2 years ago

Some references

pauldambra commented 2 years ago

In this comment https://github.com/PostHog/posthog/issues/7015#issuecomment-965042257

@guidoiaquinti says

Where are we going to store the CSV generated file so that we can fetch it async? ClickHouse, PostgreSQL or Redis are not valid answers

So, does the app service (which runs Celery?) have shared storage?

Otherwise, do we restrict the feature to only run if something like S3 is accessible?

guidoiaquinti commented 2 years ago

I think the optimal solution is to leverage the Django Storage class. Then, depending on the config, people can use AWS S3, local storage, etc.

The logic could be something like:
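As an illustration of the pluggable-storage idea only (this is not the commenter's original snippet, and it does not use Django itself), here is a stdlib sketch. The method names `save`, `exists` and `delete` mirror those on Django's Storage API. The config keys `EXPORT_STORAGE` and `EXPORT_DIR` and the class names are hypothetical.

```python
from pathlib import Path
from typing import Protocol


class ExportStorage(Protocol):
    """Interface every backend (local disk, S3, ...) would implement."""

    def save(self, name: str, content: bytes) -> str: ...
    def exists(self, name: str) -> bool: ...
    def delete(self, name: str) -> None: ...


class LocalExportStorage:
    """Local-disk backend, e.g. for self-hosted installs with shared storage."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, content: bytes) -> str:
        (self.root / name).write_bytes(content)
        return name

    def exists(self, name: str) -> bool:
        return (self.root / name).is_file()

    def delete(self, name: str) -> None:
        (self.root / name).unlink(missing_ok=True)


def get_export_storage(config: dict) -> ExportStorage:
    """Pick a backend from config; an S3 backend would slot in the same way."""
    backend = config.get("EXPORT_STORAGE", "local")
    if backend == "local":
        return LocalExportStorage(Path(config["EXPORT_DIR"]))
    raise ValueError(f"unsupported export storage backend: {backend}")
```

The export task would then only talk to `ExportStorage`, so whether the file ends up on disk or in S3 becomes a deployment setting.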

posthog-bot commented 6 months ago

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.