pauldambra opened this issue 2 years ago
In this comment (https://github.com/PostHog/posthog/issues/7015#issuecomment-965042257), @guidoiaquinti says:
"Where are we going to store the generated CSV file so that we can fetch it async? ClickHouse, PostgreSQL or Redis are not valid answers."
So, does the app service (which runs Celery?) have shared storage?
Otherwise, do we restrict the feature to run only when something like S3 is accessible?
The optimal solution, I think, is to leverage the Django Storage class; then, depending on configuration, people can use AWS S3, local storage, etc.
The logic could be something like: generate the file, save it via the Storage class, and return a URL to download it.

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.
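A minimal sketch of that Storage-based flow, using a local-disk stand-in for Django's Storage interface (the `LocalExportStorage` class and file layout are hypothetical; with Django itself one would call `default_storage.save()` and `default_storage.url()` on whatever backend is configured):

```python
import os
import tempfile


class LocalExportStorage:
    """Stand-in for a Django Storage backend: save bytes, hand back a URL."""

    def __init__(self, root: str, url_prefix: str = "/exports/"):
        self.root = root
        self.url_prefix = url_prefix

    def save(self, name: str, content: bytes) -> str:
        # A real backend (S3, local disk, ...) is chosen by configuration
        path = os.path.join(self.root, name)
        with open(path, "wb") as f:
            f.write(content)
        return name

    def url(self, name: str) -> str:
        # The caller returns this URL to the browser for async download
        return self.url_prefix + name


storage = LocalExportStorage(tempfile.mkdtemp())
saved = storage.save("events-123.csv", b"event,timestamp\npageview,2022-01-01\n")
download_url = storage.url(saved)
```

Because the view only ever sees `save()` and `url()`, swapping local disk for S3 is purely a configuration change, which is the point of the Storage abstraction.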
Is your feature request related to a problem?
We receive frequent reports that users cannot download the events CSV, or cannot download enough events when it does work:
#8280
#8052
#5915
#5959
#3503
#7130
#8164
#8051
Describe the solution you'd like
When a user clicks to export a CSV, we start a background task to generate the file. If the task starts successfully, we return success to the user and present the URL where the file will be available.
The page can then poll until the file is available. We could also email the user the link and/or surface their download URLs in the UI.
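The start-a-task-then-poll shape could look roughly like this (the names `start_export`/`poll` and the in-memory status dict are illustrative only; in PostHog the task would run in Celery and the status would be persisted):

```python
import threading
import time
import uuid

# Illustrative in-memory registry; a real implementation would persist this
EXPORTS: dict[str, dict] = {}


def generate_csv(export_id: str) -> None:
    # Placeholder for the slow ClickHouse query + CSV serialisation
    EXPORTS[export_id]["content"] = b"event,timestamp\n"
    EXPORTS[export_id]["status"] = "done"


def start_export() -> str:
    """Kick off generation and immediately return a URL the client can poll."""
    export_id = uuid.uuid4().hex
    EXPORTS[export_id] = {"status": "pending", "content": None}
    threading.Thread(target=generate_csv, args=(export_id,)).start()
    return f"/api/exports/{export_id}.csv"


def poll(url: str) -> str:
    export_id = url.rsplit("/", 1)[-1].removesuffix(".csv")
    return EXPORTS[export_id]["status"]


url = start_export()
while poll(url) != "done":
    time.sleep(0.01)
```

The key property is that `start_export` returns immediately, so the HTTP request that triggered the export never waits on ClickHouse.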
We MUST have a mechanism for deleting files after a TTL so we don't exhaust disk space.
We SHOULD have a mechanism for storing files in S3 on Cloud.
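The MUST-have TTL cleanup could be a small periodic task along these lines (a sketch only; the flat export directory and the TTL value are assumptions, and on S3 a bucket lifecycle rule would do this instead):

```python
import os
import tempfile
import time


def delete_expired_exports(export_dir: str, ttl_seconds: float) -> list[str]:
    """Remove export files older than the TTL so they can't exhaust disk space."""
    deleted = []
    now = time.time()
    for name in os.listdir(export_dir):
        path = os.path.join(export_dir, name)
        if now - os.path.getmtime(path) > ttl_seconds:
            os.remove(path)
            deleted.append(name)
    return deleted


# Demo: one stale file and one fresh file
export_dir = tempfile.mkdtemp()
stale = os.path.join(export_dir, "old.csv")
fresh = os.path.join(export_dir, "new.csv")
for p in (stale, fresh):
    open(p, "wb").close()
os.utime(stale, (time.time() - 3600, time.time() - 3600))  # backdate by an hour

removed = delete_expired_exports(export_dir, ttl_seconds=600)
```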
Describe alternatives you've considered
Additional context
On Cloud, ClickHouse refuses to return more than three or four thousand events at a time when the CSV is generated synchronously.
Thank you for your feature request – we love each and every one!