Closed by dwreeves 2 years ago
Scoped it out. The website is very much not ready for Celery integration for a lot of reasons.
The more I look into it, the more I think having Celery in the code base would make it much nicer. In particular, signatures and chains can really help. For example, right now the only way I have to download data from the admin panel directly is to copy+paste the code multiple times and partially draw out the DAGs by hand. But with a shared signature within a group of tasks, I could pass, say, a to_db: bool arg that writes each individual task's output to the db when True and skips the write when False.
The issue is that integrating Celery is a bit tricky. Here's how far I am: https://github.com/dwreeves/flagging/tree/786334a4de31b7f36ba91f6fe7403a1466e6ca58 which is to say, not far at all. I'd basically have to rewrite the whole backend to support Celery.
A much simpler solution, I think, is to just integrate the commands into the Heroku Scheduler and dump the data somewhere once a week (Google Drive? GCS?).
Current status:
https://github.com/dwreeves/flagging/actions/runs/717102265
One of the biggest problems with integrating Celery right now is that Redis (the result store) doesn't store pandas DataFrame objects, which is how we passed data between database functions before... 😬 I'm a bit new to Celery, so I'll look into this as I go along. I think what I want is just a good, idiomatic way to add postprocessors to tasks.
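One possible workaround, sketched under the assumption that the blocker is Celery's default JSON serializer (which cannot encode a DataFrame directly): have each task return a JSON string and rebuild the DataFrame on the receiving side. The helper names `df_to_payload` and `payload_to_df` are made up for illustration and are not part of Celery or pandas.

```python
from io import StringIO

import pandas as pd


def df_to_payload(df: pd.DataFrame) -> str:
    """Encode a DataFrame as a JSON string that a JSON result store can hold."""
    # orient="split" keeps columns, index, and data as separate keys.
    return df.to_json(orient="split", date_format="iso")


def payload_to_df(payload: str) -> pd.DataFrame:
    """Rebuild a DataFrame from a string produced by df_to_payload."""
    return pd.read_json(StringIO(payload), orient="split")
```

A task would then `return df_to_payload(df)`, and the next task in a chain would start with `df = payload_to_df(payload)`. Dtypes are re-inferred on the way back, so this round trip is not guaranteed to be exact for every column type; that is worth checking against the real data before relying on it.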