Our data collection tasks will not be initiated by the user. These tasks are either persistent (example: continuously consuming a Twitter firehose) or periodic (example: making requests to Fractal API to retrieve recent events, or requesting an updated version of a website's RSS feed).
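A web application is not a good fit for these tasks, because it only does work in response to incoming requests. These tasks need a background task system instead. The usual elements of such a system are: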
1. Task queue
The task queue is the application that runs in the background and listens for tasks to perform. When tasks come in, it sends them to workers through a message broker. Workers perform the tasks and store their results in a result store.
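To make the roles concrete, here is a minimal, library-agnostic sketch of how the pieces fit together, using only the standard library. A `queue.Queue` stands in for the message broker and a dict stands in for the result store; the task name `fetch_feed` and the ids `t1`/`t2` are hypothetical. A real setup (Celery, Dramatiq, RQ) runs workers as separate processes talking to Redis or RabbitMQ, not threads in one process.

```python
import queue
import threading

# Hypothetical task function; real ones would fetch Fractal events, RSS feeds, etc.
def fetch_feed(url):
    return f"fetched {url}"

TASKS = {"fetch_feed": fetch_feed}   # task name -> callable (hypothetical registry)

broker = queue.Queue()        # stand-in for the message broker (Redis, RabbitMQ)
result_store = {}             # stand-in for the result store (Django ORM, Redis)
store_lock = threading.Lock()

def worker():
    """Take task messages off the broker, run them, and record the outcome."""
    while True:
        message = broker.get()
        if message is None:               # sentinel: shut this worker down
            broker.task_done()
            break
        task_id, name, args = message
        try:
            outcome = ("SUCCESS", TASKS[name](*args))
        except Exception as exc:          # record failures instead of crashing the worker
            outcome = ("FAILURE", repr(exc))
        with store_lock:
            result_store[task_id] = outcome
        broker.task_done()

def enqueue(task_id, name, *args):
    """What the task queue does when a task comes in: hand it to the broker."""
    broker.put((task_id, name, args))

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

enqueue("t1", "fetch_feed", "https://example.com/rss")
enqueue("t2", "missing_task")     # unknown task name, recorded as a failure
broker.join()                     # wait until both messages have been processed
for _ in workers:
    broker.put(None)              # one shutdown sentinel per worker
for w in workers:
    w.join()

print(result_store["t1"])         # ('SUCCESS', 'fetched https://example.com/rss')
```

The producer never calls the task directly; it only puts a message on the broker, which is what lets workers run elsewhere and scale independently.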
There are multiple Python implementations to choose from.
Choices:
Celery
Dramatiq
~django-carrot~ (not updated since 2019, seems abandoned)
RQ, ...
There are more; we need to pick one.
If we go with Celery, make sure to look at django-celery-beat, which stores the periodic task schedule in the database and lets us edit it from the Django admin UI: https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html#using-custom-scheduler-classes
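For the periodic tasks, something has to decide when each task is due and send it to the broker; in Celery this is the beat process, and django-celery-beat only changes where the schedule is stored. The sketch below shows that decision step with fixed intervals only (no crontab-style schedules). `PeriodicTask`, `due_tasks`, and the task names are hypothetical, not Celery's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeriodicTask:
    name: str                          # hypothetical task name, e.g. "fetch_rss"
    interval: float                    # seconds between runs
    last_run: Optional[float] = None   # None means the task has never run

def due_tasks(schedule, now):
    """Return names of tasks whose interval has elapsed, and mark them as run at `now`."""
    due = []
    for task in schedule:
        if task.last_run is None or now - task.last_run >= task.interval:
            due.append(task.name)
            task.last_run = now
    return due

schedule = [
    PeriodicTask("fetch_fractal_events", interval=60),
    PeriodicTask("fetch_rss", interval=300),
]
print(due_tasks(schedule, now=0))     # ['fetch_fractal_events', 'fetch_rss']
print(due_tasks(schedule, now=60))    # ['fetch_fractal_events']
print(due_tasks(schedule, now=120))   # ['fetch_fractal_events']
print(due_tasks(schedule, now=300))   # ['fetch_fractal_events', 'fetch_rss']
```

A scheduler loop would call `due_tasks` on every tick and enqueue each returned name; keeping the schedule in the database (as django-celery-beat does) means intervals can change without redeploying.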
2. Message broker
Choices: Redis, RabbitMQ, ...
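Whichever broker we choose, it only moves opaque messages between the task queue and the workers, so both sides have to agree on a serialized format. Celery has its own message protocol (JSON is its default serializer); the envelope below is a simplified, hypothetical one meant only to show the round trip. The field names and `make_message`/`read_message` helpers are assumptions.

```python
import json
import uuid

def make_message(task_name, *args, **kwargs):
    """Build a hypothetical task message and serialize it to bytes for the broker."""
    envelope = {
        "id": str(uuid.uuid4()),   # lets the result store key the outcome by task
        "task": task_name,
        "args": list(args),
        "kwargs": kwargs,
    }
    return json.dumps(envelope).encode("utf-8")

def read_message(body):
    """What a worker does after pulling raw bytes off the broker."""
    return json.loads(body.decode("utf-8"))

body = make_message("fetch_rss", "https://example.com/rss", timeout=10)
msg = read_message(body)
print(msg["task"], msg["args"], msg["kwargs"])
# fetch_rss ['https://example.com/rss'] {'timeout': 10}
```

One practical consequence: with a JSON serializer, task arguments should be plain data such as URLs or ids, not Python objects like model instances.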
3. Result store
With Celery, this can be the Django ORM (via django-celery-results), Redis, or another supported result backend.
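Conceptually, the result store is a mapping from task id to the task's state and return value, which other parts of the system can look up later. The in-memory class below is a hypothetical stand-in for a Redis or database-backed store; the state names (PENDING, STARTED, SUCCESS) are borrowed from Celery, which also reports unknown task ids as PENDING.

```python
import threading
from typing import Any

class InMemoryResultStore:
    """Hypothetical result store keyed by task id; Redis or a DB table would replace the dict."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()   # workers may write concurrently

    def set(self, task_id: str, state: str, value: Any = None) -> None:
        with self._lock:
            self._results[task_id] = (state, value)

    def get(self, task_id: str) -> tuple:
        # Unknown ids report PENDING, mirroring how Celery treats unknown task ids.
        with self._lock:
            return self._results.get(task_id, ("PENDING", None))

store = InMemoryResultStore()
store.set("t1", "STARTED")
store.set("t1", "SUCCESS", 3)
print(store.get("t1"))   # ('SUCCESS', 3)
print(store.get("t2"))   # ('PENDING', None)
```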