Koed00 / django-q

A multiprocessing distributed task queue for Django
https://django-q.readthedocs.org
MIT License

Get ID of task from within the task and query the status of a long running task #529

Open BoPeng opened 3 years ago

BoPeng commented 3 years ago

In the process of evaluating the possibility of converting a celery-based app to django-q, I noticed that celery has

@celery_app.task(bind=True)
def my_task(self):
    task_id = self.request.id  # task id, which could be the one I assigned, so I know what task this is
    self.update_state(state='PROGRESS', meta='Processing the third step')

The latter allows me to view the status of the task with

from celery.result import AsyncResult

res = AsyncResult(task_id)
res.state
res.info

while the task is running.

Are there django-q equivalents of these features?

Koed00 commented 3 years ago

Here are some examples from the docs:


from django_q.tasks import async_task, result

# create the task
async_task('math.copysign', 2, -2)

# or with import and storing the id
from math import copysign

task_id = async_task(copysign, 2, -2)

# get the result
task_result = result(task_id)

# result returns None if the task has not been executed yet
# you can wait for it
task_result = result(task_id, 200)

# but in most cases you will want to use a hook:

async_task('math.modf', 2.5, hook='hooks.print_result')

# hooks.py
def print_result(task):
    print(task.result)

There is no state as such. You get either None, a result, or a Task (with fetch), which will be of subclass Success or Failure. https://django-q.readthedocs.io/en/latest/tasks.html?highlight=result#django_q.result

BoPeng commented 3 years ago

Thanks for the quick response. I did read the documentation. So in summary,

  1. For problem 1, the task function cannot know the id of the task. I have a tendency to assign task_id as the uuid of the object it works on and retrieve it when the task starts. I suppose I can pass the uuid as an argument to the task, no big deal here.

  2. For problem 2, the task function cannot access the task instance (and set status etc). Communication is therefore one-way from django-q to task. I guess I will need to find some other way to expose the status of running tasks.

Another slight inconvenience is that in celery, tasks have names that are independent of their locations, which makes moving tasks between modules a bit easier. In django-q, tasks are referred to by module.function, so I will need to change task names when tasks are moved. Am I correct here?

Koed00 commented 3 years ago

I think nr. 2 comes mostly from you being used to Celery. Most brokers just have a task in the queue, hand it off to a worker, and then either it times out or they get an out-of-process/async ack/nack. The AMQP protocol is unique in that it keeps a connection with the consumer/worker open the whole time it's working on a task. This probably stems from the time the AMQP protocol was used for banking transactions over dial-up connections, so that when the connection drops, the task has failed. This is great if you want real-time feedback from the running tasks, but it doesn't scale well.

The workaround for me has been to split complex tasks into several smaller chunks and then use the Chain class. It has current and length methods that enable you to track the progress of steps in your multi-step task.
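
A minimal sketch of that kind of progress tracking, assuming a chain object that exposes `current()` and `length()` as described above. The names `chain_progress` and `FakeChain` are hypothetical and exist only for illustration. The stand-in lets the snippet run without a broker; with django-q you would presumably pass a `django_q.tasks.Chain` instance instead.

```python
def chain_progress(chain):
    """Return (done, total, fraction) for a chain-like object.

    Assumes chain.current() reports how far the chain has progressed and
    chain.length() reports the total number of steps.
    """
    total = chain.length()
    done = chain.current()
    # Guard against an empty chain to avoid dividing by zero.
    fraction = done / total if total else 0.0
    return done, total, fraction


class FakeChain:
    """Hypothetical stand-in so the sketch runs without django-q."""

    def __init__(self, done, steps):
        self._done = done
        self._steps = steps

    def current(self):
        return self._done

    def length(self):
        return len(self._steps)


done, total, fraction = chain_progress(FakeChain(1, ["load", "transform", "save"]))
print(f"{done}/{total} steps ({fraction:.0%})")
```

A view could call such a helper on each poll and report the fraction to the client, which gives coarse per-step progress rather than progress inside a single step.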

BoPeng commented 3 years ago

OK, I understand that I cannot get the status of tasks during their execution. I have another question: how do I get the exception that was raised by the task and caused it to fail? Namely, how do I complete the following status query code?

task = Task.get_task(task_id)
if task.success:
    # great, OK, but am I guaranteed that the result is available?
    return task.result
elif task.stopped:
    # task stopped without success, but perhaps it has just stopped and success has not been set yet?
    return ???  # where is the exception raised from the task?
else:
    # pending

kbruegge commented 3 years ago

I'm dealing with the same issue here. I was wondering how a running task might know its own ID.

> 1. For problem 1, the task function cannot know the id of the task. I have a tendency to assign `task_id` as the uuid of the object it works on and retrieve it when the task starts. I suppose I can pass the `uuid` as an argument to the task, no big deal here.

I don't quite understand how you are doing it. I thought about passing an extra argument to the task but it seems like the task ID only exists after it has been created. Can you give me any pointers?

shriDeveloper commented 3 years ago

I am stuck on this problem too. My concern is how to get the status of a task. A task can be in any of the mentioned states (Failed, Success, etc.). I only have the task_id.

BoPeng commented 3 years ago

> I don't quite understand how you are doing it. I thought about passing an extra argument to the task but it seems like the task ID only exists after it has been created. Can you give me any pointers?

Sorry for the confusion. With celery I used to assign the task_id from the pk and type of the object that the task would be working on. After the task started, it grabbed its own ID and retrieved the object with it. The advantage is that I can know immediately what the task is about and, if needed, drill into the database with the pk. With django-q I simply pass this information as parameters.
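
A rough sketch of that approach, with the object's type and pk passed as ordinary task arguments. The function name `process_object` and the values `"shop.Order"` and `42` are hypothetical, and the database lookup is only described in a comment so that the snippet runs on its own.

```python
def process_object(model_label, pk):
    """Hypothetical task that receives the object's type and pk as arguments."""
    # In a real project the object would be loaded here, for example with
    # django.apps.apps.get_model(model_label).objects.get(pk=pk).
    return f"processed {model_label} #{pk}"


# With django-q the call would be queued roughly like this (not run here):
#     from django_q.tasks import async_task
#     task_id = async_task(process_object, "shop.Order", 42)
print(process_object("shop.Order", 42))
```

Since the identifiers travel with the arguments, the task never needs to know its own task id to find the object it works on.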

BoPeng commented 3 years ago

> I am stuck on this problem too. My concern is how to get the status of a task. A task can be in any of the mentioned states (Failed, Success, etc.). I only have the task_id.

The code I copied above would more or less work, but I am not sure whether it is reliable or whether it is the recommended way of telling the status of a task. Having something like task.status would be more straightforward.

shriDeveloper commented 3 years ago

> I am stuck on this problem too. My concern is how to get the status of a task. A task can be in any of the mentioned states (Failed, Success, etc.). I only have the task_id.

> The code I copied above would more or less work, but I am not sure whether it is reliable or whether it is the recommended way of telling the status of a task. Having something like task.status would be more straightforward.

There isn't anything like task.status.
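
In the absence of a status attribute, a coarse status can be derived from what the thread describes: a lookup that returns either None or a task object with `success` and `result`. The sketch below is only an illustration under those assumptions. `describe_task` is a hypothetical helper, the `SimpleNamespace` objects stand in for fetched tasks so the snippet runs without django-q, and it assumes that a failed task keeps its error details in `result`, which the thread does not confirm.

```python
from types import SimpleNamespace


def describe_task(task):
    """Map a fetched task (or None) to a coarse status dictionary."""
    if task is None:
        # Nothing stored yet: the task may be queued, running, or unknown.
        return {"status": "pending", "result": None}
    if task.success:
        return {"status": "success", "result": task.result}
    # Assumption: for a failed task, `result` holds the error details.
    return {"status": "failed", "result": task.result}


# Stand-ins for what a lookup by task id might return.
print(describe_task(None))
print(describe_task(SimpleNamespace(success=True, result=4)))
print(describe_task(SimpleNamespace(success=False, result="ZeroDivisionError")))
```

With django-q this could be called as `describe_task(fetch(task_id))`, using `fetch` from `django_q.tasks`. Note that "pending" here cannot distinguish a queued task from a running one or from an id that never existed.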