czue / celery-progress

Drop in, configurable, dependency-free progress bars for your Django/Celery applications.
MIT License
470 stars 90 forks source link

Race between creating ProgressRecorder and setting its progress #80

Open im4bacon opened 3 years ago

im4bacon commented 3 years ago

If a ProgressRecorder is created, its state is not initialized until set_progress() is called. During this time, I think there is a race condition where a page calling initProgressBar() can get a bad state === null and fall into onTaskError() during this time. (progressBarMessageElement will say "Uh-Oh, something went wrong! Unknown state")

czue commented 3 years ago

Thanks for this issue! I'll try and set out some time to look into it in the next few weeks.

OmarWKH commented 3 years ago

I'm not sure it's a race condition. The initial state should be "PENDING" or "STARTED" even if set_progress() isn't called.

The error message indicates this is caused by an unknown state (in which case this is related to #54). It would be helpful to know what state caused the error. Do you have a way to check the task state when the error happens? If not, I made PR #82 to log unknown states and show them in the "Uh-Oh" message.

im4bacon commented 3 years ago

What I saw was that if I created a ProgressRecorder and didn't set_progress until later, I could see that state was occasionally coming back as null and then within a few seconds I'd see it going to PENDING state.. But by that time, I already had the "uh-oh" message.

As soon as I added a set_progress just below my creation of the ProgressRecorder, I haven't had the problem since. progress_recorder = ProgressRecorder(self) progress_recorder.set_progress(0, 1, description='Querying solutions...')

OmarWKH commented 3 years ago

That's odd. I'm unable to reproduce the issue with this setup: python 3.8, redis 5.0, packages:

The task:

@shared_task(bind=True)
def http_task(self, number):
    progress_recorder = ProgressRecorder(self)
    for i in range(number):
        time.sleep(.1)
    return random()*1000

The view:

def http_view(request):
    result = http_task.apply_async(kwargs={'number':15})
    return render(request, 'http.html', context={'task_id': result.task_id})

Is your application similar? Any other notable thing that could be related (celery config, progress bar template)? Also, where did you see that the state was null?

im4bacon commented 3 years ago

This doesn't happen often, and I just can't figure out how to reproduce it. But I added some console logs to hopefully give a little more info for you.

https://github.com/czue/celery-progress/blob/4b1e023cbd72083e1eb542560c49d5653ceb5ee2/celery_progress/static/celery_progress/celery_progress.js#L114

In celery_progress.js, in the function onData(), I'm falling into the case where data.complete === true and data.success === false. data.state does not equal RETRY, so I'm hitting the else onTaskError(). Even though the task did complete successfully.

{state: "PROGRESS", complete: true, success: false, progress: {…}, result: "Unknown state { <> }" state: "PROGRESS" success: false

data.progress looks like {pending: false, current: 0, total: 100, percent: 0}

OmarWKH commented 3 years ago

{state: "PROGRESS", complete: true, success: false, progress: {…}, result: "Unknown state { <> }"

Interesting. I'm not sure how this response was made, but I have a theory.

This response is produced when a state is unknown. See here.

But 'PROGRESS' is a known state, and there's an "else if" clause that produces a proper response for it: https://github.com/czue/celery-progress/blob/4b1e023cbd72083e1eb542560c49d5653ceb5ee2/celery_progress/backend.py#L94

The only thing I can think of is that the state (self.result.state) changed midway. It was 'PROGRESS' at first, then it changed to an unknown state. So self.result.state had different values in these two lines: https://github.com/czue/celery-progress/blob/4b1e023cbd72083e1eb542560c49d5653ceb5ee2/celery_progress/backend.py#L63 https://github.com/czue/celery-progress/blob/4b1e023cbd72083e1eb542560c49d5653ceb5ee2/celery_progress/backend.py#L94

Is that possible? If it's, a fix would be to store the state once and use the stored value throughout the method instead of asking for it in each clause.

I'm also not sure what the unknown state is. It'll be logged and shown in the page if you use the unpublished version of celery-progress from github (by installing "git+https://github.com/czue/celery-progress@master#egg=celery-progress" or putting it in your requirements file).