PolicyStat / jobtastic

Make your user-responsive long-running Celery jobs totally awesomer.
http://policystat.github.com/jobtastic/
MIT License
644 stars 61 forks source link

Caching task even though cache_duration = -1 #85

Closed philmaweb closed 3 years ago

philmaweb commented 4 years ago

I define a JobtasticTask with cache_duration = -1 but it looks like it is buffering the result.

I make use of jobtastic to have some easy progress indication of some longer running tasks. So I have 3 urls defined for simplicity let's call them form, progress, result.

path('form/<int:a_id>', views.form, name='form')
path('progress/<str:task_id>/<int:a_id>', views.progress, name='progress')
path('result/<int:a_id>', views.result, name='result')

Initially, if I submit from the form the view will be using delay_or_fail(a_id=0). The first time it will run fine and redirect to the result_url result/0 . The task url will look like so:

I create another transaction and redo the form, initiating the call delay_or_fail(new_id=1). It will reuse the same task_id (as the worker is no longer busy, so sure) and immediately forward to the new result_url result/1 - even though it never worked on the task - probably due to result caching.

The a_id is the id of a database object, so the user will be faced with a 404 because the required object was not created.

If one however performs a refresh in the form before submitting - which redirects to the task, it will sometimes work.

Help would be much appreciated. I don't want to kill the workers after a single use - which would be a way I could mitigate this.

Django=2.2
Python=3.6
Jobtastic=2.1.1
celery=3.1.25
django-celery=3.2.2
jlward commented 4 years ago

This sounds to be an issue related to herd_avoidance instead rather than cache_duration. Try setting herd_avoidance to 0.

philmaweb commented 4 years ago

I changed herd_avoidance_timeout from 60 to 0. I still have the same behavior.

To clarify a little more, the progress template is checking in regular intervals at the taskUrl from django-celery for completion of the task with the submitted task-id. It's using the 'js/jquery-celery/celery.js' and js/jquery-celery/celery.progressbar.js.

On success the polling function will redirect to the result_url which is result/1 using the customSuccess() function.

function customSuccess(){
  $(location).attr('href', "{{ result_url }}")
}
jlward commented 4 years ago

Can you please post the full code for your task?

philmaweb commented 4 years ago

I condensed the relevant parts:

tasks.py

class TheTask(JobtasticTask):
    significant_kwargs = []
    herd_avoidance_timeout = 0
    cache_duration = -1

    def calculate_result(self, a_id, **kwargs):
        # imports
        total_progress = 100
        self.update_progress(1, total_progress)
        obj = MyObject.objects.get(pk=a_id)

        # [...] logic and plenty of stuff here

        obj.save()
        self.update_progress(total_progress, total_progress)
        return a_id

views.py

@temp_or_login_required
def form(request, a_id, user):
    # [...] logic 
    result = TheTask.delay_or_fail(a_id=a_id)
    return redirect(reverse('progress', kwargs={'task_id': result.task_id, 'a_id': a_id}))
@temp_or_login_required
def progress(request, task_id, a_id, user):
    # check if user has access to check the status, or if we shared the task_id
    # user is handled by decorator
    obj = get_object_or_404(MyObject, pk=a_id, user=user)

    result_url = reverse('result', kwargs={'a_id' : a_id})
    print(result_url)

    # "celery-tasks"
    task_url = reverse('task_status', kwargs={'task_id': task_id})
    context = {'task_id': task_id, 'result_url': result_url, 'task_url': task_url}
    context = if_temp_update_context(user=user, context=context)
    return render(request, template_name='display_progress.html', context=context)
philmaweb commented 4 years ago

I think changing from significant_kwargs = [] to significant_kwargs = [('a_id', str)] actually just fixed this. But shouldn't just setting cache_duration = -1 be enough?

jlward commented 4 years ago

I think changing from significant_kwargs = [] to significant_kwargs = [('a_id', str)] actually just fixed this. But shouldn't just setting cache_duration = -1 be enough?

It may or may not depend on what cache backend you are using. As I recall, some of the cache backends in django do wonky things when you are trying to set cache to "do not cache". I've certainly run into issues in the past, although I've not specifically tried on newer versions of django.

My money says this is still an issue with herd_avoidance. In two places we call cache.set. Once for cache_duration and once for herd_avoidance. For cache_duration, we only call set, if the duration is 0 or higher. For herd_avoidance we call it no matter what. This leads me to believe that something wonky is going on with a herd_avoidance value of 0. If I were a gambling man, I'd say that setting herd_avoidance to -1 would also fix this issue.

philmaweb commented 4 years ago

Alright, thanks for the clarification! I guess this thread might be helpful as documentation for other people running into wonky caching.