PolicyStat / jobtastic

Make your user-responsive long-running Celery jobs totally awesomer.
http://policystat.github.com/jobtastic/
MIT License

Add support for Celery >= 4.0 #69

Open gabn88 opened 8 years ago

gabn88 commented 8 years ago
winhamwr commented 7 years ago

Celery 4.0 has lots of good features that I'm excited about. We'd love to accept a pull request for Celery 4.0 support. If nobody is able to contribute that PR soon, then after the first of the year PolicyStat will probably be able to dedicate some time to putting the PR together to make it happen.

In the meantime, we're happy to accept PRs that make partial progress towards this goal, and we can use a celery_4 branch to aggregate them.

Here's what I think we can do with 4.0 support:

Make sure we support 3.1.25

The first step to upgrading is supporting 3.1.25. Let's bump the requirement for the test suite and then fix all of the deprecation warnings.
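A cheap way to make those warnings impossible to miss is to promote Celery's own deprecation warning classes to errors while the suite runs. A rough sketch, assuming we wire this into the test suite's conftest.py (the exact hook is up for discussion):

# conftest.py sketch: fail the run on any Celery deprecation warning
import warnings

from celery.exceptions import CDeprecationWarning, CPendingDeprecationWarning

warnings.simplefilter("error", CDeprecationWarning)
warnings.simplefilter("error", CPendingDeprecationWarning)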

Drop support for Celery < 4.0 and bump to Jobtastic 2.0

There are lots of backward compatibility hacks in the Jobtastic codebase that impede maintenance. Celery 4.0 adds several things that will greatly simplify the code base.

Handling Task Connection Errors

The delay_or_FOO methods currently do a lot of work to catch task connection errors, and Celery 4.0 will make this much easier.
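For reference, Celery 4 raises kombu.exceptions.OperationalError from delay()/apply_async() when the broker is unreachable, so the fallback can probably collapse to a single try/except. A rough sketch of the idea, not the actual Jobtastic implementation:

from kombu.exceptions import OperationalError

def delay_or_eager_sketch(task, **kwargs):
    """Illustrative fallback: try the broker, run locally if it is down."""
    try:
        return task.delay(**kwargs)
    except OperationalError:
        # Broker connection failed: execute the task eagerly in-process.
        return task.apply(kwargs=kwargs)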

Move tests to pytest

Celery 4.0 comes with lots of testing improvements. Let's move to pytest in order to take advantage of those.
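Celery 4's plugin (celery.contrib.pytest) provides celery_app and celery_worker fixtures; a minimal test could look roughly like the sketch below, adapted from the Celery docs rather than from Jobtastic's current suite:

# Depending on the Celery/pytest versions, the plugin may need enabling in
# conftest.py:  pytest_plugins = ("celery.contrib.pytest",)

def test_multiply(celery_app, celery_worker):
    # Register a throwaway task on the fixture-provided app and run it
    # against the in-process worker started by the celery_worker fixture.
    @celery_app.task
    def mul(x, y):
        return x * y

    assert mul.delay(4, 4).get(timeout=10) == 16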

Use decorator syntax

Subclassing is no longer the hotness, so we should use the decorators.
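For context, the decorator style on Celery 4 looks roughly like this; how (or whether) Jobtastic exposes JobtasticTask behaviour through a decorator is an open design question, so treat this as a generic sketch:

from celery import Celery

app = Celery("example", broker="redis://localhost:6379/0")

# Plain decorator-defined task, the idiomatic Celery 4 style
@app.task(bind=True)
def add(self, x, y):
    return x + y

# A custom base class can still be supplied through the decorator, e.g.:
# @app.task(bind=True, base=JobtasticTask)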

Deprecations

There are several deprecations that affect us.
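The most visible one for most projects is the move to lowercase setting names; the old uppercase names still work on 4.x but are slated for removal. Illustrative example only, not an inventory of the settings Jobtastic touches:

from celery import Celery

app = Celery("example")

# Celery 3.x style, now deprecated:
#   app.conf.CELERY_RESULT_BACKEND = "cache+memcached://127.0.0.1:11211/"
#   app.conf.CELERY_ALWAYS_EAGER = True

# Celery 4.x lowercase equivalents:
app.conf.update(
    result_backend="cache+memcached://127.0.0.1:11211/",
    task_always_eager=True,
)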

gabn88 commented 7 years ago

I'm already using Jobtastic on 3.1.25 and it seems to work fine :)

jlward commented 6 years ago

I have a PR up that handles quite a bit of the changes needed to support celery 4. The problem is that any test with a busted broker in celery 4 hangs, and I don't know why. My suspicion is that celery 4 has done a lot of work to prevent broker exceptions from happening, but I don't know for certain. Once I figure that out, I don't think it'll be too difficult to get tests passing with celery 3/4 support.

jlward commented 6 years ago

https://github.com/celery/celery/issues/4296

Looks like we'll need to pin the version of kombu for celery 4.x.
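In practice that pin can live right next to the celery requirement, e.g. in setup.py; the version bounds below are illustrative, not the ones Jobtastic actually shipped:

from setuptools import setup, find_packages

setup(
    name="jobtastic-example",
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        "celery>=4.0,<4.3",
        "kombu>=4.0,<4.3",  # keep kombu in lockstep with the celery 4.x series
    ],
)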

gabn88 commented 6 years ago

I would love to see this merged; the dependency on celery 3 is going to bite me in the short run.

If you need any help, please give me a shout. Thanks!

jlward commented 6 years ago

Hello @gabn88, I am actually in the process of getting my PR for basic celery 4 support code reviewed as we speak. I'm hoping to get a new version up on PyPI today if I can.

gabn88 commented 6 years ago

That's great. I'll keep an eye on it. Thanks!

jlward commented 6 years ago

Jobtastic 2.0.0 is on PyPI. It has celery 4.x support.

gabn88 commented 6 years ago

We have upgraded, but now it is not working anymore. Possibly because the upgrade also installed the latest kombu instead of 4.0.2. Will report back after downgrading that.

gabn88 commented 6 years ago

On my development server the new jobtastic (2.0.0) with celery 4.2.1 and kombu 4.2.1 is working fine, but on my production server it is failing. 'Normal' celery tasks are working fine though. I can't seem to find the issue, but the server ends with a Gateway Timeout when the jobtastic job is scheduled...

The task:

# Imports added for completeness; the TemplateScheme import path is assumed.
from django.contrib.auth.models import User
from django.shortcuts import get_object_or_404
from jobtastic import JobtasticTask

from myapp.models import TemplateScheme


class FillScheduleTask(JobtasticTask):
    """
    Creates a real schedule from a template
    """
    # Note: with no significant_kwargs, every call shares the same cache key.
    significant_kwargs = []
    # How long should we give a task before assuming it has failed?
    herd_avoidance_timeout = 0  # all tasks of this type within this time are removed... not so nice!
    cache_duration = -1  # No caching

    def calculate_result(self, template_id, user_ids, amount_of_weeks_to_fill,
                         start_date, delete, overwrite_existing,
                         set_default_rider=False,  # default assumed; the call below does not pass it
                         **kwargs):
        """Fill a schedule for each user from the given template."""
        results = []
        scheme = get_object_or_404(TemplateScheme, pk=template_id)
        users = User.objects.filter(pk__in=user_ids)
        additions_to_do = users.count() * amount_of_weeks_to_fill

        for user in users:
            results.append(scheme.fill_schedule(
                user, amount_of_weeks_to_fill, start_date, delete,
                overwrite_existing, additions_to_do, self))

        return results

Called with:

result = FillScheduleTask.delay_or_eager(
    template_id=template_id,
    user_ids=list(users.values_list('id', flat=True)),
    amount_of_weeks_to_fill=amount_of_weeks_to_fill,
    start_date=start_date,
    delete=delete,
    overwrite_existing=overwrite_existing)

The weird thing is that it is working fine on dev, but not on production.

The broker is on an external server in production, but like I said it is working fine for regular celery tasks with the same settings.

EDIT2: I have now spent a full day looking for the issue, but still haven't found the root cause. I do know that when I inherit from celery.Task instead, rename calculate_result to run, remove the task.update_progress calls from the function, and then use FillScheduleTask.apply_async(), everything works fine. So it definitely has to do with the 'magic' that Jobtastic adds, whether it is the lock, the cache, or something else. The weird thing is that it used to work fine before jobtastic 2.0 and on celery 3.x, and that it is still working fine on the development server.
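For anyone else narrowing this down, the plain-Celery variant described above looks roughly like this; the app module path and model imports are assumptions on my part, not the exact production code:

# Sketch of the celery.Task fallback; no Jobtastic locking, caching or
# progress reporting involved.
from celery import Task
from django.contrib.auth.models import User
from django.shortcuts import get_object_or_404

from myproject.celery import app  # hypothetical Celery app module
from myapp.models import TemplateScheme  # path assumed, as above


class FillSchedulePlainTask(Task):
    def run(self, template_id, user_ids, amount_of_weeks_to_fill, start_date,
            delete, overwrite_existing, **kwargs):
        scheme = get_object_or_404(TemplateScheme, pk=template_id)
        users = User.objects.filter(pk__in=user_ids)
        additions_to_do = users.count() * amount_of_weeks_to_fill
        # fill_schedule keeps the same signature; it just no longer calls
        # update_progress on the task it is handed.
        return [
            scheme.fill_schedule(user, amount_of_weeks_to_fill, start_date,
                                 delete, overwrite_existing, additions_to_do,
                                 self)
            for user in users
        ]


# Celery 4 no longer auto-registers class-based tasks, so register explicitly:
fill_schedule_task = app.register_task(FillSchedulePlainTask())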

gabn88 commented 5 years ago

I tried again to use Jobtastic with celery 4.3 in production. In development everything works fine, but in production it fails.

I have searched for the issue for a day now, and it seems that the apply_async method on JobtasticTask is broken. It adds an extra caching mechanism, so once the task has been executed, it cannot be executed again (not even with different kwargs).

Edit: The problem seems to be that in development I have only one (local) cache. In production I have a cache on the webserver and a separate cache on the worker. I use the cache on the worker as CELERY_RESULT_BACKEND, but only when I clear the (local) cache on the webserver can I run tasks again.

Edit2: Fixed it by explicitly defining the CELERY_RESULT_BACKEND as a cache in the Django settings, i.e.:

CACHES = {
    'default': env.cache('REDIS_URL'),
    'worker': env.cache('CELERY_RESULT_BACKEND')
}

and also adding to the settings:

JOBTASTIC_CACHE = 'worker'

After restarting the worker and the Apache server (I had to stop it and start it again; a plain restart was not working) it is working! :)