cronitorio / cronitor-python

Python client for Cronitor
MIT License

cronitor-python seems to be ignoring task schedules #22

Closed: DelboyJay closed this issue 2 years ago

DelboyJay commented 2 years ago

I might have misunderstood something here, but while our Celery tasks are being created in cronitor.io, the schedule is not being set automatically. That means that if a task never runs at all (because of a config error, for example), we never get alerted. Should this be happening automatically, or do I need to write extra code that pulls the schedule out of Celery for each task and sets it via your package, so that the cronitor.io job stays in sync with the CELERY_BEAT_SCHEDULE definition in our code?

We are using Celery v4.4 at the moment. Here is an example of the code I'm using.

The Celery beat schedule is defined as follows:

CELERY_BEAT_SCHEDULE = {
    'task_name': {
        'task': 'my_task_func_name',
        'schedule': crontab(minute=0, hour=1),
    },
}

The test code I have that sets up Celery and cronitor is...

def make_async_app():
    app = Celery('async')
    app.config_from_object('settings.celery_async', namespace='CELERY')
    app.autodiscover_tasks(related_name='tasks_async')
    cronitor.environment = settings.ENVIRONMENT
    cronitor.celery.initialize(app, api_key=settings.CRONITOR_API_KEY)
    return app

To reiterate: everything is working in the sense that the task runs and your package sends start and end times and task success or failure. But when I look at the job in cronitor.io, the schedule field under Job Details is not set to "every 1 hour" or similar, as specified by the line 'schedule': crontab(minute=0, hour=1) under CELERY_BEAT_SCHEDULE above.

jdotjdot commented 2 years ago

A task set up with crontab(minute=0, hour=1) should indeed get created in Cronitor from the Celery integration (I just tested again myself with it and it worked properly), so we'll have to debug a bit further here. Two initial questions that come to mind:

DelboyJay commented 2 years ago

Hi! Thanks for the reply.

So after some further investigation, there appears to be an issue where cronitor-python cannot find the beat schedule database file, and I get the feeling that fixing this might fix my issue above.

To answer your question, we are only using the Celery package and no other packages that are django-celery specific. As for the hour schedule thing, you're right, thanks for that! I was testing locally with one-minute tasks (*/1 * * * *), though, and saw the same issue. Currently no monitors are being created: the code is skipping them because celerybeat_only is set to True (so we can ignore non-scheduled async tasks), and for some reason the x-cronitor-celerybeat-name header isn't being set, so the code just returns and does nothing in the ping_monitor_* functions.
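
(For anyone reading later: crontab keyword arguments pin a field to an exact value, so the original schedule was daily rather than hourly. A quick sketch of the distinction:)

from celery.schedules import crontab

crontab(minute=0, hour=1)      # cron '0 1 * * *'   -> once a day, at 01:00
crontab(minute=0, hour='*/1')  # cron '0 */1 * * *' -> at minute 0 of every hour
crontab(minute='*/1')          # cron '*/1 * * * *' -> every minute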

So the problem I'm seeing is as follows. Our celery beats filename is defined as:

CELERY_BEAT_SCHEDULE_FILENAME = BASE_DIR('celerybeat-schedule')  # '/home/user/work/master/celerybeat-schedule'

But Celery actually creates the file on disk with a .db extension (/home/user/work/master/celerybeat-schedule.db), and the cronitor-python code here is not expecting that, so it fails with the exception [Errno 2] No such file or directory: '/home/user/work/master/celerybeat-schedule'

As a test, I tried modifying your code as follows to account for the .db extension, to see if it fixes things:

            # Test change: append the '.db' suffix that was actually created
            # on disk before copying the schedule to a temp file.
            new_schedule = tempfile.NamedTemporaryFile(suffix='.db')
            with open(sender.schedule_filename + '.db', 'rb') as current_schedule:
                shutil.copyfileobj(current_schedule, new_schedule)

But this fails with the following error

  File "/usr/lib/python3.7/dbm/__init__.py", line 88, in open
    raise error[0]("db type could not be determined")
dbm.error: db type could not be determined

I'm not sure what to do about that, because your code is clearly copying the file, and a diff of the copy against the source matches, so it should work. I also tried deleting the database file so that Celery creates a new one, but I get the same error. 🤔
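
One guess about why even the copy can't be opened (an assumption on my part, not verified): NamedTemporaryFile is buffered, so the bytes written by shutil.copyfileobj may not have reached disk by the time shelve reopens the copy; calling new_schedule.flush() after the copy would rule that out. Either way, dbm.whichdb is a quick way to see what dbm makes of each file (paths as in my setup above):

import dbm

# whichdb inspects known suffixes and the file's magic bytes:
#   a module name (e.g. 'dbm.ndbm') -> recognized format
#   ''   -> file found, but its format can't be determined (the dbm.error above)
#   None -> no readable file at all (the FileNotFoundError case)
print(dbm.whichdb('/home/user/work/master/celerybeat-schedule'))
print(dbm.whichdb('/home/user/work/master/celerybeat-schedule.db'))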

Just to reiterate, we are using celery 4.4.0 at the moment, so I'm not sure if later versions do something different with that .db extension.

jdotjdot commented 2 years ago

Thanks so much for this—I do think it explains it. There is a note in the README (apparently too well-hidden) that mentions that specifying a custom location for the task database isn't supported. Celery's task scheduler class and the file-based database it uses, shelve, seem very sensitive, so it's been difficult to get it working at arbitrary locations.

[screenshot of the README note stating that custom schedule-file locations are not supported]

Is there a reason you have to specify a non-default location? Could you give it a shot keeping it in the default place? Otherwise, we can look at adding this support, but I wouldn't have an estimate yet on how long it will take.

DelboyJay commented 2 years ago

My apologies, I clearly missed those notes! 😅

I have deleted the CELERY_BEAT_SCHEDULE_FILENAME from our settings so that the default is used, but I still get No such file or directory: 'celerybeat-schedule'.

When debugging your code here, sender.schedule_filename is now set to just celerybeat-schedule. The file appears in the same folder as before, but it still has the .db extension added, which is why the code cannot find it.

According to the celery documentation here (searching for beat_schedule_filename), it appears the version of Python determines whether a .db extension is added. We are currently using Python 3.7.13, so I guess that adds it.

beat_schedule_filename
Default: "celerybeat-schedule".

Name of the file used by PersistentScheduler to store the last run times of periodic tasks. Can be a relative or absolute path, but be aware that the suffix .db may be appended to the file name (depending on Python version).

Can also be set via the [celery beat --schedule](https://docs.celeryq.dev/en/4.4.0/reference/celery.bin.beat.html#cmdoption-celery-beat-s) argument.
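
Concretely, it's the dbm backend that shelve picks on a given Python build that decides the on-disk name; a minimal demonstration (the path is illustrative):

import glob
import shelve

# Celery's PersistentScheduler stores beat state via shelve, which delegates
# to whichever dbm backend this Python build provides:
#   dbm.gnu  -> 'schedule-demo'                    (exact name, no suffix)
#   dbm.ndbm -> 'schedule-demo.db'                 (appends .db)
#   dbm.dumb -> 'schedule-demo.dir' / '.dat' / '.bak'
with shelve.open('/tmp/schedule-demo') as db:
    db['entries'] = {}
print(glob.glob('/tmp/schedule-demo*'))  # shows which name was actually created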

I tried adding the .db extension into your code again as a test, but it still fails with db type could not be determined, so this still isn't a solution, sadly.

jdotjdot commented 2 years ago

This is fairly odd. For shelve, I wouldn't expect the extension at the end to matter, whether it has .db or not—I can only imagine it being that it couldn't find the file at all. This module was also tested on Python 3.5, 3.7, and 3.9, so the change in Python adding ".db" shouldn't by itself be causing this issue. It is also strange that you're always seeing ".db" added, since in my local testing, I always see celerybeat-schedule created without any extension at all.

I'll see if I can replicate this and pin to the exact versions you're using.

jdotjdot commented 2 years ago

Can you paste the logs from when Celery starts up? I pinned to your exact version of Celery and it loaded fine. So I'd like to see the logs, and if I don't see anything there, I may want to do a screenshare if you'd be open to it.

This is what I saw:

[2022-08-30 13:31:34,404: WARNING/Beat] celery beat v4.4.0 (cliffs) is starting.
[2022-08-30 13:31:34,406: WARNING/Beat] __    -    ... __   -        _
LocalTime -> 2022-08-30 13:31:34
Configuration ->
    . broker -> amqp://guest:**@localhost:5672/cronitor
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> /var/folders/8l/tj4z1k4s225gtk6j4sng9zjc0000gn/T/tmpi5pfc1w7
    . logfile -> [stderr]@%WARNING
    . maxinterval -> 5.00 minutes (300s)

jdotjdot commented 2 years ago

Can you also paste the full traceback for this part?

But this fails with the following error

File "/usr/lib/python3.7/dbm/init.py", line 88, in open raise error[0]("db type could not be determined") dbm.error: db type could not be determined

DelboyJay commented 2 years ago

This is indeed strange, as I would have expected others to have seen this as well, so it must be some form of config or environment setting on our side causing the issue. I'm just struggling to find out what it is.

So I have reverted all the code changes and removed the following setting line from our settings file, as you previously requested.

CELERY_BEAT_SCHEDULE_FILENAME = BASE_DIR('celerybeat-schedule')

I'm no longer trying to patch the filename in your code. To be clear, I do not get the "db type could not be determined" error in this normal case; instead, I get FileNotFoundError: [Errno 2] No such file or directory: 'celerybeat-schedule'

The Celery output is as follows:

celery beat v4.4.0 (cliffs) is starting.
__    -    ... __   -        _
LocalTime -> 2022-09-01 15:53:31
Configuration ->
    . broker -> amqp://guest:**@localhost:5672//
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> celerybeat-schedule
    . logfile -> [stderr]@%DEBUG
    . maxinterval -> 5.00 minutes (300s)

The error I see further down is:

2022-09-01T15:53:31.228545Z [error    ] Signal handler <function celerybeat_startup at 0x7f1aca90bf80> raised: FileNotFoundError(2, 'No such file or directory') name='celery.utils.dispatch.signal'
Traceback (most recent call last):
  File "/home/del/.virtualenvs/master/lib/python3.7/site-packages/celery/utils/dispatch/signal.py", line 288, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/del/.virtualenvs/master/lib/python3.7/site-packages/cronitor/celery.py", line 122, in celerybeat_startup
    with open(sender.schedule_filename, 'rb') as current_schedule:
FileNotFoundError: [Errno 2] No such file or directory: 'celerybeat-schedule'

And again, I am seeing the file 'celerybeat-schedule.db' being created in my project folder. I've deleted it and re-run the code, and the file gets recreated.

jdotjdot commented 2 years ago

Thanks so much. Would it be possible to share the entire log output? There's some more detail I want to look at that might give me some hints.

DelboyJay commented 2 years ago

Hi, here is the entire log. Thanks for your time on this; I really appreciate it.

/home/del/.virtualenvs/master/bin/python /opt/pycharm-2022.1.3/plugins/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 40451 --file py run beat --loglevel=debug
Connected to pydev debugger (build 221.5921.27)

2022-09-02T12:28:56.488025Z [info     ] Override settings for pynamo not available  name='pynamodb.settings'
2022-09-02T12:28:56.488373Z [info     ] Using Default settings value   name='pynamodb.settings'
2022-09-02T12:28:56.836733Z [info     ] Found credentials in environment variables. name='botocore.credentials'
celery beat v4.4.0 (cliffs) is starting.
__    -    ... __   -        _
LocalTime -> 2022-09-02 12:29:02
Configuration ->
    . broker -> amqp://guest:**@localhost:5672//
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> celerybeat-schedule
    . logfile -> [stderr]@%DEBUG
    . maxinterval -> 5.00 minutes (300s)
2022-09-02T12:29:02.277339Z [debug    ] Setting default socket timeout to 30 name='celery.beat'
2022-09-02T12:29:02.277530Z [debug    ] Setting default socket timeout to 30 name='celery.beat'
2022-09-02T12:29:02.278055Z [info     ] beat: Starting...              name='celery.beat'
2022-09-02T12:29:02.278182Z [info     ] beat: Starting...              name='celery.beat'
2022-09-02T12:29:02.302215Z [debug    ] Current schedule:
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */15 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */15 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */15 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 6 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 6,10 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */12 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 3 * 1 * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 4 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */10 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */5 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 3 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 10 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 2 mon-fri * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 8 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 30 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 6,10 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 19 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 4 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)> name='celery.beat'
2022-09-02T12:29:02.302386Z [debug    ] Current schedule:
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */15 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */15 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */15 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 6 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 6,10 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */12 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 3 * 1 * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 4 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */10 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */5 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 */1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 3 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 10 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 2 mon-fri * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 8 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 30 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 6,10 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 19 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: 0 4 * * * (m/h/d/dM/MY)>
<ScheduleEntry: <redacted.async_task> <crontab: */1 * * * * (m/h/d/dM/MY)> name='celery.beat'
2022-09-02T12:29:02.302858Z [debug    ] beat: Ticking with max interval->5.00 minutes name='celery.beat'
2022-09-02T12:29:02.303014Z [debug    ] beat: Ticking with max interval->5.00 minutes name='celery.beat'
2022-09-02T12:29:02.309520Z [error    ] Signal handler <function celerybeat_startup at 0x7f1a1dc47f80> raised: FileNotFoundError(2, 'No such file or directory') name='celery.utils.dispatch.signal'
Traceback (most recent call last):
  File "/home/del/.virtualenvs/master/lib/python3.7/site-packages/celery/utils/dispatch/signal.py", line 288, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/del/.virtualenvs/master/lib/python3.7/site-packages/cronitor/celery.py", line 122, in celerybeat_startup
    with open(sender.schedule_filename, 'rb') as current_schedule:
FileNotFoundError: [Errno 2] No such file or directory: 'celerybeat-schedule'
2022-09-02T12:29:02.310517Z [error    ] Signal handler <function celerybeat_startup at 0x7f1a1dc47f80> raised: FileNotFoundError(2, 'No such file or directory') name='celery.utils.dispatch.signal'
Traceback (most recent call last):
  File "/home/del/.virtualenvs/master/lib/python3.7/site-packages/celery/utils/dispatch/signal.py", line 288, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/del/.virtualenvs/master/lib/python3.7/site-packages/cronitor/celery.py", line 122, in celerybeat_startup
    with open(sender.schedule_filename, 'rb') as current_schedule:
FileNotFoundError: [Errno 2] No such file or directory: 'celerybeat-schedule'
2022-09-02T12:29:02.337535Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.337777Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.370207Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.370521Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.370952Z [debug    ] beat: Synchronizing schedule... name='celery.beat'
2022-09-02T12:29:02.371091Z [debug    ] beat: Synchronizing schedule... name='celery.beat'
2022-09-02T12:29:02.371587Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.371723Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.378898Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.379081Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.388272Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.388582Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.388972Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.389110Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.390547Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.390701Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.395994Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.396331Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.396680Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.396812Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.400243Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.400582Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.408032Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.408395Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.408757Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.408894Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.410417Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.410603Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.416625Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.416983Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.417330Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.417461Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.419067Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.419220Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.424566Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.424893Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.425249Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.425394Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.426872Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.427040Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.433208Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.433514Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.434024Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.434164Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.435540Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.435703Z [info     ] Scheduler: Sending due task <redacted.async_task> name='celery.beat'
2022-09-02T12:29:02.441381Z [debug    ] 'celery.dispatched': +1.000000 instance='<redacted.async_task>' name='metrics' tags=[]
2022-09-02T12:29:02.441690Z [debug    ] 'celery.all.dispatched': +1.000000 name='metrics' tags=[]
2022-09-02T12:29:02.442142Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.442276Z [debug    ] <redacted.async_task> sent. id->00000000-0000-0000-0000-000000000000 name='celery.beat'
2022-09-02T12:29:02.443268Z [debug    ] beat: Waking up in 57.55 seconds. name='celery.beat'
2022-09-02T12:29:02.443540Z [debug    ] beat: Waking up in 57.55 seconds. name='celery.beat'

jdotjdot commented 2 years ago

This is interesting -- you have very different logs than what I see. What should be happening, although it's odd, is that celery starts up, shuts down, and starts up a second time with the new schedule file. You seem to have only a single startup, with the original schedule file.

Can you show me the command you're using to run Celery, with all the CLI args?

 -------------- celery@JJs-MBP-3 v4.4.0 (cliffs)
--- ***** ----- 
-- ******* ---- macOS-12.5-x86_64-i386-64bit 2022-09-05 19:35:32
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         tasks:0x104ef6070
- ** ---------- .> transport:   amqp://guest:**@localhost:5672/cronitor
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery

[tasks]
  . tasks.add
  . tasks.failit
  . tasks.subtract
  . tasks.test

[2022-09-05 19:35:32,171: INFO/Beat] beat: Starting...
[2022-09-05 19:35:32,221: INFO/Beat] beat: Shutting down...
[2022-09-05 19:35:32,223: WARNING/Beat] celery beat v4.4.0 (cliffs) is starting.
[2022-09-05 19:35:32,227: WARNING/Beat] __    -    ... __   -        _
LocalTime -> 2022-09-05 19:35:32
Configuration ->
    . broker -> amqp://guest:**@localhost:5672/cronitor
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> /var/folders/8l/tj4z1k4s225gtk6j4sng9zjc0000gn/T/tmp2vmvnycl
    . logfile -> [stderr]@%WARNING
    . maxinterval -> 5.00 minutes (300s)
[2022-09-05 19:35:32,230: INFO/Beat] beat: Starting...
[2022-09-05 19:35:32,251: INFO/Beat] Scheduler: Sending due task do the add (tasks.add)
[2022-09-05 19:35:32,262: INFO/Beat] Scheduler: Sending due task print-every-10-seconds (tasks.test)

DelboyJay commented 2 years ago

I believe the second startup fails to happen because it cannot find the Celery database file. Do you know why we have to use the database file at all? Is there no way to work all this out from the Celery tasks themselves, which are already defined in memory?

jdotjdot commented 2 years ago

There is, but what is strange is that it can't find the initial file, not the created one. That is surprising to me, which is why I'm wondering about the CLI args and the full config used to start up.

The database file is not used to gather the tasks--those actually come entirely from the Python code. The reason the database file has to be there is that, when starting up, celerybeat does some odd things with persistence to keep track of which tasks have already been run, how recently, etc. -- metrics around tasks. While this makes sense, the problem is that shelve isn't super robust: when we ignored the database entirely (e.g. just let it create a new one, or tried using the old one), it would invariably end up weirdly corrupted. With PersistentScheduler, the only way we found to get it working properly was to copy the database file over to a new location and restart.
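
For readers skimming later, that copy-and-restart approach amounts to roughly this (a simplified sketch, not the library's literal code; sender stands in for the PersistentScheduler service the signal hands over):

import shutil
import tempfile

def copy_schedule_and_restart(sender):
    # Copy the shelve-backed schedule file to a fresh location so beat can be
    # restarted against the copy without corrupting the original state.
    new_schedule = tempfile.NamedTemporaryFile(delete=False)
    with open(sender.schedule_filename, 'rb') as current_schedule:
        shutil.copyfileobj(current_schedule, new_schedule)
    new_schedule.flush()  # make sure the copied bytes are on disk
    sender.schedule_filename = new_schedule.name  # the restarted beat uses the copy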

DelboyJay commented 2 years ago

Can you show me the command you're using to run Celery, with all the CLI args?

I'm compiling this for you now. I just need to remove some sensitive information and get it approved, and I'll post it here ASAP.

jdotjdot commented 2 years ago

No problem!

DelboyJay commented 2 years ago

We use a custom command line, but effectively it runs the Celery beat function with default arguments. Basically, the following code is executed:

options = {
  "loglevel": "DEBUG",
  "schedule": None,
  "no_color": "False",
}

app = Celery('async')
app.config_from_object('settings.celery_async', namespace='CELERY')
app.autodiscover_tasks(related_name='tasks_async')
app.Beat(**options).run()

The settings config loaded by config_from_object is as follows:

CELERY_BROKER_URL = <URL to our rabbitMQ server>

CELERY_RESULT_BACKEND = <information regarding our Redis cache server>

CELERY_TASK_QUEUES = <
  contains a Queue object for each of our services.
  We use Kombu [https://github.com/celery/kombu] for the Exchange object here.
  >

CELERY_TASK_IGNORE_RESULT = True

CELERY_BROKER_TRANSPORT_OPTIONS = {
    'confirm_publish': True
}

CELERY_TASK_ROUTES = <Set to our task router class>

CELERY_BEAT_SCHEDULE = <List of all the scheduled tasks>

CELERY_TASK_QUEUE_HA_POLICY = 'all'

# The following was commented out because your documentation says it is
# not supported yet and we don't really need it.
# CELERY_BEAT_SCHEDULE_FILENAME = BASE_DIR('celerybeat-schedule') 

CELERY_ACCEPT_CONTENT = [
    'json',
    'pickle',
]

# Disable broker connection pooling due to bug
# - https://github.com/celery/celery/issues/3773
# - https://github.com/celery/celery/issues/3377

CELERY_BROKER_POOL_LIMIT = None

CELERY_TASK_ALWAYS_EAGER = False

CELERY_TASK_EAGER_PROPAGATES = True

CELERY_TASK_SERIALIZER = 'pickle'

CELERY_WORKER_DISABLE_RATE_LIMITS = True

CELERY_WORKER_SEND_TASK_EVENTS = False

CELERY_WORKER_HIJACK_ROOT_LOGGER = False

CELERY_WORKER_REDIRECT_STDOUTS = False

jdotjdot commented 2 years ago

Thanks -- this is very interesting: manually calling app.Beat(**options).run(). Where is the Cronitor integration initialized?

DelboyJay commented 2 years ago

Sorry, I missed that! It should look like this...

app = Celery('async')
app.config_from_object('settings.celery_async', namespace='CELERY')
app.autodiscover_tasks(related_name='tasks_async')
cronitor.environment = settings.ENVIRONMENT
cronitor.celery.initialize(
    app,
    celerybeat_only=settings.CRONITOR_CELERYBEAT_ONLY,
    api_key=settings.CRONITOR_API_KEY,
)
app.Beat(**options).run()

jdotjdot commented 2 years ago

Thanks for sending this. I'm working to replicate it. My current theory is that the problem is app.Beat(**options).run() being called manually rather than via the CLI, but I haven't confirmed yet.

jdotjdot commented 2 years ago

The manual run isn't the issue. I was able to run it with the Celery integration using app.Beat().run(), and it went fine.

celery beat v4.4.0 (cliffs) is starting.
__    -    ... __   -        _
LocalTime -> 2022-10-03 16:52:52
Configuration ->
    . broker -> amqp://guest:**@localhost:5672/cronitor
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> celerybeat-schedule
    . logfile -> [stderr]@%DEBUG
    . maxinterval -> 5.00 minutes (300s)
[2022-10-03 16:52:52,656: DEBUG/MainProcess] Setting default socket timeout to 30
[2022-10-03 16:52:52,656: INFO/MainProcess] beat: Starting...
[2022-10-03 16:52:52,667: DEBUG/MainProcess] Current schedule:
<ScheduleEntry: do the add tasks.add(4, 5) <freq: 5.00 seconds>
<ScheduleEntry: print-every-10-seconds tasks.test('hey') <freq: 10.00 seconds>
<ScheduleEntry: this will fail tasks.failit() <freq: 1.00 minute>
<ScheduleEntry: print-on-the-10 tasks.test('no') <crontab: */10 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: print-on-the-20 tasks.test('printing') <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: test2 tasks.test('printing') <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>
[2022-10-03 16:52:52,667: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes
[2022-10-03 16:52:52,670: INFO/MainProcess] beat: Shutting down...
[2022-10-03 16:52:52,670: WARNING/MainProcess] celery beat v4.4.0 (cliffs) is starting.
[2022-10-03 16:52:52,671: WARNING/MainProcess] __    -    ... __   -        _
LocalTime -> 2022-10-03 16:52:52
Configuration ->
    . broker -> amqp://guest:**@localhost:5672/cronitor
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> /var/folders/8l/tj4z1k4s225gtk6j4sng9zjc0000gn/T/tmpy8qrztcm
    . logfile -> [stderr]@%WARNING
    . maxinterval -> 5.00 minutes (300s)
[2022-10-03 16:52:52,671: DEBUG/MainProcess] Setting default socket timeout to 30
[2022-10-03 16:52:52,671: INFO/MainProcess] beat: Starting...
[2022-10-03 16:52:52,674: DEBUG/MainProcess] Current schedule:
<ScheduleEntry: do the add tasks.add(4, 5) <freq: 5.00 seconds>
<ScheduleEntry: print-every-10-seconds tasks.test('hey') <freq: 10.00 seconds>
<ScheduleEntry: this will fail tasks.failit() <freq: 1.00 minute>
<ScheduleEntry: print-on-the-10 tasks.test('no') <crontab: */10 * * * * (m/h/d/dM/MY)>
<ScheduleEntry: print-on-the-20 tasks.test('printing') <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: test2 tasks.test('printing') <crontab: 0 1 * * * (m/h/d/dM/MY)>
<ScheduleEntry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>
[2022-10-03 16:52:52,674: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes
[2022-10-03 16:52:52,684: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@local-computer', 'copyright': 'Copyright (c) 2007-2021 VMware, Inc. or its affiliates.', 'information': 'Licensed under the MPL 2.0. Website: https://rabbitmq.com', 'platform': 'Erlang/OTP 24.0.5', 'product': 'RabbitMQ', 'version': '3.9.4'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
[2022-10-03 16:52:52,685: INFO/MainProcess] Scheduler: Sending due task do the add (tasks.add)
[2022-10-03 16:52:52,692: DEBUG/MainProcess] using channel_id: 1
[2022-10-03 16:52:52,692: DEBUG/MainProcess] Channel open
[2022-10-03 16:52:52,694: DEBUG/MainProcess] beat: Synchronizing schedule...
[2022-10-03 16:52:52,694: DEBUG/MainProcess] tasks.add sent. id->2975540e-9864-46b9-bbff-282c0abb4405
[2022-10-03 16:52:52,695: INFO/MainProcess] Scheduler: Sending due task print-every-10-seconds (tasks.test)
[2022-10-03 16:52:52,697: DEBUG/MainProcess] tasks.test sent. id->197e63bd-fff1-44d2-94db-234e6092de00
[2022-10-03 16:52:52,697: INFO/MainProcess] Scheduler: Sending due task this will fail (tasks.failit)
[2022-10-03 16:52:52,698: DEBUG/MainProcess] tasks.failit sent. id->bf1f163c-c9d3-45d4-b745-cec65baa2e1d
[2022-10-03 16:52:52,699: INFO/MainProcess] Scheduler: Sending due task print-on-

Your celerybeat does appear to be attempting to restart -- it runs the Cronitor code and then fails when trying to access the original celerybeat-schedule file. Why that file doesn't exist is definitely an interesting question, as celery should be creating it based on your logs. Maybe that file somehow isn't writable in your environment. Additionally, from the logs it seems like celerybeat is running twice -- most things seem to be duplicated. Is it possible that that is happening, and that some sort of race condition is causing the issue, with both processes trying to reference the same file?

DelboyJay commented 2 years ago

The problem I'm seeing is that it cannot find the file at all: when the file is physically created on disk, it has a .db extension, but the Cronitor module is looking for the file without any extension. So for that reason, I don't think file permissions are the issue here.

Traceback (most recent call last):
  File "/home/del/.virtualenvs/platform_master/lib/python3.7/site-packages/celery/utils/dispatch/signal.py", line 288, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/del/.virtualenvs/platform_master/lib/python3.7/site-packages/cronitor/celery.py", line 122, in celerybeat_startup
    with open(sender.schedule_filename, 'rb') as current_schedule:
FileNotFoundError: [Errno 2] No such file or directory: '/home/del/work/platform_master/celerybeat-schedule'

As for things running twice, are you thinking this because you are seeing the open-file error twice? I have put a breakpoint on this line to see what's going on, but something strange is happening here: the breakpoint is only hit once, yet the console output clearly shows the same line has been executed twice. Maybe there is an issue with threading and PyCharm here where the second call is not trapped? 🤷‍♂️

Regarding your other point about me running app.Beat(**options).run(): I must have left this in by mistake, as the documentation does not suggest it should be called, and from what I can see it is called from Cronitor's celerybeat_startup function. I have removed app.Beat(**options).run(), but now my program terminates straight away and celerybeat_startup is never called, which suggests I need to start some sort of message loop for Django so that it doesn't quit -- which is what the Beat function was doing for me.

DelboyJay commented 2 years ago

J.J., leave this with me while I try a few things out, and I'll get back to you with my findings, if that's OK?

jdotjdot commented 2 years ago

Are you always running this within PyCharm? The PyCharm debugger can make things weird; have you tried it in a regular console?

Indeed, you need some sort of loop; if you don't call app.Beat().run(), the usual way to do it is with the Celery CLI.
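
For example (the module path is a placeholder for wherever your app object lives):

celery -A your_project.celery_app beat --loglevel=DEBUG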

DelboyJay commented 2 years ago

Hi J.J.

So after some deeper investigation into everything, it looks like I'm not able to use this module with our system and have it automatically create the jobs/monitors and ping them. There are about 7 issues I have found that stop me. The main one is the issue around the celerybeat-schedule file: all the temp files you create need a .db extension for things to work without throwing exceptions, and that then leads to other issues down the line.

To fix the problem, I now manually create the monitors when we run the celery beat command line, and I've hooked into Celery callbacks to send the pings for those jobs from the async worker threads. Everything is now working, so I think I can close this issue.
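
For anyone needing the same workaround, it is shaped roughly like this (a sketch, not my exact code: the monitor keys are illustrative, and the cron-string conversion relies on celery's private _orig_* crontab attributes):

import cronitor
from celery.signals import task_failure, task_prerun, task_success

cronitor.api_key = 'YOUR_API_KEY'  # placeholder

def create_monitors(beat_schedule):
    # One Cronitor job monitor per CELERY_BEAT_SCHEDULE entry, keyed by task
    # name so the signal handlers below can address the same monitor.
    for entry in beat_schedule.values():
        s = entry['schedule']  # a celery.schedules.crontab instance
        cron_expr = '{} {} {} {} {}'.format(
            s._orig_minute, s._orig_hour, s._orig_day_of_month,
            s._orig_month_of_year, s._orig_day_of_week)
        cronitor.Monitor.put(key=entry['task'], schedule=cron_expr, type='job')

@task_prerun.connect
def report_start(sender=None, **kwargs):
    cronitor.Monitor(sender.name).ping(state='run')

@task_success.connect
def report_success(sender=None, **kwargs):
    cronitor.Monitor(sender.name).ping(state='complete')

@task_failure.connect
def report_failure(sender=None, **kwargs):
    cronitor.Monitor(sender.name).ping(state='fail')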

Thank you again for all your help on this; I really appreciate it.

jdotjdot commented 2 years ago

Of course, and I'm sorry for all the trouble!

In case you didn't see it, there's a built-in decorator you can use for manual instrumentation.
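
Something like this, based on the README (the monitor key is illustrative):

import cronitor

cronitor.api_key = 'YOUR_API_KEY'  # placeholder

@cronitor.job('nightly-report')  # pings run, then complete (or fail) around each call
def send_nightly_report():
    ...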