Open SharpEdgeMarshall opened 3 years ago
You should rebase on master and make the celery dependency >=4 here https://github.com/OvalMoney/celery-exporter/blob/master/Cargo.toml#L34
i know very little about celery5, but my main concern is that in testing we are mocking many celery functions and the definition of Event
from source is very generous (it could contain any field).
So I think we should investigate the differences between celery4 and celery5 in the handling of monitoring events, but hopefully this will maybe help us improve the queue detection.
These days I can try to help digging into the matter and report here what I find.
@MRoci I referred to this: https://docs.celeryproject.org/en/stable/whatsnew-5.0.html
I wouldn't say that Celery 4 to Celery 5 led to breaking changes in events, it feels like Celery is being relatively minimally maintained but that large changes are also being accepted, sometimes without updating corresponding documentation.
For example, last month, a task-cancelled
event was added: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L417
The documentation history suggests little has changed: https://github.com/celery/celery/commits/8ebcce1523d79039f23da748f00bec465951de2a/docs/userguide/monitoring.rst
Searching for documented events:
task-sent
last updated 5 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/app/amqp.py#L547-L548
task-received
was last updated 6 years ago: https://github.com/celery/celery/blame/2411504f4164ac9acfa20007038d37591c6f57e5/celery/worker/strategy.py#L162-L170
task-started
was last updated 9 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L468
task-succeeded
was last updated 7 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L505 or https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L721-L723 [it's not clear to me if this means it gets sent twice?]
task-failed
was last updated 8 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L573-L577
task-rejected
was last updated 6 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L593
task-revoked
was last updated 9 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L430
task-retried
was last updated 9 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L512-L514
worker-online
was last updated 9 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L51
worker-offline
was last updated 14 months ago, to remove automatic retry: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L61
worker-heartbeat
was last updated 6 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L53
Worker event attributes, 9 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L42-L47
Outside of an update to use f-strings and a skipped test, I'm not seeing many changes since 5 years ago in an events state unit test. Not that you would have to update the unit test if you updated events of course: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/t/unit/events/test_state.py
Could there be updates I've missed? Absolutely. But I haven't seen any important ones yet?
Since last year it's been well-known that if you run Flower against Celery 5, everything just works as long as you use the Docker image. https://github.com/mher/flower/issues/1029#issuecomment-703194703 They're working on a new release of Flower, but that appears to be because the command line tooling changed within Celery and they're looking to drop Python 2 support.
Intriguingly, Flower also seems to have basic support for Prometheus, though it could be improved: https://github.com/mher/flower/issues/1076
That said, since events include task args and kwargs and other structures, it's possible that these may have changed in the interim. As in, the description of a task might have changed in minor ways. But I doubt we would go into that much detail when collecting Prometheus metrics.
I went through 4->5 upgrade doc, can't see anything about "event". Looks safe for me to upgrade.
Upgrade all deps and start testing for Celery 5 @MRoci would you like to give a look?