OvalMoney / celery-exporter

Prometheus metrics exporter for Celery
MIT License
142 stars 37 forks source link

Upgrade Deps and add testing for Celery 5 #52

Open SharpEdgeMarshall opened 3 years ago

SharpEdgeMarshall commented 3 years ago

Upgrade all deps and start testing for Celery 5 @MRoci would you like to give a look?

xrmx commented 3 years ago

You should rebase on master and make the celery dependency >=4 here https://github.com/OvalMoney/celery-exporter/blob/master/Cargo.toml#L34

MRoci commented 3 years ago

i know very little about celery5, but my main concern is that in testing we are mocking many celery functions and the definition of Event from source is very generous (it could contain any field).

So I think we should investigate the differences between celery4 and celery5 in the handling of monitoring events, but hopefully this will maybe help us improve the queue detection.

These days I can try to help digging into the matter and report here what I find.

SharpEdgeMarshall commented 3 years ago

@MRoci I referred to this: https://docs.celeryproject.org/en/stable/whatsnew-5.0.html

LouisStAmour commented 3 years ago

I wouldn't say that Celery 4 to Celery 5 led to breaking changes in events, it feels like Celery is being relatively minimally maintained but that large changes are also being accepted, sometimes without updating corresponding documentation.

For example, last month, a task-cancelled event was added: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L417

The documentation history suggests little has changed: https://github.com/celery/celery/commits/8ebcce1523d79039f23da748f00bec465951de2a/docs/userguide/monitoring.rst

Searching for documented events:

task-sent last updated 5 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/app/amqp.py#L547-L548

task-received was last updated 6 years ago: https://github.com/celery/celery/blame/2411504f4164ac9acfa20007038d37591c6f57e5/celery/worker/strategy.py#L162-L170

task-started was last updated 9 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L468

task-succeeded was last updated 7 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L505 or https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L721-L723 [it's not clear to me if this means it gets sent twice?]

task-failed was last updated 8 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L573-L577

task-rejected was last updated 6 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L593

task-revoked was last updated 9 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L430

task-retried was last updated 9 years ago: https://github.com/celery/celery/blame/8d6778810c5153c9e4667eed618de2d0bf72663e/celery/worker/request.py#L512-L514

worker-online was last updated 9 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L51

worker-offline was last updated 14 months ago, to remove automatic retry: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L61

worker-heartbeat was last updated 6 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L53

Worker event attributes, 9 years ago: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/celery/worker/heartbeat.py#L42-L47

Outside of an update to use f-strings and a skipped test, I'm not seeing many changes since 5 years ago in an events state unit test. Not that you would have to update the unit test if you updated events of course: https://github.com/celery/celery/blame/8ebcce1523d79039f23da748f00bec465951de2a/t/unit/events/test_state.py

Could there be updates I've missed? Absolutely. But I haven't seen any important ones yet?

Since last year it's been well-known that if you run Flower against Celery 5, everything just works as long as you use the Docker image. https://github.com/mher/flower/issues/1029#issuecomment-703194703 They're working on a new release of Flower, but that appears to be because the command line tooling changed within Celery and they're looking to drop Python 2 support.

Intriguingly, Flower also seems to have basic support for Prometheus, though it could be improved: https://github.com/mher/flower/issues/1076

That said, since events include task args and kwargs and other structures, it's possible that these may have changed in the interim. As in, the description of a task might have changed in minor ways. But I doubt we would go into that much detail when collecting Prometheus metrics.

daadu commented 3 years ago

I went through 4->5 upgrade doc, can't see anything about "event". Looks safe for me to upgrade.