getsentry / arroyo

A library to build streaming applications that consume from and produce to Kafka.
https://getsentry.github.io/arroyo/
Apache License 2.0

Record consumer / client ID on metrics #357

Open mwarkentin opened 2 months ago

mwarkentin commented 2 months ago

We currently have a bunch of metrics in the arroyo namespace, but they are only tagged with VM / host information rather than consumer information, so it is hard (or impossible) to break metrics down properly per consumer.

One example is trying to determine whether partitions are balanced evenly across consumers. We have a metric called arroyo.consumer.partitions_assigned.count, but we can only break it down by instance-id and similar tags, so a single value may include partitions assigned to multiple consumers running on the same node:

image

Ideally we could break this down by consumer and see if they all have the same number of partitions assigned.

untitaker commented 1 month ago

the min_partition tag is a similar case. we want a global tag for the entire consumer, but arroyo itself does not support that. so we had to implement it in the application.
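As a rough illustration of what "implementing it in the application" could look like, here is a minimal sketch of a wrapper that merges a fixed set of consumer-wide tags into every metric call. It does not import arroyo: the `increment` / `gauge` / `timing` method shape is an assumption about what a metrics backend typically looks like, and `RecordingBackend`, `GlobalTagsBackend`, and the `consumer_group` / `client_id` tag names are hypothetical names made up for this example.

```python
from typing import Dict, List, Mapping, Optional, Tuple, Union

Tags = Mapping[str, str]
Number = Union[int, float]


class RecordingBackend:
    """Hypothetical stand-in backend that records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Number, Dict[str, str]]] = []

    def increment(self, name: str, value: Number = 1, tags: Optional[Tags] = None) -> None:
        self.calls.append(("increment", name, value, dict(tags or {})))

    def gauge(self, name: str, value: Number, tags: Optional[Tags] = None) -> None:
        self.calls.append(("gauge", name, value, dict(tags or {})))

    def timing(self, name: str, value: Number, tags: Optional[Tags] = None) -> None:
        self.calls.append(("timing", name, value, dict(tags or {})))


class GlobalTagsBackend:
    """Hypothetical wrapper that adds consumer-wide tags to every metric."""

    def __init__(self, inner: RecordingBackend, global_tags: Tags) -> None:
        self._inner = inner
        self._global_tags = dict(global_tags)

    def _merge(self, tags: Optional[Tags]) -> Dict[str, str]:
        # Per-call tags win over global tags when the keys collide.
        merged = dict(self._global_tags)
        merged.update(tags or {})
        return merged

    def increment(self, name: str, value: Number = 1, tags: Optional[Tags] = None) -> None:
        self._inner.increment(name, value, self._merge(tags))

    def gauge(self, name: str, value: Number, tags: Optional[Tags] = None) -> None:
        self._inner.gauge(name, value, self._merge(tags))

    def timing(self, name: str, value: Number, tags: Optional[Tags] = None) -> None:
        self._inner.timing(name, value, self._merge(tags))


# Usage: every metric emitted through the wrapper carries the consumer identity.
backend = RecordingBackend()
metrics = GlobalTagsBackend(
    backend, {"consumer_group": "my-group", "client_id": "consumer-1"}
)
metrics.gauge("arroyo.consumer.partitions_assigned.count", 4, tags={"host": "a"})
```

The downside of doing this in the application is the one already described: every application has to remember to wrap its backend, which is why a tag set by the library itself would be more convenient.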

in this particular case of checking for partition balance, I think it is sufficient to check that min == max == avg == p50 == p70 == ... of the metric; the tag breakdown is not needed IMO
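To make that check concrete, here is a small sketch. It assumes each value in the list is one reported sample of arroyo.consumer.partitions_assigned.count from a single consumer; the function name `is_balanced` and its `tolerance` parameter are hypothetical and only for illustration. If min == max, every sample is identical, so the average and all percentiles necessarily agree as well.

```python
from typing import Sequence


def is_balanced(partition_counts: Sequence[int], tolerance: int = 0) -> bool:
    """Return True if the per-consumer partition counts are (nearly) equal.

    With tolerance=0 this is the strict min == max check. A tolerance of 1
    allows for the case where the partition count is not divisible by the
    number of consumers, so a perfectly even split is impossible.
    """
    if not partition_counts:
        raise ValueError("need at least one sample")
    return max(partition_counts) - min(partition_counts) <= tolerance


even = is_balanced([4, 4, 4])               # all consumers hold 4 partitions
skewed = is_balanced([4, 4, 5])             # one consumer holds an extra partition
near_even = is_balanced([4, 4, 5], tolerance=1)
```

One caveat with the aggregate-only approach: if the metric is reported per host rather than per consumer, samples from several consumers on one node may already be combined before the comparison, which is the situation the original report describes.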