braedon / prometheus-kafka-consumer-group-exporter

Prometheus Kafka Consumer Group Exporter
MIT License

Incorrect metrics during startup when using `-s` flag #19

Open jutley opened 6 years ago

jutley commented 6 years ago

When using the `-s` flag to start from the beginning of the __consumer_offsets topic, the reported metrics are based on old values until the exporter has caught up to the most recent commits. This has caused confusion for developers in our org, as it looks like their consumers are suddenly lagging.

I'd like to propose that when reading from the beginning of __consumer_offsets, consumer_group metrics do not get reported until ONE OF the following conditions is true for ALL __consumer_offsets partitions:

I'd also like to propose a health endpoint that provides the status of this warmup phase. This can help us make sure we only tear down an old container once a new container is providing the correct metrics.
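As a rough illustration of the health endpoint idea, here is a minimal sketch using only the Python standard library. The `/health` path, the response bodies, and the names `health_response` and `make_handler` are assumptions made for this example and are not part of the exporter; `is_warmed_up` is a hypothetical callable that the exporter would supply.

```python
from http.server import BaseHTTPRequestHandler


def health_response(warmed_up):
    """Map the warmup state to an HTTP status code and body (hypothetical helper)."""
    if warmed_up:
        return 200, b"ready\n"
    # 503 tells an orchestrator the container is alive but not yet serving correct metrics.
    return 503, b"warming up\n"


def make_handler(is_warmed_up):
    """Build a request handler class around a zero-argument callable returning a bool."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Only the assumed /health path is served; anything else is a 404.
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = health_response(is_warmed_up())
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler
```

Such a handler could be served with `http.server.HTTPServer(("", 8080), make_handler(some_callable)).serve_forever()` on a background thread (the port is arbitrary), and a deployment tool could then hold off on tearing down the old container until the endpoint returns 200.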

These changes would favor less information over inaccurate information, which I think is beneficial in almost all cases.

If you agree this is a good direction, I'd be happy to take a stab at implementing it.

braedon commented 5 years ago

Hi @jutley, happy to review a PR around this.

I'd suggest checking if the exporter has consumed up to the high water mark observed during startup - timestamps can be finicky (particularly if a consumer group isn't actively committing), and while the exporter needs to be able to "keep up", it never strictly needs to reach a lag of 0.
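A minimal sketch of that high water mark check, kept independent of any Kafka client so it can be read on its own. `WarmupTracker` and its method names are hypothetical. The sketch assumes the caller captures each partition's end offset once at startup (with kafka-python this could come from `KafkaConsumer.end_offsets`) and periodically reports the consumer's current position (for example from `KafkaConsumer.position`).

```python
class WarmupTracker:
    """Tracks whether every partition has been read up to the high water mark seen at startup."""

    def __init__(self, startup_high_water_marks):
        # Mapping of partition id -> high water mark captured once at startup.
        # The targets are never updated afterwards, so the exporter only has to
        # reach where the log ended at startup, not a lag of 0.
        self._targets = dict(startup_high_water_marks)
        # Partitions with a high water mark of 0 were empty at startup and need no catch-up.
        self._pending = {p for p, hwm in self._targets.items() if hwm > 0}

    def record_position(self, partition, position):
        # `position` is the offset of the next message the consumer will fetch.
        # Reaching or passing the startup target marks the partition as caught up.
        if partition in self._pending and position >= self._targets[partition]:
            self._pending.discard(partition)

    @property
    def warmed_up(self):
        # True once no partition is still waiting to reach its startup target.
        return not self._pending
```

With a tracker like this, the exporter could skip updating the consumer group metrics while `warmed_up` is false. Comparing positions rather than message offsets avoids depending on a specific message still existing in a compacted topic.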