Telefonica / prometheus-kafka-adapter

Use Kafka as a remote storage database for Prometheus (remote write only)
Apache License 2.0

complementary consumer, seeking feedback #55

Closed: johntdyer closed this issue 1 year ago

johntdyer commented 4 years ago

Hey guys, thank you so much for your work here, it's proving to be very helpful for our efforts. That said, I am curious what you do with the data once it's in Kafka? In our case we're trying to use Kafka as a transport layer between our data centers. I am considering writing a consumer to complement this one. Essentially it would do the inverse of this adapter: consume Avro-encoded events from Kafka, buffer them, and then flush via gRPC to a Prometheus server. Since you wrote the producer, I am curious if you have any opinions on the complementary tool I suggested?
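For concreteness, a minimal Go sketch of such a consumer. It assumes this adapter's JSON serialization (per-sample `name`/`labels`/`value`/`timestamp` fields; verify against the schema in this repo), uses `github.com/segmentio/kafka-go`, and treats the broker, topic, and `flush` callback as placeholders rather than anything defined by this project:

```go
// A minimal sketch of the consumer side. Field names in metricPoint mirror
// this adapter's JSON serialization; broker/topic/group are placeholders.
package relay

import (
	"context"
	"encoding/json"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// metricPoint mirrors the fields this adapter writes per sample.
type metricPoint struct {
	Timestamp string            `json:"timestamp"`
	Value     string            `json:"value"`
	Name      string            `json:"name"`
	Labels    map[string]string `json:"labels"`
}

// consume reads points from Kafka and hands batches to flush, which would
// re-serialize them and remote-write them to the destination Prometheus.
func consume(ctx context.Context, flush func([]metricPoint) error) error {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"broker:9092"}, // placeholder
		GroupID: "prom-relay",            // placeholder
		Topic:   "metrics",               // placeholder
	})
	defer r.Close()

	batch := make([]metricPoint, 0, 1000)
	for {
		// Bound each read so a quiet topic still triggers periodic flushes.
		readCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
		m, err := r.ReadMessage(readCtx)
		cancel()
		if err == nil {
			var p metricPoint
			if jerr := json.Unmarshal(m.Value, &p); jerr != nil {
				log.Printf("skipping undecodable message: %v", jerr)
				continue
			}
			batch = append(batch, p)
		}
		// Flush on a full batch, or when a read timed out with data pending.
		if len(batch) == cap(batch) || (err != nil && len(batch) > 0) {
			if ferr := flush(batch); ferr != nil {
				return ferr
			}
			batch = batch[:0]
		}
		if ctx.Err() != nil {
			return ctx.Err()
		}
	}
}
```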

palmerabollo commented 4 years ago

Thanks, @johntdyer. We use kafka-connect as a consumer to dump the data from Kafka to S3 and be able to do some analytics.

We only implemented the remote write because it's the only flow we needed. The tool you suggest to support remote read makes perfect sense to me for some use cases, such as the one you describe (moving data between datacenters).

I don't have any experience coding the "remote read" part, so I can't tell how hard it would be 😢

johntdyer commented 4 years ago

@palmerabollo - Thank you for responding! I am not sure we're on the same page, because AFAIK we wouldn't be using the remote_read API at all. Just in case, I figured it might be a good idea to clarify my intentions, to make sure I am not overlooking something big on my end. For the tool I am talking about, it may help to think of it as a forward proxy for remote_write that uses Kafka as its transport: on the remote end the data is deserialized, recompressed, and pushed into Prometheus (Cortex maybe) as if it were an HTTP request that originated from the source Prometheus. Does that make sense?
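For reference, the push side of such a proxy amounts to reassembling the remote_write wire format: a `prompb.WriteRequest` protobuf, snappy-compressed, POSTed over HTTP. A minimal Go sketch, where the endpoint URL and the already-decoded series are placeholders (Cortex exposes the same wire format on its push endpoint, e.g. `/api/v1/push`):

```go
// A minimal sketch of the push side, assuming samples have already been
// decoded from Kafka and converted into prompb.TimeSeries values.
package relay

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func pushSamples(endpoint string, series []prompb.TimeSeries) error {
	// Remote write bodies are a protobuf WriteRequest, snappy-compressed.
	req := &prompb.WriteRequest{Timeseries: series}
	raw, err := req.Marshal()
	if err != nil {
		return err
	}
	compressed := snappy.Encode(nil, raw)

	httpReq, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(compressed))
	if err != nil {
		return err
	}
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	httpReq.Header.Set("Content-Encoding", "snappy")
	httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("remote write to %s failed: %s", endpoint, resp.Status)
	}
	return nil
}
```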

skupjoe commented 3 years ago

I am curious about this also. Is there any way to consume these metrics on the other end and provide a Prometheus endpoint that can be scraped?

johnseekins commented 3 years ago

> I am curious about this also. Is there any way to consume these metrics on the other end and provide a Prometheus endpoint that can be scraped?

Providing a Prometheus endpoint to scrape on the other side of Kafka is fairly fraught. You can't guarantee that data coming out of Kafka will arrive on a consistent schedule, so you may get data overwritten (e.g. if two points for the same series arrive between scrapes, the second could overwrite the first) or data missed (the endpoint can only cache data for so long before the buffer grows too large, forcing a flush before the next scrape). It's likely better to use a consumer that pushes data on the other side of Kafka. You could wire this up with a Pushgateway, but streaming data isn't what the Pushgateway is designed for.

brokenjacobs commented 2 years ago

I too am looking at using this with either Mimir or Cortex on the receiving side, not with remote_read but as a remote_write proxy. I want to subscribe to a wildcard set of topics (from multiple relaying Prometheus instances) and batch into a Cortex/Mimir instance with remote_write. Has anyone else done this?
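Wildcard subscriptions like this are supported by librdkafka-based clients, which treat a topic string starting with `^` as a regex. A minimal Go sketch using `confluent-kafka-go`; the broker, group id, and topic pattern are placeholders:

```go
// A minimal sketch of a regex subscription via confluent-kafka-go.
package relay

import (
	"log"
	"time"

	"github.com/confluentinc/confluent-kafka-go/v2/kafka"
)

func consumeWildcard() error {
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "broker:9092", // placeholder
		"group.id":          "mimir-relay", // placeholder
		"auto.offset.reset": "earliest",
	})
	if err != nil {
		return err
	}
	defer c.Close()

	// A leading "^" makes librdkafka treat the topic as a regex, matching
	// e.g. prom-metrics-dc1, prom-metrics-dc2, ...
	if err := c.SubscribeTopics([]string{"^prom-metrics-.*"}, nil); err != nil {
		return err
	}

	for {
		msg, err := c.ReadMessage(10 * time.Second)
		if err != nil {
			// Timeouts are expected on quiet topics; a real relay would
			// use them as a batching/flush boundary.
			continue
		}
		log.Printf("got %d bytes from %s", len(msg.Value), *msg.TopicPartition.Topic)
		// ...decode and batch into a single remote_write to Cortex/Mimir here.
	}
}
```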

brokenjacobs commented 2 years ago

I just wanted to add here if anyone doesn't know about it: https://github.com/open-telemetry/opentelemetry-collector-contrib

This project can do both sides of this use case.
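For example, a receiving-side Collector could pair the `kafka` receiver with the `prometheusremotewrite` exporter, roughly like the config below. The broker, topic, and endpoint are placeholders, and the sending side would pair the `prometheus` receiver with the `kafka` exporter:

```yaml
# Sketch of a receiving-side Collector pipeline: consume OTLP metrics from
# Kafka and forward them via Prometheus remote write.
receivers:
  kafka:
    brokers: ["broker:9092"]   # placeholder
    topic: otlp_metrics        # placeholder
    encoding: otlp_proto

exporters:
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push  # placeholder

service:
  pipelines:
    metrics:
      receivers: [kafka]
      exporters: [prometheusremotewrite]
```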