clingen-data-model / architecture


Increase monitoring and visibility of kafka/confluent #80

Open sjahl opened 3 years ago

sjahl commented 3 years ago

I think it's desirable, from a cost-projection and performance perspective, to spend some time figuring out how to monitor and track metrics for our kafka streams in confluent.

Best case scenario is probably to see if we can get the metrics from confluent plumbed into our GCP monitoring account, so that we can dashboard and alert from there alongside the rest of our app metrics. Next best is to turn on whatever monitoring and alerting capabilities we have in confluent itself, so that we're aware of cost and performance issues without manually checking a dashboard.

theferrit32 commented 3 years ago

@sjahl the ccloud CLI can be used to list the clusters we have and the topics+partitions in each cluster, and then a small python/clojure program can gather per-topic info: how many messages there are, the timestamps of the first and last messages, and other things we might want to know that aren't readily available in the Confluent UI or client. We could also sample from each topic to estimate the average message size. Could make it a kubernetes job that runs once a day or something.
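A minimal sketch of the counting and size-estimation arithmetic described above, assuming the watermark offsets and a message sample have already been fetched from the cluster (e.g. with confluent-kafka's `Consumer.get_watermark_offsets`); the function name and data shapes here are illustrative, not an existing tool:

```python
def topic_stats(watermarks, sample_sizes):
    """Estimate message count and total size for one topic.

    watermarks: {partition: (low_offset, high_offset)} -- in a real run
        this would come from a Kafka consumer per partition; hard-coded
        data is used in the example below.
    sample_sizes: byte sizes of a sample of messages from the topic.
    """
    # High watermark minus low watermark approximates the retained
    # message count in each partition (ignoring compaction).
    msg_count = sum(high - low for low, high in watermarks.values())
    avg_size = sum(sample_sizes) / len(sample_sizes) if sample_sizes else 0
    return {"messages": msg_count, "est_bytes": int(msg_count * avg_size)}


# Example with made-up watermarks and a 3-message sample:
stats = topic_stats({0: (0, 1000), 1: (10, 510)}, [120, 80, 100])
print(stats)  # → {'messages': 1500, 'est_bytes': 150000}
```

A daily Kubernetes CronJob could run something like this per topic and push the results wherever we keep metrics.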

sjahl commented 3 years ago

@theferrit32 Thanks! I'll take a look.

Confluent does have an API for metrics, which might be easier to work with, depending on what format the ccloud cli is outputting metrics in: https://docs.confluent.io/cloud/current/monitoring/metrics-api.html
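For reference, a query against that Metrics API is a JSON POST; here is a sketch of building one such payload, asking for hourly received bytes for a cluster. The exact schema (field names, metric identifiers, the `/v2/metrics/cloud/query` endpoint) should be checked against the docs linked above rather than taken from this example:

```python
def received_bytes_query(cluster_id, start, end):
    """Build a query payload for the Confluent Cloud Metrics API.

    The payload shape is an assumption based on the docs linked above;
    sending it requires an HTTP POST with Confluent Cloud API credentials,
    which is omitted here.
    """
    return {
        "aggregations": [
            {"metric": "io.confluent.kafka.server/received_bytes"}
        ],
        # Restrict the query to one cluster by its lkc- id.
        "filter": {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
        "granularity": "PT1H",  # one datapoint per hour
        "intervals": [f"{start}/{end}"],
    }


payload = received_bytes_query(
    "lkc-abc123", "2021-01-01T00:00:00Z", "2021-01-02T00:00:00Z"
)
```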

I also found this: https://github.com/Dabz/ccloudexporter, which exposes the metrics on an HTTP endpoint suitable for Prometheus to scrape (and I think Prometheus can use google monitoring as long-term storage for the metrics it collects). Prometheus is something I'm considering deploying anyway for other reasons, so this might be the way to go if that ends up being the case.
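If we go that route, the Prometheus side would just be a scrape config pointing at the exporter; a sketch, assuming ccloudexporter's default port of 2112 (worth verifying against its README) and a hypothetical in-cluster service name:

```yaml
scrape_configs:
  - job_name: ccloud
    # ccloudexporter serves /metrics on :2112 by default (assumption --
    # check the project README); "ccloudexporter" is a placeholder
    # service name for wherever we deploy it.
    static_configs:
      - targets: ["ccloudexporter:2112"]
```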