Telefonica / prometheus-kafka-adapter

Use Kafka as a remote storage database for Prometheus (remote write only)
Apache License 2.0

Related data in kafka is lost #50

Closed: fcddk closed this issue 2 years ago

fcddk commented 4 years ago

log:
{"level":"info","msg":"creating kafka producer","time":"2020-07-02T11:57:14Z"}
{"fields.time":"2020-07-02T11:57:22Z","ip":"100.101.182.20","latency":5577772,"level":"info","method":"GET","msg":"","path":"/metrics","status":200,"time":"2020-07-02T11:57:22Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:57:48Z","ip":"10.110.19.102","latency":4830491,"level":"info","method":"GET","msg":"","path":"/metrics","status":200,"time":"2020-07-02T11:57:48Z","user-agent":"kube-probe/1.14"}
{"fields.time":"2020-07-02T11:57:52Z","ip":"100.101.182.20","latency":4905090,"level":"info","method":"GET","msg":"","path":"/metrics","status":200,"time":"2020-07-02T11:57:52Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:58:06Z","ip":"10.110.19.102","latency":4937995,"level":"info","method":"GET","msg":"","path":"/metrics","status":200,"time":"2020-07-02T11:58:06Z","user-agent":"kube-probe/1.14"}
{"fields.time":"2020-07-02T11:58:15Z","ip":"100.101.182.20","latency":2782974,"level":"info","method":"POST","msg":"","path":"/receive","status":200,"time":"2020-07-02T11:58:15Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:58:15Z","ip":"100.101.182.20","latency":5028590,"level":"info","method":"POST","msg":"","path":"/receive","status":200,"time":"2020-07-02T11:58:15Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:58:15Z","ip":"100.101.182.20","latency":7297165,"level":"info","method":"POST","msg":"","path":"/receive","status":200,"time":"2020-07-02T11:58:15Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:58:15Z","ip":"100.101.182.20","latency":4929546,"level":"info","method":"POST","msg":"","path":"/receive","status":200,"time":"2020-07-02T11:58:15Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:58:15Z","ip":"100.101.182.20","latency":3711972,"level":"info","method":"POST","msg":"","path":"/receive","status":200,"time":"2020-07-02T11:58:15Z","user-agent":"Prometheus/2.15.2"}
{"fields.time":"2020-07-02T11:58:15Z","ip":"100.101.182.20","latency":3597851,"level":"info","method":"POST","msg":"","path":"/receive","status":200,"time":"2020-07-02T11:58:15Z","user-agent":"Prometheus/2.15.2"}

fcddk commented 4 years ago

kafka version: 2.4.0
prometheus-kafka-adapter version: 1.7.0
log: {"level":"info","msg":"creating kafka producer","time":"2020-07-02T11:57:14Z"}

However, I cannot find any topic information in Kafka, and neither Kafka nor prometheus-kafka-adapter reports any errors.

fcddk commented 4 years ago

Kafka deployment method: a single-node service running on a virtual machine.

palmerabollo commented 4 years ago

Hi @fcddk, could you please set LOG_LEVEL=debug and share the logs? You should see some timeseries. Note to self: this open issue might be related: https://github.com/Telefonica/prometheus-kafka-adapter/issues/49

palmerabollo commented 4 years ago

Ping @fcddk. Is this still an issue? Were you able to make it work?

fcddk commented 4 years ago

The debug logs are very large, but the same problem remains. prometheus-kafka-adapter does not check whether the Kafka broker is available. I tested this by configuring a Kafka address that does not exist at all, and I still saw no error in the log.

palmerabollo commented 4 years ago

Ok, so the issue arises when the Kafka broker is not available for some reason. Thanks for the info. In our setup (kubernetes-based), we have an init container that makes prometheus-kafka-adapter (PKA) wait until Kafka is ready. This is why we don't see this issue.

How would you like it to work if no kafka broker is ready? I think there are two options:

  1. PKA restarts itself, kafka will eventually be ready (easier to implement I guess).
  2. PKA includes a retry loop when it tries to connect to kafka.

Or a mixed approach (e.g. 3 retries and then die).

huangyong1991 commented 4 years ago

I have the same problem. There are no errors in the log. Where is the data?

palmerabollo commented 4 years ago

https://github.com/Telefonica/prometheus-kafka-adapter/issues/52#issuecomment-663324632 provides useful logs to help us debug this issue.

prometheus-kafka-adapter shows data, but it is not sent to the kafka topic for some reason:

{"level":"debug","msg":"","time":"2020-07-24T10:33:47+08:00","var":{"timeseries":[{"labels":[{"name":"name","value":"clickhouse_delayed_inserts"},{"name":"instance","value":"10.132.35.20:19116"},{"name":"job","value":"clickhouse_exporter"}],"samples":[{"timestamp":1595558027127}]},{"labels":[{"name":"name","value":"clickhouse_table_parts_bytes"},{"name":"database","value":"system"},{"name":"instance","value":"10.132.35.20:19116"},{"name":"job","value":"clickhouse_exporter"},{"name":"table","value":"query_log"}],"samples":[{"value":10987,"timestamp":1595558027127}]}]}}

palmerabollo commented 4 years ago

@huangyong1991 we might have introduced a regression in our latest release 1.7.0. Which version are you using? Could you please try it with 1.6.0 instead to verify if it works as expected? It would help us find the bug.

huangyong1991 commented 4 years ago

@palmerabollo I use version 1.7.0

palmerabollo commented 4 years ago

@huangyong1991 Could you repeat your test with 1.6.0, please?

harold-kfuse commented 3 years ago

@palmerabollo I'm facing the same issue. I deployed this with the helm chart and used the bitnami helm chart to deploy Kafka on my k8s cluster. My logs look similar to the ones above, but when I check Kafka, it doesn't contain anything.

harold-kfuse commented 3 years ago

I tested with 1.6.0 and it works. I see metrics in kafka.