solsson opened 5 years ago
Maybe we should use https://github.com/kubernetes/kubernetes/pull/63742 for the broker service. The current readiness probe on the broker containers succeeds as soon as TCP port 9092 accepts a connection, so it doesn't do any actual health checking.
{"level":"info","ts":1553708082.5494027,"msg":"Recv loop terminated: err=read tcp 10.0.23.17:58414->10.3.255.250:2181: i/o timeout","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553708082.5494552,"msg":"Send loop terminated: err=<nil>","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553708083.6529753,"msg":"Connected to 10.3.255.250:2181","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553708083.6588173,"msg":"Authenticated: id=245911530986602516, timeout=6000","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553708083.6598747,"msg":"Re-submitting `0` credentials after reconnect","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553709947.9208763,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"local","consumer":"site-vcc-qa-kkv-userstate-5cf9d9d9d8-tpdn4-20190319t203919","showall":true}
{"level":"info","ts":1553710578.1879478,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"local","consumer":"site-vcc-qa-kkv-userstate-5cf9d9d9d8-tpdn4-20190320t180610","showall":true}
{"level":"info","ts":1553712227.163901,"msg":"Recv loop terminated: err=read tcp 10.0.23.17:46212->10.3.255.250:2181: i/o timeout","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553712227.1639652,"msg":"Send loop terminated: err=<nil>","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553712256.1126459,"msg":"Shutdown triggered","type":"main","name":"burrow"}
{"level":"info","ts":1553712256.112681,"msg":"stopping","type":"coordinator","name":"consumer"}
{"level":"info","ts":1553712256.1126876,"msg":"stopping","type":"module","coordinator":"consumer","class":"kafka","name":"local"}
{"level":"info","ts":1553712258.7277691,"msg":"stopping","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1553712258.7414002,"msg":"Recv loop terminated: err=EOF","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1553712258.7414432,"msg":"Send loop terminated: err=<nil>","type":"module","coordinator":"consumer","class":"kafka_zk","name":"local_zk"}
{"level":"info","ts":1553712258.7414908,"msg":"stopping","type":"coordinator","name":"cluster"}
{"level":"info","ts":1553712258.7414985,"msg":"stopping","type":"module","coordinator":"cluster","class":"kafka","name":"local"}
{"level":"info","ts":1553712259.6486397,"msg":"Connected to 10.3.255.250:2181","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553712259.6560361,"msg":"Authentication failed: zk: session has been expired by the server","type":"coordinator","name":"zookeeper"}
{"level":"error","ts":1553712259.6560826,"msg":"session expired","type":"coordinator","name":"zookeeper"}
{"level":"info","ts":1553712259.656111,"msg":"stopping evaluations","type":"coordinator","name":"notifier"}
{"level":"info","ts":1553712259.6727962,"msg":"stopping","type":"coordinator","name":"notifier"}
{"level":"info","ts":1553712259.672839,"msg":"shutdown","type":"coordinator","name":"httpserver"}
{"level":"info","ts":1553712259.6729908,"msg":"stopping","type":"coordinator","name":"evaluator"}
{"level":"info","ts":1553712259.6730008,"msg":"stopping","type":"module","coordinator":"evaluator","class":"caching","name":"default"}
{"level":"info","ts":1553712259.6730094,"msg":"stopping","type":"coordinator","name":"storage"}
{"level":"info","ts":1553712259.6730392,"msg":"stopping","type":"module","coordinator":"storage","class":"inmemory","name":"default"}
{"level":"info","ts":1553712259.6731503,"msg":"stopping","type":"coordinator","name":"zookeeper"}
Stopped Burrow at March 27, 2019 at 6:44pm (UTC)
@solsson I am hijacking this issue a bit, but since we had some challenges with Burrow, I wrote my own exporter for Kafka consumer group lags, which works similarly to Burrow. The benefits, however, are:
I am actively developing it, and I plan to maintain it for a long time, so I am happy to give support if you face any issues. Before writing articles about it or making it more public, I'd like to get more feedback. Are you interested in giving it a spin?
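Consumer group lag, the metric both Burrow and this exporter report, is computed per partition as the partition's high water mark (log-end offset) minus the group's committed offset. Here is a minimal Go sketch of that arithmetic. It assumes both offset maps have already been fetched from the brokers. The `TopicPartition` type, the example topic "orders", skipping partitions without a committed offset, and clamping negative values to zero are all illustrative assumptions, not Kafka Minion's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// TopicPartition identifies one partition of a topic (hypothetical type).
type TopicPartition struct {
	Topic     string
	Partition int32
}

// consumerLag returns per-partition lag (highWaterMark - committed) and the
// group total. Assumptions: partitions without a known high water mark are
// skipped, and negative lag (possible when the two offsets are read at
// different moments) is clamped to 0.
func consumerLag(hwm, committed map[TopicPartition]int64) (map[TopicPartition]int64, int64) {
	lags := make(map[TopicPartition]int64)
	var total int64
	for tp, off := range committed {
		end, ok := hwm[tp]
		if !ok {
			continue
		}
		lag := end - off
		if lag < 0 {
			lag = 0
		}
		lags[tp] = lag
		total += lag
	}
	return lags, total
}

func main() {
	hwm := map[TopicPartition]int64{{"orders", 0}: 120, {"orders", 1}: 80}
	committed := map[TopicPartition]int64{{"orders", 0}: 100, {"orders", 1}: 80}
	lags, total := consumerLag(hwm, committed)

	// Sort keys so the output order is deterministic.
	keys := make([]TopicPartition, 0, len(lags))
	for tp := range lags {
		keys = append(keys, tp)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Topic != keys[j].Topic {
			return keys[i].Topic < keys[j].Topic
		}
		return keys[i].Partition < keys[j].Partition
	})
	for _, tp := range keys {
		fmt.Printf("%s/%d lag=%d\n", tp.Topic, tp.Partition, lags[tp]) // orders/0 lag=20, orders/1 lag=0
	}
	fmt.Println("total =", total) // total = 20
}
```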
@weeco That's a most welcome initiative! I'm all for such ambitious hijacking of issues :) Yes, I'm/we're interested in giving it a spin. Is there a Docker image and YAMLs to start from? If not, it sounds easy to set up, so I could probably create the PR.
Sure, we build Docker images using Quay. It creates a Docker tag for each release and builds "latest" every time we push something to master: https://quay.io/repository/google-cloud-tools/kafka-minion?tab=tags
docker pull quay.io/google-cloud-tools/kafka-minion:v0.1.1
Regarding deployment YAMLs: I am still working on Helm charts (https://github.com/google-cloud-tools/kafka-minion-helm-chart). They are still missing some environment variables, primarily for mounting Kafka secrets, but they can give you a starting point for writing YAMLs. All configuration is done via environment variables, and Kafka Minion's README has a table listing all of them.
Looking forward to your feedback :).
I'm logging this issue because there shouldn't be a relation between Burrow and the JMX exporter.
To reproduce:
1/2 ready.
It's noteworthy that Burrow is configured to access brokers through name resolution of the headless service "broker". That differs from the typical bootstrap process that Kafka clients follow. However, bootstrap might also be affected, in particular if all metrics pods get OOM-killed at the same time. Until I read the librdkafka 1.0.0 release notes, I was unaware that the bootstrap connection is persistent.