confluentinc / cp-helm-charts

The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.
https://cnfl.io/getting-started-kafka-kubernetes
Apache License 2.0

Kafka-Rest JMX monitoring does not work as expected. #277

Open Mitrofanov opened 5 years ago

Mitrofanov commented 5 years ago

Hi guys.

I just deployed kafka-rest-proxy and found that the Prometheus JMX exporter does not work as expected.

After digging, I found the following:

In the exporter's container log:

VM settings:
    Max. Heap Size (Estimated): 44.50M
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM

Apr 24, 2019 7:48:15 AM io.prometheus.jmx.JmxCollector collect
SEVERE: JMX scrape failed: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: localhost; nested exception is:
    java.net.ConnectException: Connection refused (Connection refused)]
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
    at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
    at io.prometheus.jmx.JmxScraper.doScrape(JmxScraper.java:94)
    at io.prometheus.jmx.JmxCollector.collect(JmxCollector.java:456)
    at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:183)
    at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.<init>(CollectorRegistry.java:147)
    at io.prometheus.client.CollectorRegistry.filteredMetricFamilySamples(CollectorRegistry.java:134)
    at io.prometheus.client.exporter.HTTPServer$HTTPMetricHandler.handle(HTTPServer.java:60)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
    at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
    at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
    at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: localhost; nested exception is:
    java.net.ConnectException: Connection refused (Connection refused)]
    at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:136)
    at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205)
    at javax.naming.InitialContext.lookup(InitialContext.java:417)
    at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955)
    at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922)
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
    ... 16 more
Caused by: java.rmi.ConnectException: Connection refused to host: localhost; nested exception is:
    java.net.ConnectException: Connection refused (Connection refused)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338)
    at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112)
    at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:132)
    ... 21 more

In kafka-rest startup logs:

[2019-04-24 08:18:51,399] WARN Property jmx.port is not valid (kafka.utils.VerifiableProperties)

I use the following versions:

kafka-rest: confluentinc/cp-kafka-rest:5.2.1
exporter: solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143

I also checked the kafka-rest docs, and it looks like there is no jmx.port option anymore. Please advise.
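
The "Connection refused to host: localhost" error in the trace above suggests that nothing is listening on the JMX port the exporter tries to reach. A minimal sketch for confirming that from a shell in the same pod follows. It assumes bash is available there (it relies on bash's /dev/tcp redirection), and the function name jmx_port_open is hypothetical, not part of the chart or the images.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash, because /dev/tcp is a bash feature.
# Returns 0 if a TCP connection to host:port succeeds, non-zero otherwise.
jmx_port_open() {
  local host="$1" port="$2"
  # Open the connection in a subshell so the file descriptor is
  # closed again as soon as the check finishes.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}
```

Usage, where 5555 is only a placeholder for whatever JMX port the deployment configures: `jmx_port_open localhost 5555 && echo open || echo refused`. A "refused" result would be consistent with the REST proxy never having opened its JMX port.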

dadoeyad commented 5 years ago

Same issue here. It worked locally on Minikube but does not work on AWS EKS.

leeogrady commented 5 years ago

I am experiencing the same issue on an AWS EKS cluster, although in schema-registry, using 5.2.1 or 5.2.2.

The issue appears when the number of replicas (brokers) drops below 3. Even overriding the replication factors to 1 produces the same error logs. The workaround is to run 3 replicas. Note that anti-affinity is also baked into the StatefulSet.

sureshoao commented 5 years ago

Same issue here as well. Image: confluentinc/cp-kafka:5.2.1. Link: https://github.com/confluentinc/cp-helm-charts/issues/304

sureshoao commented 5 years ago

@Mitrofanov Were you able to fix the issue?

jayeshmahajan commented 5 years ago

I had this issue because one of the pods had an incorrect storage class name, which created a stale entry or bad state that was never deleted properly.

I deleted all the pods and it healed on its own once they came back up in the correct sequence: ZooKeeper, Kafka, and then the rest.

lorenzozimolo commented 4 years ago

Same issue here. I found a mismatch between the deployment.yaml and the launch file in the docker image.

Changing the variable name in deployment.yaml to KAFKAREST_JMX_PORT (without the underscore between KAFKA and REST) fixed the problem for me. I don't know whether this is the right way to do it, though.
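
To see which spelling of the variable a running container actually received, one option is to filter its environment for the candidate names. This is a rough sketch: the helper name list_jmx_env is hypothetical, and KAFKA_REST_JMX_PORT is inferred from the description above as the spelling with the extra underscore.

```shell
# Hypothetical helper: print only the JMX port variables found in
# `env`-style input (NAME=value lines) read from stdin.
list_jmx_env() {
  # `|| true` keeps the exit status at 0 when no variable matches,
  # so the helper is safe to use under `set -e`.
  grep -E '^(KAFKAREST_JMX_PORT|KAFKA_REST_JMX_PORT|JMX_PORT)=' || true
}
```

It could be fed from the pod with something like `kubectl exec <pod> -c <container> -- env | list_jmx_env`, where the pod and container names are placeholders. Empty output would mean none of the three names is set in that container.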

HungUnicorn commented 4 years ago

https://github.com/confluentinc/cp-docker-images/blob/5.3.1-post/debian/kafka-rest/include/etc/confluent/docker/launch#L33 uses JMX_PORT if KAFKAREST_JMX_PORT is not set, so it should work.
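
The fallback described here can be illustrated with shell parameter expansion. This is only a sketch of the pattern, not a copy of the linked launch script, and the function name resolve_jmx_port is hypothetical.

```shell
# Hypothetical helper illustrating the fallback order:
# prefer KAFKAREST_JMX_PORT, otherwise use JMX_PORT, otherwise print
# an empty line. Note that `:-` also falls back when the variable is
# set but empty, which may differ from the real script's behavior.
resolve_jmx_port() {
  printf '%s\n' "${KAFKAREST_JMX_PORT:-${JMX_PORT:-}}"
}
```

Under this pattern, a variable named KAFKA_REST_JMX_PORT would be ignored entirely, since neither of the two names the sketch consults matches it.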

jgomez-jb commented 4 years ago

@dadoeyad I'm facing the same issue. Did you manage to fix it?