cjmamo / kafka-web-console

A web console for Apache Kafka (retired)
Apache License 2.0

Count not connect to partition leader localhost:9092 (connection refused) #27

Open leabdalla opened 10 years ago

leabdalla commented 10 years ago

The web console works fine, but it logs these messages on every fetch interval:

[debug] application - Getting partition log sizes for topic Example from partition leaders localhost:9092, localhost:9092
[debug] application - Getting partition log sizes for topic AnotherExample from partition leaders localhost:9092, localhost:9092
[warn] application - Count not connect to partition leader localhost:9092. Error message: java.net.ConnectException: Connection refused: localhost/127.0.0.1:9092
[warn] application - Count not connect to partition leader localhost:9092. Error message: java.net.ConnectException: Connection refused: localhost/127.0.0.1:9092
[warn] application - Count not connect to partition leader localhost:9092. Error message: java.net.ConnectException: Connection refused: localhost/127.0.0.1:9092
[warn] application - Count not connect to partition leader localhost:9092. Error message: java.net.ConnectException: Connection refused: localhost/127.0.0.1:9092

I'm running ZooKeeper and Kafka on the same machine. When I start the Kafka server, its output shows the leader being elected without problems:

[2014-07-09 10:43:45,814] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
[2014-07-09 10:43:45,856] INFO 0 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2014-07-09 10:43:46,371] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2014-07-09 10:43:46,377] INFO Registered broker 0 at path /brokers/ids/0 with address localhost:9092. (kafka.utils.ZkUtils$)
[2014-07-09 10:43:46,400] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
[2014-07-09 10:43:46,910] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [Example,1],[AnotherExample,1],[AnotherExample,0],[Example,0] (kafka.server.ReplicaFetcherManager)
[2014-07-09 10:43:47,007] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [Example,1],[AnotherExample,1],[AnotherExample,0],[Example,0] (kafka.server.ReplicaFetcherManager)

Is this a problem with the web-console app or with my Kafka install?

cjmamo commented 10 years ago

Can you telnet to this address: localhost:9092?
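If telnet is not installed, the same reachability check can be scripted. The following is a minimal sketch, not part of kafka-web-console; the function name `can_connect` and the 3-second timeout are assumptions made for illustration. It only tests whether a TCP connection can be opened and says nothing about whether the broker is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("localhost", 9092)` on the machine running the web console would be the rough equivalent of the telnet test.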

ryanpersaud commented 9 years ago

I'm seeing the same error message, even though I can telnet from the machine running the web-console application to port 9092 of the Kafka broker. See below (hostnames and IPs masked).

[warn] application - Could not connect to partition leader XXX. Error message: Failed to open a socket.

$ telnet XXX 9092
Trying Y.Y.Y.Y...
Connected to XXX.
Escape character is '^]'.

ibanner56 commented 9 years ago

If you're failing to open a socket, you may have too many open. What does netstat -an | grep ESTA | wc -l return?

If it's a big number, then this is the same issue as #30.

ryanpersaud commented 9 years ago

Seems to top out just below 4000 (due to your patch?):

$ netstat -an | grep ESTA | wc -l
3992

ibanner56 commented 9 years ago

My fork only fixes part of the problem, unfortunately - I managed to keep it from maxing out on 9092 and from maxing out the number of open files, but it still maxes out on 9090 and 9091 for reasons I have yet to nail down. Eventually you will still find yourself unable to open a socket.
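To see which port the connections pile up on, the `netstat -an` output can be grouped by remote port instead of counted as a whole. This is a rough sketch that assumes Linux-style `netstat` columns (proto, recv-q, send-q, local address, foreign address, state); the function name `established_by_remote_port` is made up for this example.

```python
from collections import Counter


def established_by_remote_port(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per remote port.

    Assumes Linux-style `netstat -an` rows:
    proto recv-q send-q local-address foreign-address state
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and UNIX-socket rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The port is whatever follows the last ':' in the foreign address.
        port = fields[4].rsplit(":", 1)[-1]
        counts[port] += 1
    return counts
```

Feeding it the text of `netstat -an` (for example, captured with `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`) and printing `most_common()` on the result would show whether 9090, 9091, or 9092 accounts for most of the open sockets.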

ryanpersaud commented 9 years ago

I should also add that when I click on a topic, the consumer group information never populates.

ibanner56 commented 9 years ago

Yeah, we ran into that too. Right now our accepted solution is just "eh, try refreshing it and waiting several seconds."