confluentinc / confluent-cli

Confluent Platform CLI

Confluent CLI says stack is down, even if it's not #48

Open rmoff opened 7 years ago

rmoff commented 7 years ago
Robin@asgard02 ~> confluent status
connect is [DOWN]
kafka-rest is [DOWN]
schema-registry is [DOWN]
kafka is [DOWN]
zookeeper is [DOWN]

But it's clearly running:

Robin@asgard02 ~> ps -ef|grep confluent
  502  3300     1   0 Wed02pm ??         3:13.19 /usr/bin/java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.zookeeper.server.quorum.QuorumPeerMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/zookeeper/zookeeper.properties
  502  3463     1   0 Wed02pm ??         4:33.09 /usr/bin/java -Xmx512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dschema-registry.log.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/schema-registry/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../package-schema-registry/target/kafka-schema-registry-package-*-development/share/java/schema-registry/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-common/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/rest-utils/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/schema-registry/* io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/schema-registry/schema-registry.properties
  502  3700     1   0 Wed02pm ??         2:39.24 /usr/bin/java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka-rest/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../target/kafka-rest-*-development/share/java/kafka-rest/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-common/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/rest-utils/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka-rest/* io.confluent.kafkarest.KafkaRestMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/kafka-rest/kafka-rest.properties
  502  5926     1   0 Wed03pm ??       135:50.08 /usr/bin/java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/connect-log4j.properties -cp /Users/Robin/cp/confluent-3.3.0/share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/share/java/confluent-common/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-serde-tools/*:/Users/Robin/cp/confluent-3.3.0/share/java/monitoring-interceptors/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-elasticsearch/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-hdfs/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-irc/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-jdbc/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-replicator/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-s3/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-storage-common/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-twitter/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.kafka.connect.cli.ConnectDistributed /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/connect/connect.properties
  502 52893     1   0 Fri05pm ??        57:38.69 /usr/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* io.confluent.support.metrics.SupportedKafka /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/kafka/kafka.properties
  502 63772 63522   0 10:58pm ttys000    0:00.00 grep --color=auto confluent

This was after numerous days suspending/unsuspending my laptop, having previously started the stack up.

This issue causes two problems:

  1. Can't use the CLI to shut down the running components
  2. Can't use the CLI to start up the stack, because it's running, and you get port clashes:
Robin@asgard02 ~> ps -ef|grep confluent
  502  3300     1   0 Wed02pm ??         3:13.45 /usr/bin/java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.zookeeper.server.quorum.QuorumPeerMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/zookeeper/zookeeper.properties
  502 64006 63522   0 11:02pm ttys000    0:00.00 grep --color=auto confluent
Robin@asgard02 ~> confluent status
connect is [DOWN]
kafka-rest is [DOWN]
schema-registry is [DOWN]
kafka is [DOWN]
zookeeper is [DOWN]
Robin@asgard02 ~> confluent start
Starting zookeeper
Zookeeper failed to start
zookeeper is [DOWN]
Cannot start Kafka, Zookeeper is not running. Check your deployment

confluent log zookeeper shows:

[2017-10-03 23:02:27,339] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-10-03 23:02:27,340] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
java.net.BindException: Address already in use

I don't quite know how my setup got into the state it did, but the CLI needs to improve how it detects if processes are running or not.
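
The ZooKeeper log above already names the contested port, so one way to confirm a stale process is to pull that port out of the log and then ask the OS who is holding it. A minimal sketch, assuming the log lines have the same shape as the ones quoted here; `bind_conflict_port` is a hypothetical helper, not part of the Confluent CLI:

```shell
# Hypothetical helper: read log text on stdin and print the port from the most
# recent "binding to port" line once a BindException follows it.
bind_conflict_port() {
  awk '
    /binding to port/ {
      n = split($0, parts, ":")      # the port follows the last colon
      split(parts[n], tail, " ")     # drop the trailing "(logger)" text
      port = tail[1]
    }
    /java\.net\.BindException: Address already in use/ {
      if (port) print port
    }
  '
}

# Possible follow-up once the port is known (2181 in the log above):
#   lsof -nP -iTCP:2181 -sTCP:LISTEN
```

Feeding it the three log lines above should print the ZooKeeper port; the `lsof` call then shows which PID is listening on it, independently of what `confluent status` reports.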

prasanna-sk commented 6 years ago

We ran into the same issue before, and the only way we recovered was to manually kill the processes listed by ps -ef and start the stack again.

kkonstantine commented 6 years ago

When you run confluent status, does confluent current or echo $CONFLUENT_CURRENT (if it's set) point to the runtime directory of the deployment that is currently running?

If you've set CONFLUENT_CURRENT but you attempted to run confluent status from a terminal that doesn't have this env var set, the CLI doesn't have a way to find the descriptors for the currently running services. You might want to use lsof to figure out which directory the running services are using.
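
As an alternative to lsof, the ps output earlier in the thread already shows each service's .properties path, so the runtime directory can be read straight from it by stripping the last two path components. A rough sketch, assuming the directory is always named confluent.<alphanumeric suffix> as in those listings; `runtime_dirs` is a made-up helper name:

```shell
# Hypothetical helper: read `ps` output on stdin and print each distinct
# confluent.XXXXXXXX runtime directory referenced by a service's
# <service>/<service>.properties argument.
runtime_dirs() {
  grep -oE '[^ ]*/confluent\.[A-Za-z0-9]+/[a-z-]+/[a-z-]+\.properties' \
    | sed -E 's#/[a-z-]+/[a-z-]+\.properties$##' \
    | sort -u
}

# Usage:
#   ps -ef | runtime_dirs
```

If this prints a directory that differs from what confluent current reports, the CLI is looking at the wrong deployment.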

prasanna-sk commented 6 years ago

In my case - CONFLUENT_CURRENT is not set. But, from this link, if it is not set, it defaults to /tmp

confluent current - does show the runtime dir from /tmp

prasanna-sk commented 6 years ago

Here is an observation/issue we are facing.

root user -- confluent start (successful)
root user -- confluent status (shows all services are UP)
root user -- confluent current (shows /tmp/confluent.######)

A non-root user then logs into the same server while the services are up and running.

non-root user -- confluent status (shows all services are DOWN)
non-root user -- sudo confluent status (shows all services are UP)
non-root user -- confluent current (shows same /tmp/confluent.###### as above)

What I did notice is that by default /tmp/confluent.###### has rwx------ permission for root (or whichever user starts the services), so no other user is able to read that directory or the files in it. confluent.current also has rwx------ permission, so it is likewise owned by and accessible only to its owner (in this case root).

Note: I installed the confluent package with yum as root. Not sure if that has any implications.
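
A quick way to test the permission theory is to check whether the invoking user can actually read and traverse the runtime directory. This is only a sketch; `runtime_dir_access` is a hypothetical helper, and the directory passed to it is whatever confluent current prints on your machine:

```shell
# Hypothetical helper: report whether the current user can read and enter
# the given runtime directory.
runtime_dir_access() {
  dir=$1
  if [ ! -e "$dir" ]; then
    echo "missing"
  elif [ -r "$dir" ] && [ -x "$dir" ]; then
    echo "readable"
  else
    echo "unreadable by $(id -un)"
  fi
}

# Usage (substitute the real directory name):
#   runtime_dir_access /tmp/confluent.XXXXXX
```

An "unreadable" result for the non-root user would be consistent with status showing everything as DOWN while sudo confluent status shows UP.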

ganu453 commented 6 years ago

I am also facing the same issue with a non-root user, but it's fine for the root user.

sankalp58 commented 6 years ago

I also faced the same issue, which meant zookeeper was already running from init.d. Just try sudo service zookeeper stop; if it works, that's a relief.

rmoff commented 6 years ago

Hitting this issue again. It seems that different terminal sessions end up with different CONFLUENT_CURRENT values, all based on permutations of /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.xxxxxxx

I'm definitely not doing anything to set CONFLUENT_CURRENT myself.
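
One thing that might be worth trying is pinning CONFLUENT_CURRENT to a fixed location in the shell profile so every session resolves the same place. This is a sketch only: it assumes the CLI honors the variable as described earlier in the thread, it has not been confirmed to cure this symptom, and both `pin_confluent_current` and the default path are invented for illustration:

```shell
# Hypothetical helper: create a private directory and export it as
# CONFLUENT_CURRENT. The default location is an arbitrary choice.
pin_confluent_current() {
  base=${1:-"$HOME/.confluent-current"}
  mkdir -p "$base" || return 1
  chmod 700 "$base" || return 1
  export CONFLUENT_CURRENT="$base"
}

# Usage, e.g. from ~/.bashrc or ~/.zshrc:
#   pin_confluent_current
```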

Having to wheel out this rather nasty way of killing things:

ps -ef|grep confluent.|grep -v grep|awk '{print $2}'|xargs -Ifoo kill -9 foo
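
A slightly less blunt variant of the same idea is to match only command lines that reference a confluent.XXXXXXXX runtime directory and to try a plain kill (SIGTERM) before resorting to -9. This sketch assumes the PID is the second column of ps -ef, as in the listings above; `confluent_pids` is a hypothetical name:

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every process whose command line references a confluent.XXXXXXXX directory.
# Requiring the surrounding slashes means the grep process itself never
# matches, so no `grep -v grep` step is needed.
confluent_pids() {
  awk '/\/confluent\.[A-Za-z0-9]+\// { print $2 }'
}

# Usage: ask the services to stop first, escalate only if they linger.
#   ps -ef | confluent_pids | xargs kill
#   ps -ef | confluent_pids | xargs kill -9
```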

ngwwm commented 6 years ago

I have the same problem. 'confluent status' returns [DOWN], and 'confluent stop' and 'confluent log' don't work...

I just found that there are two confluent runtime folders under /tmp. One of them is empty and the other contains the files of the currently running Confluent instance. When I run 'confluent current', it returns the name of the empty folder! I noticed that the file /tmp/confluent.current has something to do with the confluent CLI. I updated the file to match the currently running kafka instance, and 'confluent log kafka' now works again. But confluent status still doesn't work...
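
The stale-pointer situation described here can be checked mechanically. A minimal sketch, assuming confluent.current holds the runtime directory path on its first line (which is what the manual fix above implies); `check_current_pointer` is a hypothetical helper:

```shell
# Hypothetical helper: read the directory named in a confluent.current file
# and report whether it exists and contains anything.
check_current_pointer() {
  pointer_file=$1
  if [ ! -r "$pointer_file" ]; then
    echo "unreadable: $pointer_file"
    return 2
  fi
  dir=$(head -n 1 "$pointer_file")
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "empty: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Usage:
#   check_current_pointer /tmp/confluent.current
```

An "empty" result would match the case above, where the pointer file named the folder with nothing in it.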

ngwwm commented 6 years ago

To work around the issue, always run the confluent CLI from /tmp (or $CONFLUENT_CURRENT if defined), or update bin/confluent as below:

...
[[ $# -lt 1 ]] && usage

requirements

cd $confluent_current_dir
command="${1}"
...

I am using confluent 4.

gopinathankm commented 6 years ago

I encountered this issue and tried the following. It works! I am using confluent oss 5.0.0.

Problem:

user@user-Lenovo-G400:~$ confluent start
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /home/user/confluent-5.0.0/confluent.0C1Oma4q
Starting zookeeper
Zookeeper failed to start
zookeeper is [DOWN]
Cannot start Kafka, Zookeeper is not running. Check your deployment

Solution:

user@user-Lenovo-G400:~$ sudo /home/user/confluent-5.0.0/bin/zookeeper-server-stop

user@user-Lenovo-G400:~$ confluent start
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /home/user/confluent-5.0.0/confluent.0C1Oma4q
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]
Starting ksql-server
ksql-server is [UP]
user@user-Lenovo-G400:~$

Maybe someone will find it useful!

alokpaul commented 5 years ago

Seems this issue is still there. confluent status does not seem to work.