apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0

ansible/openwhisk.yml fails for waiting kafka server started up #5032

Open celery1124 opened 4 years ago

celery1124 commented 4 years ago

Environment details:

Steps to reproduce the issue:

  1. cd tools/ubuntu-setup && ./all.sh
  2. ansible-playbook setup.yml ; ansible-playbook prereq.yml (with environment variables set for CouchDB)
  3. ./gradlew distDocker
  4. ansible-playbook initdb.yml ; ansible-playbook wipe.yml
  5. ansible-playbook openwhisk.yml

Provide the actual results and outputs:

TASK [kafka : wait until the kafka server started up] ***********************************************************************************************************
Tuesday 01 December 2020  14:03:16 -0600 (0:00:27.886)       0:00:49.298 ******
FAILED - RETRYING: wait until the kafka server started up (10 retries left).
FAILED - RETRYING: wait until the kafka server started up (9 retries left).
FAILED - RETRYING: wait until the kafka server started up (8 retries left).
FAILED - RETRYING: wait until the kafka server started up (7 retries left).
FAILED - RETRYING: wait until the kafka server started up (6 retries left).
FAILED - RETRYING: wait until the kafka server started up (5 retries left).
FAILED - RETRYING: wait until the kafka server started up (4 retries left).
FAILED - RETRYING: wait until the kafka server started up (3 retries left).
FAILED - RETRYING: wait until the kafka server started up (2 retries left).
FAILED - RETRYING: wait until the kafka server started up (1 retries left).
fatal: [kafka0]: FAILED! => {"attempts": 10, "changed": true, "cmd": "(echo dump; sleep 1) | nc 172.17.0.1 2181 | grep /brokers/ids/0", "delta": "0:00:01.005511", "end": "2020-12-01 14:04:20.335370", "msg": "non-zero return code", "rc": 1, "start": "2020-12-01 14:04:19.329859", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

[FAILED]
> (echo dump; sleep 1) | nc 172.17.0.1 2181 | grep /brokers/ids/0
non-zero return code

PLAY RECAP ******************************************************************************************************************************************************
kafka0                     : ok=9    changed=3    unreachable=0    failed=1
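
For anyone reproducing this outside Ansible: the failing task pipes the ZooKeeper `dump` four-letter command through nc and greps for /brokers/ids/0, so an empty stdout with rc 1 only says that grep matched nothing. It does not by itself tell you whether ZooKeeper was unreachable or whether broker 0 never registered. Below is a minimal Python sketch of the same check. The function names are made up for illustration, and the host and port are the ones from the log above.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the whole reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def broker_registered(dump_output: str, broker_id: int = 0) -> bool:
    """Mirror `grep /brokers/ids/<id>` on the output of the `dump` command."""
    needle = f"/brokers/ids/{broker_id}"
    return any(needle in line for line in dump_output.splitlines())
```

Calling `broker_registered(four_letter_word("172.17.0.1", 2181, "dump"))` should approximate what the task tests, and a connection error raised by `four_letter_word` would point at networking rather than at Kafka. Newer ZooKeeper releases (3.5.3 and later, as far as I know) only answer four-letter commands listed in `4lw.commands.whitelist`, so an unexpected reply may just mean `dump` is disabled. Whether that applies to the ZooKeeper image used here is not established in this thread.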

Additional information you deem important:

rabbah commented 3 years ago

Any chance you're out of disk space? You can check the Kafka logs. Another possible reason is that Kafka isn't able to reach ZooKeeper, which would mean a networking issue. Try: sudo ifconfig lo0 alias 172.17.0.1/24
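
Both guesses here (disk space and reachability of ZooKeeper) can be checked quickly from the host. This is a rough sketch using only the Python standard library. The helper names are hypothetical, and the address 172.17.0.1:2181 is simply the one from the failing task.

```python
import shutil
import socket


def free_gib(path: str = "/") -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False
```

For example, `free_gib("/")` gives a quick view of the root filesystem, and `can_connect("172.17.0.1", 2181)` reports whether the ZooKeeper port accepts connections from wherever the script runs. Note that a successful connect from the host does not prove the Kafka container can reach the same address.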

aFuerst commented 3 years ago

I'm getting the same error. It seems Kafka is not able to keep a stable connection to ZooKeeper. I'm using Ubuntu 16.01.

Relevant kafka log section:

[2021-04-01 17:54:53,847] INFO Initiating client connection, connectString=172.17.0.1:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@7e0b85f9 (org.apache.zookeeper.ZooKeeper)
[2021-04-01 17:54:53,892] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-04-01 17:54:53,898] INFO Opening socket connection to server 172.17.0.1/172.17.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-04-01 17:54:59,896] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-04-01 17:54:59,902] WARN Client session timed out, have not heard from server in 6012ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-04-01 17:55:00,009] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2021-04-01 17:55:00,012] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-04-01 17:55:00,014] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-04-01 17:55:00,019] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
    at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:258)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
    at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:254)
    at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:112)
    at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1826)
    at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:364)
    at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:387)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:207)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
    at kafka.Kafka$.main(Kafka.scala:84)
    at kafka.Kafka.main(Kafka.scala)
[2021-04-01 17:55:00,022] INFO shutting down (kafka.server.KafkaServer)
[2021-04-01 17:55:00,032] INFO shut down completed (kafka.server.KafkaServer)
[2021-04-01 17:55:00,034] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2021-04-01 17:55:00,039] INFO shutting down (kafka.server.KafkaServer)
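
A note on reading this excerpt: the timeout is reported for sessionid 0x0 while the client is still in state CONNECTING, which suggests Kafka never established a ZooKeeper session at all, rather than losing one that had been working. The sketch below pulls those facts out of a log excerpt like the one above. The function name and the returned keys are invented for illustration, and the regular expressions are only assumed to match this particular log format.

```python
import re

SESSION_ID_RE = re.compile(r"sessionid (0x[0-9a-fA-F]+)")
CONNECT_STRING_RE = re.compile(r"connectString=(\S+)")
SESSION_TIMEOUT_RE = re.compile(r"sessionTimeout=(\d+)")


def summarize_zk_failure(log_text: str) -> dict:
    """Collect a few ZooKeeper-related facts from a Kafka server log excerpt."""
    session_ids = set(SESSION_ID_RE.findall(log_text))
    connect = CONNECT_STRING_RE.search(log_text)
    timeout = SESSION_TIMEOUT_RE.search(log_text)
    return {
        "connect_string": connect.group(1) if connect else None,
        "session_timeout_ms": int(timeout.group(1)) if timeout else None,
        "timed_out_connecting": "ZooKeeperClientTimeoutException" in log_text,
        # Only the placeholder id 0x0 was ever logged, so no session was assigned.
        "never_got_session": session_ids == {"0x0"},
    }
```

If `never_got_session` comes back true, the next thing to look at is probably whether anything answers on the connect string at all, not Kafka's own configuration.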

celery1124 commented 3 years ago

I didn't dig much into this case, since I saw no issues on Ubuntu 18.04 with the same scripts. Maybe you can try a more up-to-date OS.

Mian

jemmy512 commented 2 years ago

@rabbah I have the same issue.

I hit this error when trying to alias lo. Do you know how to fix it?

:/$ sudo ifconfig lo alias 172.17.0.1/24
alias: Host name lookup failure
ifconfig: `--help' gives usage information.

OS: Ubuntu 22.04.1 LTS

ifconfig

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 18592  bytes 2881713 (2.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18592  bytes 2881713 (2.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
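
On the alias error: the `alias` keyword looks like BSD/macOS ifconfig syntax (lo0 is the macOS loopback name), and the Linux ifconfig appears to treat it as a host name, hence the lookup failure. Before adding any address, it may be worth checking whether 172.17.0.1 is already assigned on the machine. On a Linux host running Docker it is usually the address of the default bridge, although the output above only shows lo. Here is a small sketch of such a check. The function name is hypothetical, and the bind trick assumes the kernel does not allow non-local binds.

```python
import socket


def is_local_address(ip: str) -> bool:
    """True if a TCP socket can bind to `ip`, i.e. some local interface owns it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0 lets the OS pick any free port
        except OSError:
            # Typically "cannot assign requested address" when no interface has the IP.
            return False
    return True
```

If `is_local_address("172.17.0.1")` returns False, nothing on the host owns that address, which would explain ZooKeeper being unreachable there. On Linux the iproute2 form for adding an address is `ip addr add <address>/<prefix> dev <interface>`, but whether aliasing the loopback is the right fix for this setup is not settled in this thread.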

style95 commented 2 years ago

According to the logs, you need to check the health of ZooKeeper first. Is your ZooKeeper accessible from other containers?