apache / openwhisk-deploy-kube

The Apache OpenWhisk Kubernetes Deployment repository supports deploying the Apache OpenWhisk system on Kubernetes and OpenShift clusters.
https://openwhisk.apache.org/
Apache License 2.0

Pods not starting properly after restarting Docker #435

Closed andreialecu closed 4 years ago

andreialecu commented 5 years ago

Using Docker Desktop (Windows). OpenWhisk is installed via Helm and appears to work properly immediately after installation, with all pods coming up as expected.

However, after a reboot or a restart of Docker, things no longer work:

[screenshot: pod status after the Docker restart]
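
For reference, the pod status in a screenshot like the one above corresponds to output from a command along these lines (the openwhisk namespace is taken from the kubectl logs command below):

$ kubectl get pods -n openwhisk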

The logs for the kafka pod show this:

$ kubectl logs owdev-kafka-0 -n openwhisk
waiting for kafka to be ready
[2019-02-25 12:22:51,033] INFO KafkaConfig values:
        advertised.host.name = null
        advertised.listeners = PLAINTEXT://owdev-kafka:9092
        advertised.port = null
        alter.config.policy.class.name = null
        authorizer.class.name =
        auto.create.topics.enable = true
        auto.leader.rebalance.enable = true
        background.threads = 10
        broker.id = 0
        broker.id.generation.enable = true
        broker.rack = null
        compression.type = producer
        connections.max.idle.ms = 600000
        controlled.shutdown.enable = true
        controlled.shutdown.max.retries = 3
        controlled.shutdown.retry.backoff.ms = 5000
        controller.socket.timeout.ms = 30000
        create.topic.policy.class.name = null
        default.replication.factor = 1
        delete.records.purgatory.purge.interval.requests = 1
        delete.topic.enable = false
        fetch.purgatory.purge.interval.requests = 1000
        group.initial.rebalance.delay.ms = 0
        group.max.session.timeout.ms = 300000
        group.min.session.timeout.ms = 6000
        host.name =
        inter.broker.listener.name = null
        inter.broker.protocol.version = 0.11.0-IV2
        leader.imbalance.check.interval.seconds = 300
        leader.imbalance.per.broker.percentage = 10
        listener.security.protocol.map = SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,TRACE:TRACE,SASL_SSL:SASL_SSL,PLAINTEXT:PLAINTEXT
        listeners = PLAINTEXT://10.1.2.188:9092
        log.cleaner.backoff.ms = 15000
        log.cleaner.dedupe.buffer.size = 134217728
        log.cleaner.delete.retention.ms = 86400000
        log.cleaner.enable = true
        log.cleaner.io.buffer.load.factor = 0.9
        log.cleaner.io.buffer.size = 524288
        log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
        log.cleaner.min.cleanable.ratio = 0.5
        log.cleaner.min.compaction.lag.ms = 0
        log.cleaner.threads = 1
        log.cleanup.policy = [delete]
        log.dir = /tmp/kafka-logs
        log.dirs = /kafka/kafka-logs-owdev-kafka-0
        log.flush.interval.messages = 9223372036854775807
        log.flush.interval.ms = null
        log.flush.offset.checkpoint.interval.ms = 60000
        log.flush.scheduler.interval.ms = 9223372036854775807
        log.flush.start.offset.checkpoint.interval.ms = 60000
        log.index.interval.bytes = 4096
        log.index.size.max.bytes = 10485760
        log.message.format.version = 0.11.0-IV2
        log.message.timestamp.difference.max.ms = 9223372036854775807
        log.message.timestamp.type = CreateTime
        log.preallocate = false
        log.retention.bytes = -1
        log.retention.check.interval.ms = 300000
        log.retention.hours = 168
        log.retention.minutes = null
        log.retention.ms = null
        log.roll.hours = 168
        log.roll.jitter.hours = 0
        log.roll.jitter.ms = null
        log.roll.ms = null
        log.segment.bytes = 1073741824
        log.segment.delete.delay.ms = 60000
        max.connections.per.ip = 2147483647
        max.connections.per.ip.overrides =
        message.max.bytes = 1000012
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        min.insync.replicas = 1
        num.io.threads = 8
        num.network.threads = 3
        num.partitions = 1
        num.recovery.threads.per.data.dir = 1
        num.replica.fetchers = 1
        offset.metadata.max.bytes = 4096
        offsets.commit.required.acks = -1
        offsets.commit.timeout.ms = 5000
        offsets.load.buffer.size = 5242880
        offsets.retention.check.interval.ms = 600000
        offsets.retention.minutes = 1440
        offsets.topic.compression.codec = 0
        offsets.topic.num.partitions = 50
        offsets.topic.replication.factor = 1
        offsets.topic.segment.bytes = 104857600
        port = 9092
        principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
        producer.purgatory.purge.interval.requests = 1000
        queued.max.requests = 500
        quota.consumer.default = 9223372036854775807
        quota.producer.default = 9223372036854775807
        quota.window.num = 11
        quota.window.size.seconds = 1
        replica.fetch.backoff.ms = 1000
        replica.fetch.max.bytes = 1048576
        replica.fetch.min.bytes = 1
        replica.fetch.response.max.bytes = 10485760
        replica.fetch.wait.max.ms = 500
        replica.high.watermark.checkpoint.interval.ms = 5000
        replica.lag.time.max.ms = 10000
        replica.socket.receive.buffer.bytes = 65536
        replica.socket.timeout.ms = 30000
        replication.quota.window.num = 11
        replication.quota.window.size.seconds = 1
        request.timeout.ms = 30000
        reserved.broker.max.id = 1000
        sasl.enabled.mechanisms = [GSSAPI]
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.principal.to.local.rules = [DEFAULT]
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.mechanism.inter.broker.protocol = GSSAPI
        security.inter.broker.protocol = PLAINTEXT
        socket.receive.buffer.bytes = 102400
        socket.request.max.bytes = 104857600
        socket.send.buffer.bytes = 102400
        ssl.cipher.suites = null
        ssl.client.auth = none
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
        transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
        transaction.max.timeout.ms = 900000
        transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
        transaction.state.log.load.buffer.size = 5242880
        transaction.state.log.min.isr = 1
        transaction.state.log.num.partitions = 50
        transaction.state.log.replication.factor = 1
        transaction.state.log.segment.bytes = 104857600
        transactional.id.expiration.ms = 604800000
        unclean.leader.election.enable = false
        zookeeper.connect = owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local:2181
        zookeeper.connection.timeout.ms = 6000
        zookeeper.session.timeout.ms = 6000
        zookeeper.set.acl = false
        zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2019-02-25 12:22:51,080] INFO starting (kafka.server.KafkaServer)
[2019-02-25 12:22:51,081] INFO Connecting to zookeeper on owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local:2181 (kafka.server.KafkaServer)
[2019-02-25 12:22:51,090] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2019-02-25 12:22:51,093] INFO Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:host.name=owdev-kafka-0.owdev-kafka.openwhisk.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.version=1.8.0_144 (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.home=/opt/jdk1.8.0_144/jre (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.class.path=:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b05.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/connect-api-0.11.0.1.jar:/opt/kafka/bin/../libs/connect-file-0.11.0.1.jar:/opt/kafka/bin/../libs/connect-json-0.11.0.1.jar:/opt/kafka/bin/../libs/connect-runtime-0.11.0.1.jar:/opt/kafka/bin/../libs/connect-transforms-0.11.0.1.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b05.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b05.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b05.jar:/opt/kafka/bin/../libs/jackson-annotations-2.8.5.jar:/opt/kafka/bin/../libs/jackson-core-2.8.5.jar:/opt/kafka/bin/../libs/jackson-databind-2.8.5.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.8.5.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.8.5.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.8.5.jar:/opt/kafka/bin/../libs/javassist-3.21.0-GA.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b05.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.0.1.jar:/opt/kafka/bin/../libs/jersey-client-2.24.jar:/opt/kafka/bin/../libs/jersey-common-2.24.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.24.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.24.jar:/opt/kafka/bin/../libs/jersey-guava-2.24.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.24.jar:/opt/kafka/bin/../libs/jersey-server-2.24.jar:/opt/kafka/bin/../libs/jetty-continuation-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-http-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-io-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-security-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-server-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-servlet-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-servlets-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-util-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.3.jar:/opt/kafka/bin/../libs/kafka-clients-0.11.0.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-0.11.0.1.jar:/opt/kafka/bin/../libs/kafka-streams-0.11.0.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-0.11.0.1.jar:/opt/kafka/bin/../libs/kafka-tools-0.11.0.1.jar:/opt/kafka/bin/../libs/kafka_2.12-0.11.0.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-0.11.0.1-test-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-0.11.0.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-1.3.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.0.24.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.0.1.jar:/opt/kafka/bin/../libs/scala-library-2.12.2.jar:/opt/kafka/bin/../libs/scala-parser-combinators_2.12-1.0.4.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.2.6.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.10.jar:/opt/kafka/bin/../libs/zookeeper-3.4.10.jar (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,093] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,094] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,094] INFO Client environment:os.version=4.9.125-linuxkit (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,094] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,094] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,094] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,094] INFO Initiating client connection, connectString=owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@6ac13091 (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,102] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2019-02-25 12:22:51,103] INFO Opening socket connection to server owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local/10.1.2.191:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-02-25 12:22:51,107] INFO Socket connection established to owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local/10.1.2.191:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-02-25 12:22:51,117] INFO Session establishment complete on server owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local/10.1.2.191:2181, sessionid = 0x100000125d10008, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-02-25 12:22:51,118] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2019-02-25 12:22:51,238] INFO Cluster ID = WWSB7slwSj-c2cj-CG7UfA (kafka.server.KafkaServer)
[2019-02-25 12:22:51,269] INFO [ThrottledRequestReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2019-02-25 12:22:51,269] INFO [ThrottledRequestReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2019-02-25 12:22:51,270] INFO [ThrottledRequestReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2019-02-25 12:22:51,339] INFO Loading logs. (kafka.log.LogManager)
[2019-02-25 12:22:51,383] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.timeindex, /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.index, and /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2019-02-25 12:22:51,403] ERROR There was an error in one of the threads during logs loading: java.io.FileNotFoundException: /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.index (No such file or directory) (kafka.log.LogManager)
[2019-02-25 12:22:51,404] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.io.FileNotFoundException: /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:106)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:105)
        at kafka.log.LogSegment.recover(LogSegment.scala:256)
        at kafka.log.Log.recoverSegment(Log.scala:342)
        at kafka.log.Log.$anonfun$loadSegmentFiles$3(Log.scala:321)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788)
        at kafka.log.Log.loadSegmentFiles(Log.scala:279)
        at kafka.log.Log.loadSegments(Log.scala:383)
        at kafka.log.Log.<init>(Log.scala:186)
        at kafka.log.Log$.apply(Log.scala:1610)
        at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:172)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2019-02-25 12:22:51,412] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
[2019-02-25 12:22:51,422] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2019-02-25 12:22:51,429] INFO EventThread shut down for session: 0x100000125d10008 (org.apache.zookeeper.ClientCnxn)
[2019-02-25 12:22:51,429] INFO Session: 0x100000125d10008 closed (org.apache.zookeeper.ZooKeeper)
[2019-02-25 12:22:51,430] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000000.timeindex, /kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000000.index, and /kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2019-02-25 12:22:51,434] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer)
[2019-02-25 12:22:51,434] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-02-25 12:22:51,436] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)

What could be wrong?

andreialecu commented 5 years ago

This is persistent and I can reproduce it every time.

Steps (sketched as commands below):

1) Reset the Kubernetes cluster.
2) Install the OpenWhisk chart via Helm.
3) Verify everything runs properly (helm test owdev --cleanup succeeds).
4) Restart Docker.
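
A rough sketch of those steps as commands. The chart path and Helm 2-style flags are assumptions based on this repo's deployment docs rather than anything stated in this thread; owdev is the release name used above:

# 1) Reset Kubernetes via Docker Desktop's "Reset Kubernetes cluster" option
# 2) Install the chart (path and flags are assumed, adjust to your checkout)
$ helm install ./helm/openwhisk --namespace=openwhisk --name=owdev -f mycluster.yml
# 3) Verify the deployment
$ helm test owdev --cleanup
# 4) Restart Docker Desktop, then re-check pod health
$ kubectl get pods -n openwhisk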

Here's an alternative/similar log:

[2019-02-25 13:05:35,506] INFO Loading logs. (kafka.log.LogManager)
[2019-02-25 13:05:35,577] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.log, rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,660] INFO Recovering unflushed segment 0 in log cacheInvalidation-0. (kafka.log.Log)
[2019-02-25 13:05:35,697] INFO Loading producer state from offset 84 for partition cacheInvalidation-0 with message format version 2 (kafka.log.Log)
[2019-02-25 13:05:35,702] INFO Loading producer state from snapshot file '/kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000084.snapshot' for partition cacheInvalidation-0 (kafka.log.ProducerStateManager)
[2019-02-25 13:05:35,710] INFO Completed load of log cacheInvalidation-0 with 1 log segments, log start offset 0 and log end offset 84 in 162 ms (kafka.log.Log)
[2019-02-25 13:05:35,724] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000000.log, rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,758] INFO Recovering unflushed segment 0 in log completed0-0. (kafka.log.Log)
[2019-02-25 13:05:35,788] INFO Loading producer state from offset 16 for partition completed0-0 with message format version 2 (kafka.log.Log)
[2019-02-25 13:05:35,791] INFO Loading producer state from snapshot file '/kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000016.snapshot' for partition completed0-0 (kafka.log.ProducerStateManager)
[2019-02-25 13:05:35,793] INFO Completed load of log completed0-0 with 1 log segments, log start offset 0 and log end offset 16 in 76 ms (kafka.log.Log)
[2019-02-25 13:05:35,804] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs-owdev-kafka-0/events-0/00000000000000000000.log, rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,813] INFO Recovering unflushed segment 0 in log events-0. (kafka.log.Log)
[2019-02-25 13:05:35,825] INFO Loading producer state from offset 0 for partition events-0 with message format version 2 (kafka.log.Log)
[2019-02-25 13:05:35,829] INFO Completed load of log events-0 with 1 log segments, log start offset 0 and log end offset 0 in 33 ms (kafka.log.Log)
[2019-02-25 13:05:35,848] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.timeindex, /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index, and /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,855] ERROR There was an error in one of the threads during logs loading: java.io.FileNotFoundException: /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index (No such file or directory) (kafka.log.LogManager)
[2019-02-25 13:05:35,859] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.io.FileNotFoundException: /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:106)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:105)
        at kafka.log.LogSegment.recover(LogSegment.scala:256)
        at kafka.log.Log.recoverSegment(Log.scala:342)
        at kafka.log.Log.$anonfun$loadSegmentFiles$3(Log.scala:321)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788)
        at kafka.log.Log.loadSegmentFiles(Log.scala:279)
        at kafka.log.Log.loadSegments(Log.scala:383)
        at kafka.log.Log.<init>(Log.scala:186)
        at kafka.log.Log$.apply(Log.scala:1610)
        at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:172)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2019-02-25 13:05:35,867] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
[2019-02-25 13:05:35,877] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2019-02-25 13:05:35,878] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.timeindex, /kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.index, and /kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,887] INFO Session: 0x10000012f850001 closed (org.apache.zookeeper.ZooKeeper)
[2019-02-25 13:05:35,888] INFO EventThread shut down for session: 0x10000012f850001 (org.apache.zookeeper.ClientCnxn)
[2019-02-25 13:05:35,894] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer)
[2019-02-25 13:05:35,895] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-02-25 13:05:35,898] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)

andreialecu commented 5 years ago

I'm able to work around it by disabling persistence in mycluster.yml. I assume this is fine for a development environment.

k8s:
  persistence:
    enabled: false
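
To apply a values change like this to an existing deployment, something along these lines should work (the chart path here is an assumption; owdev is the release name from this thread):

$ helm upgrade owdev ./helm/openwhisk -f mycluster.yml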

Edit: No dice. That actually makes Kafka start properly, but now other pods are stuck in the Init state:

[screenshot: pods stuck in Init state]

Is the cluster supposed to be able to start properly again after a Docker restart with persistence disabled, or should it always be reinstalled after a Docker restart?

jlami commented 5 years ago

You can delete the Kafka data to let it start from scratch.

Delete %USERPROFILE%\.docker\Volumes\owdev-kafka-pvc\<some-pvc-id>\kafka-logs-owdev-kafka-0
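
If the host path is hard to locate, here is a sketch of the same cleanup done in-cluster; the data directory is the log.dirs value from the Kafka config above, and this only works while the container is actually running (a narrow window if it is crash-looping):

$ kubectl exec -n openwhisk owdev-kafka-0 -- rm -rf /kafka/kafka-logs-owdev-kafka-0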

I think it is related to Kafka not shutting down cleanly, but I don't yet know the correct way to do that. I had some success by running kubectl drain docker-for-desktop --ignore-daemonsets --delete-local-data before the restart, and then 'restarting' with kubectl uncordon docker-for-desktop.
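
As a block, that sequence looks like this (docker-for-desktop is the default node name for Docker Desktop's built-in Kubernetes):

# Before restarting Docker or rebooting: evict workloads cleanly
$ kubectl drain docker-for-desktop --ignore-daemonsets --delete-local-data
# After the restart: allow pods to be scheduled again
$ kubectl uncordon docker-for-desktop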

But sometimes this didn't work after a reboot of the computer; maybe the Kubernetes system pods don't come up correctly in that case?

dgrove-oss commented 5 years ago

If persistence is enabled (k8s.persistence.enabled=true), the expectation is that a deployed OpenWhisk system should be able to continue operation across "clean" restarts of Docker and/or the host machine. If Docker or the host machine crashes, it is less certain that the system will come back up cleanly (although it has survived crashes for me when OpenWhisk was idle at the time the host machine was power cycled).
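
For completeness, the corresponding mycluster.yml setting, mirroring the structure of the snippet earlier in this thread:

k8s:
  persistence:
    enabled: true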