This is persistent and I can reproduce it every time.
Steps (commands sketched below):
1) Reset the kubernetes cluster
2) Install the openwhisk chart via helm
3) Verify everything runs properly (helm test owdev --cleanup succeeds)
4) Restart docker
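For reference, a rough sketch of the commands behind steps 2-4, assuming Helm 2 syntax, a local checkout of the openwhisk-deploy-kube chart, and the openwhisk namespace (the chart path and namespace are assumptions; only the owdev release name and mycluster.yml come from this thread):
# step 2: install the chart (chart path and namespace are assumptions; Helm 2 syntax)
helm install ./helm/openwhisk --name owdev --namespace openwhisk -f mycluster.yml
# step 3: verify the deployment
helm test owdev --cleanup
# step 4: restart Docker Desktop, then watch whether the pods come back
kubectl get pods -n openwhisk -w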
Here's an alternative/similar log:
[2019-02-25 13:05:35,506] INFO Loading logs. (kafka.log.LogManager)
[2019-02-25 13:05:35,577] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000000.log, rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,660] INFO Recovering unflushed segment 0 in log cacheInvalidation-0. (kafka.log.Log)
[2019-02-25 13:05:35,697] INFO Loading producer state from offset 84 for partition cacheInvalidation-0 with message format version 2 (kafka.log.Log)
[2019-02-25 13:05:35,702] INFO Loading producer state from snapshot file '/kafka/kafka-logs-owdev-kafka-0/cacheInvalidation-0/00000000000000000084.snapshot' for partition cacheInvalidation-0 (kafka.log.ProducerStateManager)
[2019-02-25 13:05:35,710] INFO Completed load of log cacheInvalidation-0 with 1 log segments, log start offset 0 and log end offset 84 in 162 ms (kafka.log.Log)
[2019-02-25 13:05:35,724] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000000.log, rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,758] INFO Recovering unflushed segment 0 in log completed0-0. (kafka.log.Log)
[2019-02-25 13:05:35,788] INFO Loading producer state from offset 16 for partition completed0-0 with message format version 2 (kafka.log.Log)
[2019-02-25 13:05:35,791] INFO Loading producer state from snapshot file '/kafka/kafka-logs-owdev-kafka-0/completed0-0/00000000000000000016.snapshot' for partition completed0-0 (kafka.log.ProducerStateManager)
[2019-02-25 13:05:35,793] INFO Completed load of log completed0-0 with 1 log segments, log start offset 0 and log end offset 16 in 76 ms (kafka.log.Log)
[2019-02-25 13:05:35,804] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs-owdev-kafka-0/events-0/00000000000000000000.log, rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,813] INFO Recovering unflushed segment 0 in log events-0. (kafka.log.Log)
[2019-02-25 13:05:35,825] INFO Loading producer state from offset 0 for partition events-0 with message format version 2 (kafka.log.Log)
[2019-02-25 13:05:35,829] INFO Completed load of log events-0 with 1 log segments, log start offset 0 and log end offset 0 in 33 ms (kafka.log.Log)
[2019-02-25 13:05:35,848] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.timeindex, /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index, and /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,855] ERROR There was an error in one of the threads during logs loading: java.io.FileNotFoundException: /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index (No such file or directory) (kafka.log.LogManager)
[2019-02-25 13:05:35,859] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.io.FileNotFoundException: /kafka/kafka-logs-owdev-kafka-0/health-0/00000000000000000000.index (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:106)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:105)
at kafka.log.LogSegment.recover(LogSegment.scala:256)
at kafka.log.Log.recoverSegment(Log.scala:342)
at kafka.log.Log.$anonfun$loadSegmentFiles$3(Log.scala:321)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788)
at kafka.log.Log.loadSegmentFiles(Log.scala:279)
at kafka.log.Log.loadSegments(Log.scala:383)
at kafka.log.Log.<init>(Log.scala:186)
at kafka.log.Log$.apply(Log.scala:1610)
at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:172)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-02-25 13:05:35,867] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
[2019-02-25 13:05:35,877] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2019-02-25 13:05:35,878] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.timeindex, /kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.index, and /kafka/kafka-logs-owdev-kafka-0/invoker0-0/00000000000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2019-02-25 13:05:35,887] INFO Session: 0x10000012f850001 closed (org.apache.zookeeper.ZooKeeper)
[2019-02-25 13:05:35,888] INFO EventThread shut down for session: 0x10000012f850001 (org.apache.zookeeper.ClientCnxn)
[2019-02-25 13:05:35,894] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer)
[2019-02-25 13:05:35,895] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-02-25 13:05:35,898] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
I'm able to work around it by disabling persistence in mycluster.yml. I assume this is fine for a development environment.
k8s:
  persistence:
    enabled: false
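Presumably that change can be applied to an existing release with a helm upgrade along these lines (Helm 2 syntax; the chart path is an assumption):
# re-apply mycluster.yml with persistence disabled (chart path is an assumption)
helm upgrade owdev ./helm/openwhisk -f mycluster.yml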
Edit: No dice. Actually, that makes kafka start properly, but now other pods are stuck in the Init state.
Is the cluster supposed to be able to start properly again after a docker restart with persistence disabled? Or should it always be reinstalled on docker restarts?
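When pods hang in Init like that, the init container that is blocking can usually be identified with something like the following (the openwhisk namespace is an assumption):
# list pods and their init-container progress (namespace is an assumption)
kubectl get pods -n openwhisk
# inspect one stuck pod to see which init container is waiting and why
kubectl describe pod <stuck-pod-name> -n openwhisk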
You can delete the kafka data to let it start from scratch.
Delete %USERPROFILE%\.docker\Volumes\owdev-kafka-pvc\<some-pvc-id>\kafka-logs-owdev-kafka-0
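On Windows that amounts to something like the following from cmd.exe, keeping the <some-pvc-id> placeholder above (look up the real id under the Volumes directory first):
rem remove the stale kafka data so the broker rebuilds it on the next start
rmdir /s /q "%USERPROFILE%\.docker\Volumes\owdev-kafka-pvc\<some-pvc-id>\kafka-logs-owdev-kafka-0"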
I think it is related to kafka not shutting down properly, but I don't know yet how to do that the correct way. I had some success by doing kubectl drain docker-for-desktop --ignore-daemonsets --delete-local-data before the restart. Then I 'restart' by doing kubectl uncordon docker-for-desktop.
But this didn't always seem to work after a reboot of the computer; maybe the kubernetes system pods don't come back up correctly in that case?
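Spelled out, the drain/uncordon sequence described above is roughly (docker-for-desktop is the node name on Docker Desktop's built-in Kubernetes):
# evict everything from the node before stopping Docker, discarding local data
kubectl drain docker-for-desktop --ignore-daemonsets --delete-local-data
# ... restart Docker Desktop (or reboot) here ...
# allow pods to be scheduled on the node again once it is back up
kubectl uncordon docker-for-desktop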
If persistence is enabled (k8s.persistence.enabled=true), then the expectation is that a deployed OpenWhisk system should be able to continue operation across "clean" restarts of Docker and/or the host machine. If Docker or the host machine crashes, it is less certain that the system will come back up cleanly (but it has survived crashes for me when OW was idle while the host machine was power cycled).
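For reference, that value can also be set explicitly at install time, e.g. (Helm 2 syntax; the chart path and namespace are assumptions):
helm install ./helm/openwhisk --name owdev --namespace openwhisk -f mycluster.yml --set k8s.persistence.enabled=true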
Using Docker Desktop (Windows). OpenWhisk is installed via helm and seems to work properly immediately after installation, with all pods coming up properly.
However, after a reboot or a restart of docker, things no longer work.
The logs for the kafka pod show errors like the output above.
What could be wrong?
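For completeness, the kafka pod output quoted in this issue can be pulled with something like the following (pod name inferred from the log paths; the namespace is an assumption):
kubectl logs owdev-kafka-0 -n openwhisk --tail=200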