cjmamo / kafka-web-console

A web console for Apache Kafka (retired)
Apache License 2.0

Play service crashing after extended period with window open. #38

Closed: ibanner56 closed this issue 9 years ago

ibanner56 commented 9 years ago

The service crashes with the following stack trace:

Uncaught error from thread [play-akka.actor.default-dispatcher-800] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[play]
java.lang.NoClassDefFoundError: common/Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1$$anonfun$applyOrElse$1
    at common.Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1.applyOrElse(Util.scala:82)
    at common.Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1.applyOrElse(Util.scala:81)
    at scala.runtime.AbstractPartialFunction$mcJL$sp.apply$mcJL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcJL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcJL$sp.apply(AbstractPartialFunction.scala:25)
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:387)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:387)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:29)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: common.Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1$$anonfun$applyOrElse$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 23 more
Caused by: java.io.FileNotFoundException: /home/ubuntu/app/kafka-web-console/target/scala-2.10/classes/common/Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1$$anonfun$applyOrElse$1.class (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at sun.misc.URLClassPath$FileLoader$1.getInputStream(URLClassPath.java:1086)
    at sun.misc.Resource.cachedInputStream(Resource.java:77)
    at sun.misc.Resource.getByteBuffer(Resource.java:160)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:436)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    ... 29 more

Here is the function it seems to be failing in, as mentioned at the top of the stack trace (from Util.scala):

  def getPartitionsLogSize(topicName: String, partitionLeaders: Seq[String]): Future[Seq[Long]] = {
    Logger.debug("Getting partition log sizes for topic " + topicName + " from partition leaders " + partitionLeaders.mkString(", "))
    return for {
      clients <- Future.sequence(partitionLeaders.map(addr => Future((addr, Kafka.newRichClient(addr)))))
      partitionsLogSize <- Future.sequence(clients.zipWithIndex.map { tu =>
        val addr = tu._1._1
        val client = tu._1._2
        var offset = Future(0L)
        if (!addr.isEmpty) {
          offset = twitterToScalaFuture(client.offset(topicName, tu._2, OffsetRequest.LatestTime)).map(_.offsets.head).recover {
            case e => Logger.warn("Could not connect to partition leader " + addr + ". Error message: " + e.getMessage); 0L
          }
        }

        client.close()
        offset
      })
    } yield partitionsLogSize
  }
ibanner56 commented 9 years ago

Note that the line numbers don't correspond to the master branch, since we added a few imports for other tasks. In our version, the lines the stack trace points to are:

        if (!addr.isEmpty) {
          offset = twitterToScalaFuture(client.offset(topicName, tu._2, OffsetRequest.LatestTime)).map(_.offsets.head).recover {
            case e => Logger.warn("Could not connect to partition leader " + addr + ". Error message: " + e.getMessage); 0L
          }
        }
ibanner56 commented 9 years ago

Apparently this is due to the open-file ulimit being too low; the limit of 1024 is too small. We're going to raise our ulimit to the maximum for our server and see if that resolves the issue.
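Because the failure is "Too many open files", it can help to watch how many file descriptors the Play JVM holds relative to its limit, to tell a limit that is simply too low apart from a count that keeps climbing. The following is a minimal sketch, assuming a HotSpot/OpenJDK JVM on Linux or another Unix, where the platform MXBean implements `com.sun.management.UnixOperatingSystemMXBean`; the `FdUsage` object is a hypothetical helper, not part of kafka-web-console.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical helper: reports this JVM's file-descriptor usage.
object FdUsage {
  // Returns (open, max) descriptor counts, or None when the platform MXBean
  // does not expose them (for example on Windows).
  def current(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean =>
        Some((unix.getOpenFileDescriptorCount, unix.getMaxFileDescriptorCount))
      case _ => None
    }

  // Fraction of the descriptor limit currently in use, if it is known.
  def usageRatio(): Option[Double] =
    current().collect { case (open, max) if max > 0 => open.toDouble / max }

  def main(args: Array[String]): Unit =
    current() match {
      case Some((open, max)) => println(s"open file descriptors: $open of $max")
      case None              => println("file-descriptor counts not available on this JVM")
    }
}
```

Logging `FdUsage.current()` periodically from the running console would show whether the open count levels off or rises steadily while the page is left open; in the second case a higher limit only postpones the crash.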

ibanner56 commented 9 years ago

This seems to have extended how long we can run before the issue reappears, but the issue still persists.

guihaojin commented 9 years ago

I got the same error.

guihaojin commented 9 years ago

It looks like the web console is leaking sockets. I saw the number of TCP connections to the Kafka brokers keep growing as the server runs. I'm not sure whether it's a problem with the web console or with my Kafka/ZooKeeper setup.
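One general pattern for avoiding leaked per-request connections is to close each client only after its request has completed, and to skip creating a client when there is nothing to connect to. In the `getPartitionsLogSize` code quoted above, `client.close()` is called right after the offset request is issued rather than when it finishes, and a client is created for every entry in `partitionLeaders`, including empty addresses. The Scala sketch below shows the close-on-completion pattern with plain `scala.concurrent` futures. It assumes a hypothetical `OffsetClient` trait standing in for the real client returned by `Kafka.newRichClient`; `ClientLifecycle`, `withClient` and `partitionsLogSize` are illustrative names, and nothing in this thread confirms that the early `close()` is what leaks the connections.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Hypothetical stand-in for the per-broker client obtained from
// Kafka.newRichClient; only the two operations this sketch needs are modelled.
trait OffsetClient {
  def latestOffset(topic: String, partition: Int): Future[Long]
  def close(): Unit
}

object ClientLifecycle {
  // Creates a client, runs `use` with it, and closes the client only after the
  // future returned by `use` has completed, whether it succeeded or failed.
  def withClient[T](create: () => OffsetClient)(use: OffsetClient => Future[T])
                   (implicit ec: ExecutionContext): Future[T] = {
    val client = create()
    val result =
      try use(client)
      catch { case NonFatal(e) => Future.failed(e) } // a synchronous throw still reaches close()
    // andThen returns a future that completes only after close() has run.
    result.andThen { case _ => client.close() }
  }

  // Hypothetical reworking of getPartitionsLogSize: partition i is served by
  // leaders(i); an empty address yields 0 without creating a client, and a
  // failed request is logged as 0 just like the original recover block.
  def partitionsLogSize(topic: String, leaders: Seq[String], create: String => OffsetClient)
                       (implicit ec: ExecutionContext): Future[Seq[Long]] =
    Future.sequence(leaders.zipWithIndex.map { case (addr, partition) =>
      if (addr.isEmpty) Future.successful(0L)
      else withClient(() => create(addr)) { client =>
        client.latestOffset(topic, partition).recover { case NonFatal(_) => 0L }
      }
    })
}
```

Tying `close()` to the future with `andThen` makes each client's lifetime match its own request; whether the real Kafka client also needs its `close()` result awaited would have to be checked against that library's documentation.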

joelsvensson commented 9 years ago

I have the same problem with a 3-node ZooKeeper/Kafka cluster.

For each "[debug] application - Getting partition log sizes for topic test from partition leaders" log line, the number of established connections to Kafka grows by 15:

    netstat -an | grep ESTA | grep 9092 | wc -l
    15
    netstat -an | grep ESTA | grep 9092 | wc -l
    30
    netstat -an | grep ESTA | grep 9092 | wc -l
    45
    netstat -an | grep ESTA | grep 9092 | wc -l
    60
    netstat -an | grep ESTA | grep 9092 | wc -l
    75
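These counts also suggest why raising the ulimit only delayed the crash. As a rough estimate, assume each established connection holds one file descriptor and the count grows by a fixed 15 per polling cycle. With a descriptor limit L, and F_0 descriptors already held for other files (jars, class files, logs), the process runs out after about N cycles; L, F_0, Δ and N are symbols introduced here for illustration only:

```latex
N \approx \frac{L - F_0}{\Delta}, \qquad \Delta = 15
\quad\Longrightarrow\quad
N < \frac{1024}{15} \approx 68.3 \quad \text{for } L = 1024
```

So with the 1024 limit the process would survive at most 68 cycles, and a larger L stretches that out linearly without ever bounding the growth.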

This is from a fresh restart, without browsing the GUI.

ibanner56 commented 9 years ago

This is just another version of #30.