cjmamo / kafka-web-console

A web console for Apache Kafka (retired)
Apache License 2.0

Kafka Web Console release v2.0.0 is creating a high number of open file handles (against Kafka 0.8.1.1, ZooKeeper 3.3.4) #47

Closed tonyfalabella closed 9 years ago

tonyfalabella commented 9 years ago

I'm running Kafka Web Console release v2.0.0 against Kafka 0.8.1.1 and ZooKeeper 3.3.4

I'm consistently seeing the number of open file handles increase after I launch Kafka Web Console and navigate to a topic. Once the handles start climbing, they keep climbing without any further navigation in the browser: I only need to launch the web console, do nothing else, and watch the open-file count grow every few seconds. I've confirmed there are no other producers or consumers connecting to Kafka or ZooKeeper.
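A quick way to double-check that no other clients are attached (this is my own sketch, assuming the default ports: 9092 for Kafka, 2181 for ZooKeeper):

```shell
# List TCP connections touching the Kafka broker (9092) and
# ZooKeeper (2181) ports; prints a fallback message if none match.
ss -tnp | grep -E ':(9092|2181)\b' || echo "no clients connected"
```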

After this runs for a while, Kafka itself starts failing. For example, creating a topic:

$INSTALLDIR/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 4 --topic test2

fails with errors like these:

Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 0; nested exception is:
    java.net.BindException: Address already in use
java.io.FileNotFoundException: /src1/fos/dev-team-tools/var/kafka/broker-0/replication-offset-checkpoint.tmp

The user ID my Kafka process runs under has a very large ulimit for "open files":
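Worth noting: `ulimit -a` in an interactive shell doesn't necessarily reflect the limit of an already-running daemon. A sketch for reading the broker JVM's effective limit straight from /proc (using the same `kafka.Kafka` process pattern as the monitoring script below):

```shell
# Show the effective open-files limit of the running Kafka broker,
# read from /proc/<pid>/limits rather than from the current shell.
pid=$(pgrep -f kafka.Kafka | head -1)
grep -i 'open files' "/proc/$pid/limits"
```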

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 610775
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 610775
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Note: I've also tried the pull request from @ibanner56 (https://github.com/claudemamo/kafka-web-console/pull/40), which relates to https://github.com/claudemamo/kafka-web-console/issues/36 and https://github.com/claudemamo/kafka-web-console/issues/37 from @mungeol, but it did not fix the issue.

To reproduce on Linux, do the following.

  1. Launch ZooKeeper
  2. Launch Kafka
  3. Create a topic with 4 partitions and replication factor 1: $INSTALLDIR/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 4 --topic test2
  4. Open a PuTTY session and run this script in that window:
while true; do
  date
  # Count open fds for each JVM by listing /proc/<pid>/fd
  echo "zookeeper: $(ls -ltr /proc/$(pgrep -f zookeeper.server)/fd | wc -l)"
  echo "Kafka: $(ls -ltr /proc/$(pgrep -f kafka.Kafka)/fd | wc -l)"
  echo ""
  sleep 5
done
  5. Launch Kafka Web Console
  6. Browse to a topic
  7. Notice that the number of "Kafka" connections in the PuTTY session increases
  8. Wait several seconds. Notice that the number of "Kafka" connections increases again, without doing anything.

Sample output from the script in step 4 after running for a couple of hours (with 8 topics defined on the ZooKeeper instance, 1 replication each, 4 partitions each):
Wed Jan 21 18:44:29 EST 2015
zookeeper: 37
Kafka: 6013

Wed Jan 21 18:44:34 EST 2015
zookeeper: 37
Kafka: 6013

Wed Jan 21 18:44:39 EST 2015
zookeeper: 37
Kafka: 6045

...

Wed Jan 21 18:51:23 EST 2015
zookeeper: 37
Kafka: 6461
gruaig commented 9 years ago

It's like the files are not being closed. I experience this issue too:

root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 27424 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 28864 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29760 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30976 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30976 0 6552758

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515011
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 60000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515011
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

foovungle commented 9 years ago

I wrote up a Stack Overflow question (http://stackoverflow.com/questions/28549868/kafka-web-console-using-twitter-finagle-not-responding) before I found this thread. The only thing I could do was restart the server. What have you been doing?

gruaig commented 9 years ago

Hey

What we have been doing is setting the open-files limit on our system to the max, "65355". The application no longer crashes.
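For reference, the stopgap looks something like this (the values are examples; `ulimit -n` only affects the shell it is run in, while the limits.conf entries persist across logins):

```shell
# Temporary: raise the soft open-files limit for the current shell
# (must not exceed the hard limit).
ulimit -n 65535

# Persistent (example /etc/security/limits.conf entries for a "kafka" user):
#   kafka  soft  nofile  65535
#   kafka  hard  nofile  65535
```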

Sean


foovungle commented 9 years ago

Hi, I contemplated that as well. It's a good stopgap, but eventually it'll hit the limit (faster if there are more partitions). I was looking for a more permanent solution, but this'll have to do for now, I guess. -F

gruaig commented 9 years ago

Yeah, we have moved away and developed our own solution that's very similar to kafka-web-console.


tonyfalabella commented 9 years ago

This is really a major issue. Not only does Kafka become unstable, but it can wreak havoc on any other process that needs to use ports once the "open files" limit has been reached. I've also observed instability even when that max has not been reached.

To fix the issue we used to kill the web console. I can't remember whether we also occasionally had to rebuild some of the topic files.

You'll also notice a ton of messages being generated in your zookeeper log file. The log file can quickly grow to be quite large.

Due to this issue we've stopped using kafka-web-console and are also implementing our own solution. I love that @claudemamo created this and has offered it for others to use (it's a nice little GUI). Unfortunately, I don't think the Kafka wiki should suggest people consider kafka-web-console until this issue is closed. It really makes Kafka (and possibly your entire server) unstable.

cjmamo commented 9 years ago

Duplicate of https://github.com/claudemamo/kafka-web-console/issues/30

gruaig commented 9 years ago

This isn't a duplicate.


foovungle commented 9 years ago

I tried the fork in development, and open files are kept under control. I'll roll it to production in the next few days to see if it helps.

foovungle commented 9 years ago

With https://github.com/ibanner56/kafka-web-console the system still hangs, but it takes longer and it's no longer due to too many connections to Kafka. I get a bunch of these when I run sudo lsof:

java 16240 root 1535w FIFO 0,8 0t0 42163244 pipe
java 16240 root 1536u 0000 0,9 0 7808 anon_inode
java 16240 root 1537u 0000 0,9 0 7808 anon_inode
java 16240 root 1538u 0000 0,9 0 7808 anon_inode
java 16240 root 1539w FIFO 0,8 0t0 42193027 pipe
java 16240 root 1541r FIFO 0,8 0t0 42186896 pipe
java 16240 root 1542w FIFO 0,8 0t0 42186896 pipe
java 16240 root 1543r FIFO 0,8 0t0 42174664 pipe
java 16240 root 1544w FIFO 0,8 0t0 42174664 pipe
java 16240 root 1545u 0000 0,9 0 7808 anon_inode
java 16240 root 1546u 0000 0,9 0 7808 anon_inode
java 16240 root 1547r FIFO 0,8 0t0 42199219 pipe
java 16240 root 1548r FIFO 0,8 0t0 42176277 pipe
java 16240 root 1549w FIFO 0,8 0t0 42176277 pipe

Eventually, the system runs out of open files. I don't have time to debug this at the moment.
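When I next hit this, I'll probably start by bucketing the process's fds by type; something like the pipeline below (16240 is the PID from the lsof output above) makes the pipe/anon_inode growth obvious at a glance:

```shell
# Count a process's open fds by TYPE (FIFO, 0000/anon_inode, sock, REG, ...).
# Column 5 of lsof's output is the fd type; skip the header row.
lsof -p 16240 | awk 'NR > 1 { print $5 }' | sort | uniq -c | sort -rn
```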