cjmamo / kafka-web-console

A web console for Apache Kafka (retired)
Apache License 2.0

The console congests network traffic on the node after running for a while #30

Open siyuanh opened 10 years ago

siyuanh commented 10 years ago

After running this web-console for a while, the connection table is full of connections to the brokers.

siyuanh commented 10 years ago

My understanding is that the console doesn't need to connect to the brokers for what the UI currently does. Everything can be fetched from ZooKeeper, correct?

unclebilly commented 10 years ago

I have seen this as well: after running the application for several days, the output of netstat showed many hundreds of open connections to Kafka brokers. Eventually the application ceased to function because it could no longer open any new sockets; the process's max open-file limit had been reached.
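A quick way to confirm this failure mode is to compare the process's open descriptor count against its limit. A minimal sketch, assuming Linux; PID defaults to the current shell, so substitute the console's PID (23907 later in this thread):

```shell
# Compare a process's open file descriptors against its configured limit.
# PID defaults to the current shell; substitute the console's PID (e.g. 23907).
PID=${PID:-$$}
echo "open fds: $(ls /proc/"$PID"/fd | wc -l)"
grep 'Max open files' "/proc/$PID/limits"
```

When the open-fd count approaches the soft limit, new sockets start failing with "Too many open files".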

cjmamo commented 10 years ago

The console regularly connects to brokers to retrieve partition log sizes so that you can view partition size over time. I've checked the code and the client is closed after retrieving the log size. This will require further investigation.

cjmamo commented 10 years ago

@siyuanh & @unclebilly, as a temporary workaround, you can increase the Offset Fetch Interval in Settings to reduce the rate at which connections are created.

PAStheLoD commented 9 years ago
root@queue3:/opt/kafka-web-console# ps aux | grep kafka-web | grep -v grep | awk '{ print $2 }'
23907
root@queue3:/opt/kafka-web-console# lsof -n | grep kafka-console | grep IPv6 | wc -l
4601013
root@queue3:/opt/kafka-web-console# lsof -n | grep kafka-console | grep IPv6 | head
java      23907       kafka-console  142u     IPv6           38815657       0t0        TCP *:9000 (LISTEN)
java      23907       kafka-console  147u     IPv6           38927934       0t0        TCP 192.168.115.43:50926->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console  148u     IPv6           38949202       0t0        TCP 192.168.115.43:50319->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console  149u     IPv6           38829209       0t0        TCP 192.168.115.43:50320->192.168.115.41:9092 (ESTABLISHED)
....

Also have a stacktrace, if you are interested.

I'm using systemd to limit file descriptors to 65000, but interestingly the highest FD number I've been able to find in the lsof output was 9999.

Oh, sorry for the edit after edit, but this is just a quirk of lsof's formatting: it prints an asterisk followed by the last three digits (*123) once the FD number exceeds 9999.

java      23907       kafka-console 9996u     IPv6           38962963       0t0        TCP 192.168.115.43:59860->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console 9997u     IPv6           38962964       0t0        TCP 192.168.115.43:59861->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console 9998u     IPv6           38962965       0t0        TCP 192.168.115.43:38020->192.168.115.42:9092 (ESTABLISHED)
java      23907       kafka-console 9999u     IPv6           38962966       0t0        TCP 192.168.115.43:59863->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console *000u     IPv6           38965025       0t0        TCP 192.168.115.43:59864->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console *001u     IPv6           38965026       0t0        TCP 192.168.115.43:38023->192.168.115.42:9092 (ESTABLISHED)
java      23907       kafka-console *002u     IPv6           38965742       0t0        TCP 192.168.115.43:37670->192.168.115.43:9092 (ESTABLISHED)
java      23907       kafka-console *003u     IPv6           38965027       0t0        TCP 192.168.115.43:38025->192.168.115.42:9092 (ESTABLISHED)
java      23907       kafka-console *004u     IPv6           38965028       0t0        TCP 192.168.115.43:59868->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console *005u     IPv6           38966541       0t0        TCP 192.168.115.43:59869->192.168.115.41:9092 (ESTABLISHED)
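To see which broker is accumulating the sockets, the lsof output above can be grouped by remote endpoint. A small sketch (the helper name by_remote is made up; field 9 is lsof's NAME column, e.g. local->remote):

```shell
# Group ESTABLISHED lines from `lsof -n -i TCP` by remote endpoint.
# Usage: lsof -n -i TCP -p 23907 | by_remote
by_remote() {
  awk '/ESTABLISHED/ { split($9, ep, "->"); c[ep[2]]++ }
       END { for (r in c) print c[r], r }' | sort -rn
}
```

A steadily growing count for one broker address points at where the connections are being leaked.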
ibanner56 commented 9 years ago

I think I fixed the issue. I'm going to let it sit for a little while longer, but I'm not seeing any unbounded increase in the number of open files. If there aren't any additional problems, I'll push my changes to my fork tomorrow.

Update: When I run lsof -n | grep <PID> | wc -l the number isn't changing by more than ~10, and it goes back down after a bit. When I run netstat -an | grep ESTA | grep 9092 | wc -l the number continues to rise...

Update 2: Ah, wait, the netstat return value just dropped by 1000...

ibanner56 commented 9 years ago

Alright, after more than 12 hours the netstat count is still below 4000 and the lsof count below 200. I think this works.

ibanner56 commented 9 years ago

Well, this only fixed part of the issue, apparently. I didn't notice because I was only checking for connections on port 9092, but it's still leaving open sockets on ports 9090 and 9091. The number of connections on port 9092 no longer increases without bound, however.

The open file count is still under 200, so that exception is gone, but instead of crashing the app now just continuously fails to open a socket.
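Checking all broker ports at once would have surfaced this sooner. A small sketch (the helper name count_by_port is made up; ports 9090-9092 are the ones mentioned in this thread):

```shell
# Count ESTABLISHED connections per broker port from `netstat -an` output.
# Usage: netstat -an | count_by_port 9090 9091 9092
count_by_port() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " ") }
    /ESTABLISHED/ {
      # Field 5 is the foreign (broker) address, e.g. 192.168.115.41:9092.
      for (i = 1; i <= n; i++)
        if ($5 ~ (":" p[i] "$")) c[p[i]]++
    }
    END { for (i = 1; i <= n; i++) printf "port %s: %d\n", p[i], c[p[i]] + 0 }'
}
```

A port whose count only ever grows is the one still leaking.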

ibanner56 commented 9 years ago

Alright, I replaced the finagle-kafka library with a different Kafka client, and the number of open connections now stays consistently below 20. I can only assume that okapies' library wasn't closing connections properly.

https://github.com/ibanner56/kafka-web-console

PAStheLoD commented 9 years ago

Thanks for the fix, works wonderfully for us. Let's hope it gets merged.

foovungle commented 9 years ago

Has this been merged? Or should we keep using @ibanner56 's fork?

marcinszymaniuk commented 9 years ago

I'm running your fork, and initially it looked much better (I could use the app for more than a minute, which wasn't the case before), but after being up and running over a weekend it hangs again with a lot of open files. I haven't had time to investigate, but let me know if you need any additional info.

$ lsof | grep 31434 | wc -l
327756