siyuanh opened this issue 10 years ago
As I understand it, you don't need to connect to the brokers for what the UI currently does. You can get everything from ZooKeeper, correct?
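For what it's worth, broker and topic metadata do live at well-known ZooKeeper paths, so listing them needs no broker sockets at all. A minimal sketch (assuming Kafka 0.8's standard ZK layout and its `ZKStringSerializer`; the connect string is a placeholder):

```scala
import org.I0Itec.zkclient.ZkClient
import kafka.utils.ZKStringSerializer
import scala.collection.JavaConverters._

// Read cluster metadata straight from ZooKeeper; no broker connection needed.
val zk = new ZkClient("localhost:2181", 30000, 30000, ZKStringSerializer)
try {
  val brokerIds = zk.getChildren("/brokers/ids").asScala    // registered broker ids
  val topics    = zk.getChildren("/brokers/topics").asScala // known topics
  // Each broker node holds a JSON registration blob (host, port, ...).
  brokerIds.foreach(id => println(zk.readData[String](s"/brokers/ids/$id")))
  println(topics.mkString(", "))
} finally {
  zk.close()
}
```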
I have seen this as well: after running the application for several days, netstat showed many hundreds of open connections to the Kafka brokers. Eventually the application stopped functioning because it could no longer open any new sockets; the process's maximum open file limit had been reached.
The console regularly connects to the brokers to retrieve partition log sizes so that you can view partition size over time. I've checked the code, and the client is closed after retrieving the log size, so this will require further investigation.
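For context while we investigate: the leak-proof shape for this kind of per-poll fetch is to close the client in a `finally` block, so the socket is released even when the request throws. A minimal sketch using Kafka 0.8's `SimpleConsumer` rather than the console's actual finagle-kafka client (the helper name and the timeout/buffer values are illustrative):

```scala
import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

// Hypothetical helper: fetch the latest log offset for one partition,
// always releasing the broker connection, even if the request fails.
def fetchLogSize(host: String, port: Int, topic: String, partition: Int): Long = {
  val consumer = new SimpleConsumer(host, port, 10000, 64 * 1024, "kafka-web-console")
  try {
    val tp = TopicAndPartition(topic, partition)
    val request = OffsetRequest(
      Map(tp -> PartitionOffsetRequestInfo(OffsetRequest.LatestTime, 1)))
    // offsets.head is the latest offset, i.e. the partition's log size.
    consumer.getOffsetsBefore(request).partitionErrorAndOffsets(tp).offsets.head
  } finally {
    consumer.close() // without this, each poll leaks one socket
  }
}
```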
@siyuanh & @unclebilly, as a temporary workaround, you can increase the Offset Fetch Interval in Settings to reduce the rate at which connections are created.
```
root@queue3:/opt/kafka-web-console# ps aux | grep kafka-web | grep -v grep | awk '{ print $2 }'
23907
root@queue3:/opt/kafka-web-console# lsof -n | grep kafka-console | grep IPv6 | wc -l
4601013
root@queue3:/opt/kafka-web-console# lsof -n | grep kafka-console | grep IPv6 | head
java 23907 kafka-console 142u IPv6 38815657 0t0 TCP *:9000 (LISTEN)
java 23907 kafka-console 147u IPv6 38927934 0t0 TCP 192.168.115.43:50926->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console 148u IPv6 38949202 0t0 TCP 192.168.115.43:50319->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console 149u IPv6 38829209 0t0 TCP 192.168.115.43:50320->192.168.115.41:9092 (ESTABLISHED)
....
```
I also have a stack trace, if you're interested.
I'm using systemd to limit file descriptors to 65000, but interestingly the highest file descriptor I've been able to find in the lsof output was 9999.
Oh, sorry for the repeated edits, but this is just lsof's royal stupidity: it prints an asterisk plus three digits (*123) once the file descriptor number exceeds 9999.
```
java 23907 kafka-console 9996u IPv6 38962963 0t0 TCP 192.168.115.43:59860->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console 9997u IPv6 38962964 0t0 TCP 192.168.115.43:59861->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console 9998u IPv6 38962965 0t0 TCP 192.168.115.43:38020->192.168.115.42:9092 (ESTABLISHED)
java 23907 kafka-console 9999u IPv6 38962966 0t0 TCP 192.168.115.43:59863->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console *000u IPv6 38965025 0t0 TCP 192.168.115.43:59864->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console *001u IPv6 38965026 0t0 TCP 192.168.115.43:38023->192.168.115.42:9092 (ESTABLISHED)
java 23907 kafka-console *002u IPv6 38965742 0t0 TCP 192.168.115.43:37670->192.168.115.43:9092 (ESTABLISHED)
java 23907 kafka-console *003u IPv6 38965027 0t0 TCP 192.168.115.43:38025->192.168.115.42:9092 (ESTABLISHED)
java 23907 kafka-console *004u IPv6 38965028 0t0 TCP 192.168.115.43:59868->192.168.115.41:9092 (ESTABLISHED)
java 23907 kafka-console *005u IPv6 38966541 0t0 TCP 192.168.115.43:59869->192.168.115.41:9092 (ESTABLISHED)
```
I think I fixed the issue. I'm going to let it sit for a little while longer, but I'm not seeing any unbounded increase in the number of open files. If there aren't any additional problems, I'll push my changes to my fork tomorrow.
Update: When I run `lsof -n | grep <PID> | wc -l`, the number doesn't change by more than ~10, and it drops back down after a bit. When I run `netstat -an | grep ESTA | grep 9092 | wc -l`, the number continues to rise...

Update 2: Ah, wait, the netstat count just dropped by 1000... Presumably the kernel cleans up sockets the process has already closed with some delay, so the netstat count lags behind what lsof reports.
Alright, after more than 12 hours the netstat count is still below 4000 and the lsof count below 200. I think this works.
Well, apparently this only fixed part of the issue. I didn't notice because I was only checking for connections on port 9092, but it's still leaving open sockets on ports 9091 and 9090. The number of connections on port 9092 no longer grows without bound, however.
The open file count is still under 200, so the too-many-open-files exception is gone, but instead of crashing, the app now just repeatedly fails to open a socket.
Alright, I replaced the finagle-kafka library with a different Kafka connection mechanism, and the number of open connections now stays consistently below 20. I can only assume that okapies' library wasn't properly closing its connections.
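For anyone who wants the general shape of the approach (this is a sketch of the pattern, not the exact code in my fork; `SimpleConsumer` here is just a stand-in and all names are illustrative): keep one long-lived client per broker and reuse it across polls, instead of opening a fresh socket every interval.

```scala
import scala.collection.concurrent.TrieMap
import kafka.consumer.SimpleConsumer

// Hypothetical connection cache: one long-lived SimpleConsumer per broker,
// reused across polls, so the connection count is bounded by broker count.
object BrokerClients {
  private val clients = TrieMap.empty[(String, Int), SimpleConsumer]

  // Note: getOrElseUpdate on a concurrent map is a simplification here;
  // a race could briefly create a duplicate consumer.
  def forBroker(host: String, port: Int): SimpleConsumer =
    clients.getOrElseUpdate((host, port),
      new SimpleConsumer(host, port, 10000, 64 * 1024, "kafka-web-console"))

  // Close everything on shutdown so no sockets are left behind.
  def shutdown(): Unit = {
    clients.values.foreach(_.close())
    clients.clear()
  }
}
```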
Thanks for the fix, works wonderfully for us. Let's hope it gets merged.
Has this been merged? Or should we keep using @ibanner56's fork?
I'm running your fork, and initially it looked much better (I could use the app for more than a minute, which wasn't the case before), but after a weekend of uptime it hangs again with lots of open files. I haven't had time to investigate, but let me know if you need any additional info.
```
$ lsof | grep 31434 | wc -l
327756
```
After running this web console for a while, the connection table fills up with connections to the brokers.