Closed GTB3NW closed 2 years ago
Sorry for the vague report, but it is quite simple. I managed to mitigate this by adding limits to my docker-compose.yaml file for the zookeeper and kafka services. I believe Java has patched this issue, but the containers for zk & kafka are not up to date. I would propose that some generous ulimits be set for these services as standard, rather than have users on newer operating systems patch manually.
Can you please explain what kernel limits you have reached?
Hi Amin, the limit I hit is nofile. I've just copied the limits from other services, which isn't ideal for production (I need to tune these), but it allows the install to complete.
# cat docker-compose.override.yml
services:
  zookeeper:
    ulimits:
      nofile:
        soft: 10032
        hard: 10032
  kafka:
    ulimits:
      nofile:
        soft: 10032
        hard: 10032
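For anyone who wants to try the same cap on a single process outside Compose, here is a minimal Python sketch. It is not part of self-hosted; the helper name run_with_nofile_cap is made up for illustration, and the default of 10032 is just the value copied in the override above, not a tuned recommendation. It lowers both the soft and hard nofile limits in the child only, mirroring what the override does for a container.

```python
import resource
import subprocess
import sys


def run_with_nofile_cap(cmd, limit=10032):
    """Run cmd with soft and hard nofile limits both lowered to at most `limit`.

    Hypothetical helper for illustration; 10032 is the value from the override above.
    """

    def apply_cap():
        # Runs in the child just before exec, so the parent's limits are untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # Never try to raise the hard limit; only lower it (or keep it).
        new_hard = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_hard, new_hard))

    return subprocess.run(
        cmd, preexec_fn=apply_cap, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # The child reports the (soft, hard) limits it actually received.
    probe = [
        sys.executable,
        "-c",
        "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))",
    ]
    print(run_with_nofile_cap(probe).stdout.strip())
```

Both limits are set because a process that sizes something from the hard limit would not be helped by lowering the soft limit alone.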
To clarify, I believe the cause is that newer operating systems run the Docker daemon with the infinity limit, and a container inherits that limit if it sets none of its own. Older JVM versions are not aware of this and effectively pre-allocate for the whole limit, causing an instant OOM.
Docker's systemd unit:
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
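A quick way to see what a process actually inherited is to read its nofile limits. The sketch below is only an illustration: describe_nofile is a hypothetical helper, and the warn_above threshold of 1048576 is an arbitrary assumption, not a value taken from Docker, systemd, or the JVM.

```python
import resource


def describe_nofile(soft, hard, warn_above=1_048_576):
    """Return (summary, risky) for a pair of nofile limits.

    `warn_above` is an arbitrary, assumed threshold for "suspiciously high".
    """

    def fmt(value):
        return "infinity" if value == resource.RLIM_INFINITY else str(value)

    # Flag an unlimited hard limit or one above the assumed threshold.
    risky = hard == resource.RLIM_INFINITY or hard > warn_above
    return f"nofile soft={fmt(soft)} hard={fmt(hard)}", risky


if __name__ == "__main__":
    # Inspect the limits of the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    summary, risky = describe_nofile(soft, hard)
    print(summary + (" (very high, possibly inherited from the daemon)" if risky else ""))
```

Run inside a container, this reports the limits that container was started with; run on the host, it only reports the limits of the current shell session, which may differ from the Docker daemon's.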
It makes sense really, just not ideal. Perhaps instead of updating the repo's docker-compose file, a new section could be added to troubleshooting, or part of the install steps could notify the user?
Is this widespread enough to be enshrined in the troubleshooting documentation? We do now have a great GitHub ticket describing the workaround. :)
What are the risks of adding this to docker-compose.yml? Seems fine to me but curious for input.
Is this widespread enough to be enshrined in the troubleshooting documentation? We do now have a great GitHub ticket describing the workaround. :)
I think it will likely become more common as people update to newer OSes and kernels. While GitHub issues are a nice place to find solutions, I think known gotchas are probably worth documenting. That's just my five cents as an end user. But I do agree: at minimum we now have a GitHub issue, so nothing else is required, it would just be nice to have.
What are the risks of adding this to docker-compose.yml? Seems fine to me but curious for input.
Yeah, I mean those just limit the number of file descriptors, and in theory those services shouldn't require an unlimited number of file descriptors. Limits are there for a reason :P As for whether they're appropriate values, I'm unsure. I've just grabbed them from another service to get this working.
Can you be more specific on what has changed in new kernels?
Sorry, I've lost the link which explained the java OOM cause, but it mentioned kernel changes were the cause.
From the Discussions version of this thread:
@aminvakil:
I have not seen this anywhere, and I couldn't find anything about it on newer kernels by searching either.
But let's keep this open(?) and see if anyone else has any other information about this.
@GTB3NW:
If I get time when I continue our migration, I'll spin up another instance quickly and grab the error, that should help me find the comment which said it, hopefully it provides some clues :)
Hi, I'd like to follow up on that promise at some point, but I have different priorities at work right now, sorry. It's easily replicated using Fedora Server 35/36 and following the normal install steps.
I'll try 22.4.0 on a Fedora 36 virtual machine today whenever I find time.
I can confirm running 22.4.0 on Fedora 36 without any changes fails with this error:
sentry-self-hosted-zookeeper-1 | library initialization failed - unable to allocate file descriptor table - out of memory
===> ENV Variables ...
I guess the link @GTB3NW was mentioning about newer Linux is this: https://stackoverflow.com/a/56895801/3835210
In recent versions of Linux default limit for the number of open files has been increased significantly. Java 8 does the wrong thing of trying to allocate memory upfront for this number of file descriptors (see https://bugs.openjdk.java.net/browse/JDK-8150460). Previously this worked, when the default limit was much lower, but now it tries to allocate too much and fails. Workaround for this is to set a lower limit of number of open files (or use newer java):
$ mvn
library initialization failed - unable to allocate file descriptor table - out of memoryAborted
$ ulimit -n 10000
$ mvn
[INFO] Scanning for projects...
...
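To put rough numbers on the upfront allocation described in that answer, here is a small back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 48 bytes per table entry is an estimate for 64-bit Linux, 1048576 stands in for a typical older limit, and 1073741816 is one example of what an "infinity" limit can resolve to on a newer system. Only 10032 comes from this thread.

```python
# Assumption: approximate size of one descriptor-table entry on 64-bit Linux.
ENTRY_BYTES = 48


def table_bytes(nofile_limit, entry_bytes=ENTRY_BYTES):
    """Bytes needed if one table entry is allocated per possible descriptor."""
    return nofile_limit * entry_bytes


if __name__ == "__main__":
    examples = [
        ("override from this thread", 10_032),
        ("older typical limit (assumed)", 1_048_576),
        ("very high limit (assumed example)", 1_073_741_816),
    ]
    for label, limit in examples:
        mib = table_bytes(limit) / 2**20
        print(f"{label}: {limit} descriptors -> {mib:.1f} MiB")
```

Under these assumptions the table goes from well under a megabyte, to tens of megabytes, to tens of gigabytes, which would explain why the same JVM starts fine with a low limit and aborts immediately with a very high one.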
So I checked, and this is not fixed in any version up to and including 5.5.9 (5.5.9 is broken). 6.0.0 does not have this error and seems to be fixed, but fails with this:
[fedora@fedora self-hosted]$ docker-compose ps
NAME                             COMMAND                  SERVICE      STATUS         PORTS
sentry-self-hosted-clickhouse-1  "/entrypoint.sh"         clickhouse   exited (0)
sentry-self-hosted-kafka-1       "/etc/confluent/dock…"   kafka        created
sentry-self-hosted-redis-1       "docker-entrypoint.s…"   redis        exited (0)
sentry-self-hosted-zookeeper-1   "/etc/confluent/dock…"   zookeeper    exited (143)
[fedora@fedora self-hosted]$ docker-compose logs zookeeper
sentry-self-hosted-zookeeper-1 | ===> User
sentry-self-hosted-zookeeper-1 | uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
sentry-self-hosted-zookeeper-1 | ===> Configuring ...
sentry-self-hosted-zookeeper-1 | ===> Running preflight checks ...
sentry-self-hosted-zookeeper-1 | ===> Check if /var/lib/zookeeper/data is writable ...
sentry-self-hosted-zookeeper-1 | ===> Check if /var/lib/zookeeper/log is writable ...
sentry-self-hosted-zookeeper-1 | ===> Launching ...
sentry-self-hosted-zookeeper-1 | ===> Launching zookeeper ...
sentry-self-hosted-zookeeper-1 | [2022-06-18 11:02:37,273] WARN Either no config or no quorum defined in config, running in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
sentry-self-hosted-zookeeper-1 | [2022-06-18 11:02:37,576] WARN o.e.j.s.ServletContextHandler@1750fbeb{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
sentry-self-hosted-zookeeper-1 | [2022-06-18 11:02:37,577] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)
Also for future reference: https://access.redhat.com/solutions/6955891
I'd say let's change the issue title to something more appropriate to this problem, something like "zookeeper: unable to allocate file descriptor table", at least for now.
And let's see if we can bump zookeeper and kafka versions in https://github.com/getsentry/self-hosted/issues/1292.
@GTB3NW Thank you very much for bringing this up!
That's the exact comment I was referring to, thanks for finding it! Glad I've added fuel to the "update java runtime" fire for you :P
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
It sounds like this will be fixed once we update kafka/zookeeper in https://github.com/getsentry/self-hosted/issues/1292 so I think we are safe to close this?
Version
22.4.0
Steps to Reproduce
Fresh install of fedora (35), follow normal steps for install
Expected Result
install.sh to be successful
Actual Result
Every component using the JVM will OOM due to system ulimits