getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

No ulimits for various components #1445

Closed GTB3NW closed 2 years ago

GTB3NW commented 2 years ago

Version

22.4.0

Steps to Reproduce

Fresh install of Fedora 35, then follow the normal install steps

Expected Result

install.sh to be successful

Actual Result

Every component using the JVM will OOM due to system ulimits

GTB3NW commented 2 years ago

Sorry for the vague report, but it is quite simple. I managed to mitigate this by adding limits to my docker-compose.yaml file for the zookeeper and kafka services. I believe Java has patched this issue, but the containers for zk & kafka are not up to date. I would propose that some generous ulimits be set for these services as standard, rather than making users on newer operating systems patch manually.

aminvakil commented 2 years ago

Can you please explain what kernel limits you have reached?

GTB3NW commented 2 years ago

Hi Amin, the limit I hit is nofile. I've just copied the limits from other services, which isn't ideal for production (I need to tune these), but it allows the install to complete.

# cat docker-compose.override.yml

services:
  zookeeper:
    ulimits:
      nofile:
        soft: 10032
        hard: 10032

  kafka:
    ulimits:
      nofile:
        soft: 10032
        hard: 10032

GTB3NW commented 2 years ago

To clarify, I believe the cause is that newer operating systems use the infinity limit for Docker, and that limit gets inherited by the container if no limits are set. Older JVM versions are not aware of this and effectively pre-allocate memory for every possible file descriptor, causing an instant OOM.

Docker's systemd unit:

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

It makes sense really, just not ideal. Perhaps instead of updating the repo's docker-compose file, a new section could be added to troubleshooting, or part of the install steps could notify the user?
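For anyone wanting to check whether their host is affected, here is a minimal sketch that reports the open-file limits the current shell would pass on to child processes. The `classify_nofile` helper and the `NOFILE_WARN_THRESHOLD` cutoff are made up for illustration and are not part of self-hosted; the threshold is an arbitrary assumption, not a documented boundary.

```shell
#!/bin/sh
# Hypothetical cutoff, chosen only for illustration.
NOFILE_WARN_THRESHOLD=1048576

# classify_nofile LIMIT -> prints "unlimited", "high", or "ok"
classify_nofile() {
    case "$1" in
        unlimited|infinity) echo "unlimited" ;;
        *)
            if [ "$1" -gt "$NOFILE_WARN_THRESHOLD" ]; then
                echo "high"
            else
                echo "ok"
            fi
            ;;
    esac
}

# Report the soft and hard nofile limits of this shell.
soft=$(ulimit -S -n)
hard=$(ulimit -H -n)
echo "soft nofile: $soft ($(classify_nofile "$soft"))"
echo "hard nofile: $hard ($(classify_nofile "$hard"))"
```

Running `ulimit -n` inside a container started without explicit ulimits should show the value it inherits from the Docker daemon, which is the number that matters for the services here.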

chadwhitacre commented 2 years ago

Is this widespread enough to be enshrined in the troubleshooting documentation? We do now have a great GitHub ticket describing the workaround. :)

What are the risks of adding this to docker-compose.yml? Seems fine to me but curious for input.

GTB3NW commented 2 years ago

Is this widespread enough to be enshrined in the troubleshooting documentation? We do now have a great GitHub ticket describing the workaround. :)

I think it will likely become more common as people update to newer OSes and kernels. While GitHub issues are a nice place to find solutions, I think known gotchas are probably worth documenting. That's just my five cents as an end-user. But I do agree: at minimum we now have a GitHub issue, so nothing else is required. It would just be nice to have.

What are the risks of adding this to docker-compose.yml? Seems fine to me but curious for input.

Yeah, I mean those are just to limit the number of file descriptors, and in theory those services shouldn't require an unlimited number of file descriptors. Limits are there for a reason :P As for whether they're appropriate values, I'm unsure. I've just grabbed them from another service to get this working.
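One way to sanity-check a value like 10032 is to look at how many descriptors a process actually holds. This is a rough sketch, assuming a Linux host with `/proc` mounted; `count_open_fds` is a hypothetical helper name, and reading another user's process will likely need root.

```shell
#!/bin/sh
# count_open_fds PID -> prints the number of open file descriptors (Linux /proc only)
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example: descriptors held by the current shell
echo "open fds for PID $$: $(count_open_fds $$)"
```

For a running container, the host PID could be looked up with something like `docker inspect --format '{{.State.Pid}}' sentry-self-hosted-kafka-1` and passed to the helper. Comparing that count under load against the configured limit gives a rough idea of headroom.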

aminvakil commented 2 years ago

Can you be more specific on what has changed in new kernels?

GTB3NW commented 2 years ago

Can you be more specific on what has changed in new kernels?

Sorry, I've lost the link which explained the Java OOM cause, but it mentioned kernel changes were the cause.

chadwhitacre commented 2 years ago

From the Discussions version of this thread:

@aminvakil:

I have not seen anything about this on newer kernels and couldn't find anything by searching either.

But let's keep this open(?) and see if anyone else has any other information about this.

@GTB3NW:

If I get time when I continue our migration, I'll spin up another instance quickly and grab the error, that should help me find the comment which said it, hopefully it provides some clues :)

GTB3NW commented 2 years ago

Hi, I'd like to follow up on that promise at some point, but I have different priorities at work right now, sorry. It's easily replicated using Fedora Server 35/36 and following the normal install steps.

aminvakil commented 2 years ago

I'll try 22.4.0 on a Fedora 36 virtual machine today whenever I find time.

aminvakil commented 2 years ago

I can confirm running 22.4.0 on Fedora 36 without any changes fails with this error:

sentry-self-hosted-zookeeper-1   | library initialization failed - unable to allocate file descriptor table - out of memory===> ENV Variables ...

aminvakil commented 2 years ago

I guess the link @GTB3NW was mentioning about newer Linux is this: https://stackoverflow.com/a/56895801/3835210

In recent versions of Linux default limit for the number of open files has been increased significantly. Java 8 does the wrong thing of trying to allocate memory upfront for this number of file descriptors (see https://bugs.openjdk.java.net/browse/JDK-8150460). Previously this worked, when the default limit was much lower, but now it tries to allocate too much and fails. Workaround for this is to set a lower limit of number of open files (or use newer java):

$ mvn
library initialization failed - unable to allocate file descriptor table - out of memoryAborted
$ ulimit -n 10000
$ mvn
[INFO] Scanning for projects...
...

aminvakil commented 2 years ago

So I checked, and this is not fixed in any version up to and including 5.5.9 (5.5.9 is still broken). 6.0.0 does not have this error, so it seems to be fixed there, but it fails with this instead:

[fedora@fedora self-hosted]$ docker-compose ps
NAME                              COMMAND                  SERVICE             STATUS              PORTS
sentry-self-hosted-clickhouse-1   "/entrypoint.sh"         clickhouse          exited (0)          
sentry-self-hosted-kafka-1        "/etc/confluent/dock…"   kafka               created             
sentry-self-hosted-redis-1        "docker-entrypoint.s…"   redis               exited (0)          
sentry-self-hosted-zookeeper-1    "/etc/confluent/dock…"   zookeeper           exited (143)        
[fedora@fedora self-hosted]$ docker-compose logs zookeeper
sentry-self-hosted-zookeeper-1  | ===> User
sentry-self-hosted-zookeeper-1  | uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
sentry-self-hosted-zookeeper-1  | ===> Configuring ...
sentry-self-hosted-zookeeper-1  | ===> Running preflight checks ... 
sentry-self-hosted-zookeeper-1  | ===> Check if /var/lib/zookeeper/data is writable ...
sentry-self-hosted-zookeeper-1  | ===> Check if /var/lib/zookeeper/log is writable ...
sentry-self-hosted-zookeeper-1  | ===> Launching ... 
sentry-self-hosted-zookeeper-1  | ===> Launching zookeeper ... 
sentry-self-hosted-zookeeper-1  | [2022-06-18 11:02:37,273] WARN Either no config or no quorum defined in config, running  in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
sentry-self-hosted-zookeeper-1  | [2022-06-18 11:02:37,576] WARN o.e.j.s.ServletContextHandler@1750fbeb{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
sentry-self-hosted-zookeeper-1  | [2022-06-18 11:02:37,577] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)
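As a side note on reading the `exited (143)` status above: by the usual shell convention, a status above 128 encodes 128 plus a signal number. The sketch below decodes a status that way; `describe_exit_status` is a hypothetical helper, and the interpretation is only a convention, since a process can also choose to exit with such a code itself.

```shell
#!/bin/sh
# describe_exit_status CODE -> prints a human-readable interpretation
describe_exit_status() {
    if [ "$1" -gt 128 ]; then
        # Statuses above 128 conventionally mean "killed by signal (CODE - 128)".
        sig=$(( $1 - 128 ))
        echo "terminated by signal $sig ($(kill -l "$sig"))"
    elif [ "$1" -eq 0 ]; then
        echo "exited normally"
    else
        echo "exited with error code $1"
    fi
}

describe_exit_status 143
describe_exit_status 0
```

Under that convention 143 corresponds to signal 15 (SIGTERM), which would be consistent with the container being stopped rather than crashing on its own, though the logs above don't confirm that either way.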

Also for future reference: https://access.redhat.com/solutions/6955891

I'd say let's change the issue title to something more appropriate to this problem, something like "zookeeper: unable to allocate file descriptor table", at least for now.

And let's see if we can bump zookeeper and kafka versions in https://github.com/getsentry/self-hosted/issues/1292.

@GTB3NW Thank you very much for bringing this up!

GTB3NW commented 2 years ago

I guess the link @GTB3NW was mentioning about newer Linux is this: https://stackoverflow.com/a/56895801/3835210

In recent versions of Linux default limit for the number of open files has been increased significantly. Java 8 does the wrong thing of trying to allocate memory upfront for this number of file descriptors (see https://bugs.openjdk.java.net/browse/JDK-8150460). Previously this worked, when the default limit was much lower, but now it tries to allocate too much and fails. Workaround for this is to set a lower limit of number of open files (or use newer java):

$ mvn
library initialization failed - unable to allocate file descriptor table - out of memoryAborted
$ ulimit -n 10000
$ mvn
[INFO] Scanning for projects...
...

That's the exact comment I was referring to, thanks for finding it! Glad I've added fuel to the "update java runtime" fire for you :P
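For reference, the `ulimit -n 10000` workaround from that answer can be scoped to a single command so the calling shell keeps its own limit. This is a minimal sketch; `run_with_nofile` is a hypothetical wrapper name. Plain `ulimit -n N` in bash sets both the soft and the hard limit, which mirrors the compose override setting `soft` and `hard` to the same value.

```shell
#!/bin/sh
# run_with_nofile LIMIT COMMAND [ARGS...]
# Lowers the open-file limit inside a subshell, then runs the command there,
# so the limit of the calling shell is left untouched.
run_with_nofile() {
    limit=$1
    shift
    (
        ulimit -n "$limit" || exit 1
        exec "$@"
    )
}

# Example: the child process sees the lowered limit
run_with_nofile 512 sh -c 'ulimit -n'
```

Something like `run_with_nofile 10000 mvn` would then be the scoped equivalent of the two commands in the quoted answer. It only helps for processes launched from a shell; for the containers, the compose `ulimits` override is still the place to set this.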

ethanhs commented 2 years ago

It sounds like this will be fixed once we update kafka/zookeeper in https://github.com/getsentry/self-hosted/issues/1292 so I think we are safe to close this?