eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
421 stars 179 forks source link

[feature] Improve eXist-db response to Java heap overflow #5439

Open djbpitt opened 2 weeks ago

djbpitt commented 2 weeks ago

Is your feature request related to a problem? Please describe.

My eXist-db (6.2.0 running on Centos 7 as a service) recently stalled with a heap overflow. I have the service configured to restart automatically, but this failure was not recognized as stopping, which means that eXist-db did not restart, and instead remained nonresponsive. My experience is limited to … er … my own experience, but I see no advantage to this behavior, and I would prefer a configuration where an eXist-db service recognizes a heap overflow and treats it like a service stop, that is, a configuration where the service restarts automatically on heap overflow. @adamretter has informed me (in the eXist-db Slack—thank you for the quick response!) that stopping on a heap overflow (which in my case would trigger a service restart) is available by setting the JAVA_OPTS environment variable to -XX:+ExitOnOutOfMemoryError, but that is not the default configuration. The point of this feature request is to ask that this setting be configured as the default.

Describe the solution you'd like I would like JAVA_OPS=-XX:+ExitOnOutOfMemoryError to be the default configuration.

Describe alternatives you've considered I can recover from a heap overflow by restarting manually, but that requires knowing that the heap overflow has occurred. Since the system becomes nonfunctional after a heap overflow, I see no advantage to making "remain nonfunctional until the administrator notices the problem and restarts manually" the default behavior, as currently appears to be the case.

Additional context No additional comments or suggestions.

dizzzz commented 1 week ago

@duncdrum would this make sense to add this to the docker containers too?

duncdrum commented 1 week ago

@dizzzz it already is both on the existdb/existdb and duncdrum/existdb images. Its a sensible thing to put as default .

adamretter commented 1 week ago

It's not necessarily sensible to have as a default if your container or systemd unit is set to auto restart; as that could further damage a corrupted db.

djbpitt commented 1 week ago

@adamretter Thank you for the warning! I thought (when I proposed this enhancement) that the appropriate first action when eXist-db halted after a heap overflow was to restart it, which is what I tried manually (and successfully) when I ran into this problem recently. Perhaps I've misread the notifications, but I thought that when eXist-db was restarted after an unclean interruption it ran diagnostics and tried to correct any corruptions that might have caused the initial failure. Was restarting eXist-db the wrong thing for me to do? If so, what would be a safer strategy?

duncdrum commented 1 week ago

@adamretter its been the default since May 20, 2019 for ephemeral container images, you merged the corresponding PR. I have yet to see it not perform as intended with auto restarting containers. Do you have a reproducible error condition where corruption occurs?

adamretter commented 1 week ago

you merged the corresponding PR

Yes, but that is for as you described "ephemeral containers", by default Docker does not restart your container when it exits.

Do you have a reproducible error condition where corruption occurs

No, or it would be a bug that I would fix. But as we all know, database corruption with eXist-db occurs, and when it does you don't want to compound the issue.