RIPE-NCC / rpki-validator

Other
64 stars 20 forks source link

validador crash ContextHandler Stopped 2.25 #43

Closed cwiech closed 3 years ago

cwiech commented 5 years ago

I am having issues where the software is crashing ( 2.25). I have seen issues in our lab where the nessus security scanner will kill software (a java spring application that I wrote... the nessus scanner will kill it once in a while... I haven't figure out what its doing just yet)... so I setup this server special and used iptables to ensure only the rpki clients could reach the server and isolate out the security scans.

NOTE on some of the clients... gobgp's implementation doesn't appear to send keepalives (nor does the validator)... so all those 172.17 hosts in the log will bounce from time to time. It looks to me like the validator software is doing something funny with ack seq numbers...I can submit that too... the validator seems to roll back in time on the seq numbers, and then it Fin's the sesion.

But ignore that for now, they come back again I am just using gobgp for testing... this is the real issue where the app dies... not sure what happened here...

2018-12-17 22:41:42,741 INFO [shutdownHook2] ContextHandler Stopped o.e.j.s.ServletContextHandler@6941827a{/,jar:file:/data/rpki/rpki-validator-app-2.25/lib/rpki-validator-app-2.25.jar!/public,UNAVAILABLE}

It's a good sized box, 128G or RAM, 16 cores/32 thread. It had 11 RPKI clients when this happend (they had run fine most of the day).

vhs13:/data/rpki/rpki-validator-app-2.25/log$ uname -a Linux vhs13 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

vhs13:/data/rpki/rpki-validator-app-2.25/log$ java -version openjdk version "1.8.0_191" OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.18.04.1-b12) OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

** validator.log when it died.

2018-12-17 22:41:42,729 INFO [New I/O server boss #1 ([id: 0x3b7d3a38, /0:0:0:0:0:0:0:0:8282])] RTRServerHandler Client connected : /172.17.0.8:52388 2018-12-17 22:41:42,730 INFO [shutdownHook1] RTRServerHandler Client disconnected : /172.17.0.3:55436 2018-12-17 22:41:42,730 INFO [New I/O server boss #1 ([id: 0x3b7d3a38, /0:0:0:0:0:0:0:0:8282])] RTRServerHandler Client connected : /172.17.0.5:51918 2018-12-17 22:41:42,731 INFO [shutdownHook1] RTRServerHandler Client disconnected : /172.17.0.2:50534 2018-12-17 22:41:42,732 INFO [New I/O server boss #1 ([id: 0x3b7d3a38, /0:0:0:0:0:0:0:0:8282])] RTRServerHandler Client connected : /10.40.137.173:52248 2018-12-17 22:41:42,733 INFO [shutdownHook1] RTRServerHandler Client disconnected : /172.17.0.4:39726 2018-12-17 22:41:42,733 INFO [New I/O server boss #1 ([id: 0x3b7d3a38, /0:0:0:0:0:0:0:0:8282])] RTRServerHandler Client connected : /172.17.0.7:47182 2018-12-17 22:41:42,734 INFO [New I/O server boss #1 ([id: 0x3b7d3a38, /0:0:0:0:0:0:0:0:8282])] RTRServerHandler Client connected : /172.17.0.9:52302 2018-12-17 22:41:42,734 INFO [New I/O server boss #1 ([id: 0x3b7d3a38, /0:0:0:0:0:0:0:0:8282])] RTRServerHandler Client connected : /172.17.0.3:55486 2018-12-17 22:41:42,736 INFO [shutdownHook2] ServerConnector Stopped ServerConnector@6aecbb8d{HTTP/1.1}{0.0.0.0:8080} 2018-12-17 22:41:42,741 INFO [shutdownHook2] ContextHandler Stopped o.e.j.s.ServletContextHandler@6941827a{/,jar:file:/data/rpki/rpki-validator-app-2.25/lib/rpki-validator-app-2.25.jar!/public,UNAVAILABLE} 2018-12-17 22:41:42,742 INFO [shutdownHook2] Main Terminating... 2018-12-17 22:41:56,486 WARN [New I/O server worker #1-4] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,487 INFO [New I/O server worker #1-4] RTRServerHandler Client disconnected : /172.17.0.7:47182 2018-12-17 22:41:56,488 WARN [New I/O server worker #1-4] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,948 WARN [New I/O server worker #1-6] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,948 WARN [New I/O server worker #1-3] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,949 WARN [New I/O server worker #1-62] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,950 INFO [New I/O server worker #1-3] RTRServerHandler Client disconnected : /10.40.137.173:52248 2018-12-17 22:41:56,950 INFO [New I/O server worker #1-6] RTRServerHandler Client disconnected : /172.17.0.3:55486 2018-12-17 22:41:56,951 INFO [New I/O server worker #1-62] RTRServerHandler Client disconnected : /172.17.0.10:54810 2018-12-17 22:41:56,951 WARN [New I/O server worker #1-6] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,951 WARN [New I/O server worker #1-3] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:56,951 WARN [New I/O server worker #1-62] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,362 WARN [New I/O server worker #1-2] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,363 INFO [New I/O server worker #1-2] RTRServerHandler Client disconnected : /172.17.0.5:51918 2018-12-17 22:41:57,363 WARN [New I/O server worker #1-2] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,367 WARN [New I/O server worker #1-5] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,369 INFO [New I/O server worker #1-5] RTRServerHandler Client disconnected : /172.17.0.9:52302 2018-12-17 22:41:57,369 WARN [New I/O server worker #1-5] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,370 WARN [New I/O server worker #1-64] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,370 INFO [New I/O server worker #1-64] RTRServerHandler Client disconnected : /10.40.137.173:44580 2018-12-17 22:41:57,371 WARN [New I/O server worker #1-64] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,373 WARN [New I/O server worker #1-63] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,373 INFO [New I/O server worker #1-63] RTRServerHandler Client disconnected : /172.17.0.6:49382 2018-12-17 22:41:57,373 WARN [New I/O server worker #1-63] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,384 WARN [New I/O server worker #1-1] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,384 INFO [New I/O server worker #1-1] RTRServerHandler Client disconnected : /172.17.0.8:52388 2018-12-17 22:41:57,384 WARN [New I/O server worker #1-1] RTRServerHandler java.nio.channels.ClosedChannelException 2018-12-17 22:41:57,386 INFO [shutdownHook1] RTRServer RTR server stopped

omuravskiy commented 5 years ago

Hi cwiech,

The second line in the log you supplied is coming from the "shutdownHook1" thread, which is usually called by the SIGTERM handler. Are you sure nothing is killing the process?

If it is dying on its own, the clue should be on the log before any of "shutdownHook" logs.

Cheers, Oleg

cwiech commented 5 years ago

Hmm… I found this in the terminal window this morning, this from the one that I had started the validator from (I have been working remote the last few days). Perhaps I missed this in the logs, I don’t remember seeing there. Looks like perhaps java ran out of memory on me. I think I remember seeing some config knobs to increase memory… let me bump these up a good bit.

jvm.memory.initial=512m # -Xms jvm option -> initial memory claimed by the jvm jvm.memory.maximum=1536m # -Xmx jvm option -> maximum memory for the jvm

root@vhs13:~# [ERROR] [12/17/2018 22:41:40.922] [default-akka.actor.default-dispatcher-31497] [ActorSystem(default)] Uncaught error from thread [default-akka.actor.default-dispatcher-31497] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled java.lang.OutOfMemoryError: GC overhead limit exceeded

[ERROR] [12/17/2018 22:41:42.721] [default-scheduler-1] [ActorSystem(default)] Uncaught error from thread [default-scheduler-1] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled java.lang.OutOfMemoryError: GC overhead limit exceeded at akka.util.internal.HashedWheelTimer$Worker.fetchExpiredTimeouts(HashedWheelTimer.java:333) at akka.util.internal.HashedWheelTimer$Worker.fetchExpiredTimeouts(HashedWheelTimer.java:326) at akka.util.internal.HashedWheelTimer$Worker.run(HashedWheelTimer.java:313) at java.lang.Thread.run(Thread.java:748)

From: Oleg Muravskiy notifications@github.com Reply-To: RIPE-NCC/rpki-validator reply@reply.github.com Date: Wednesday, December 19, 2018 at 5:56 AM To: RIPE-NCC/rpki-validator rpki-validator@noreply.github.com Cc: cwiech cwiechma@level3.net, Author author@noreply.github.com Subject: Re: [RIPE-NCC/rpki-validator] validador crash ContextHandler Stopped 2.25 (#43)

Hi cwiech,

The second line in the log you supplied is coming from the "shutdownHook1" thread, which is usually called by the SIGTERM handler. Are you sure nothing is killing the process?

If it is dying on its own, the clue should be on the log before any of "shutdownHook" logs.

Cheers, Oleg

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/RIPE-NCC/rpki-validator/issues/43#issuecomment-448586704, or mute the threadhttps://github.com/notifications/unsubscribe-auth/Ar19cj9QD0mEynnplwR1mGuBjqhwGCYJks5u6jcQgaJpZM4ZYfyb.

This communication is the property of CenturyLink and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments.

omuravskiy commented 5 years ago

Yes, 1.5G is a minimum maxMemory, if you have all RIR's trust anchors installed, that might be not enough. 2G should be OK, and 3-4G – even better.