Open mreyescdl opened 2 weeks ago
Found the following in /var/log/messages. It was indeed the operating system's OOM killer:
Nov 5 09:11:01 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: A process of this unit has been killed by the OOM killer.
Nov 5 09:11:02 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Main process exited, code=exited, status=137/n/a
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: 05-Nov-2024 09:11:03.632 SEVERE [main] org.apache.catalina.startup.Catalina.stopServer Could not contact [localhost:35120] (base port [35120] and offset [0]). Tomcat may not be running.
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: 05-Nov-2024 09:11:03.635 SEVERE [main] org.apache.catalina.startup.Catalina.stopServer Error stopping Catalina
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011java.net.ConnectException: Connection refused (Connection refused)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.connect(Socket.java:609)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.connect(Socket.java:558)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.<init>(Socket.java:454)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.<init>(Socket.java:231)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:667)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:393)
Nov 5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:483)
Nov 5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Control process exited, code=exited, status=1/FAILURE
Nov 5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Failed with result 'oom-kill'.
Nov 5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Consumed 8h 9min 53.891s CPU time.
Was there any other such event, or is this the only one?
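One way to answer that question is to scan the syslog file for the two systemd messages seen in the excerpt above. The following is a minimal sketch, not a tested operational tool: the function name `find_oom_events` and the regular expression are illustrative assumptions, and the pattern only matches the exact systemd wording shown in this thread.

```python
import re

# Matches the two systemd messages from the excerpt above. The wording is
# copied from this thread; other systemd versions may phrase it differently.
OOM_PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+systemd\[\d+\]:\s+"
    r"(?P<unit>\S+): (?:A process of this unit has been killed by the OOM killer\."
    r"|Failed with result 'oom-kill'\.)"
)


def find_oom_events(lines):
    """Return (timestamp, host, unit) for every line that reports an OOM kill."""
    events = []
    for line in lines:
        match = OOM_PATTERN.match(line)
        if match:
            events.append((match["stamp"], match["host"], match["unit"]))
    return events


if __name__ == "__main__":
    # Sample lines taken from the log excerpt in this thread.
    sample = [
        "Nov 5 09:11:01 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: "
        "A process of this unit has been killed by the OOM killer.",
        "Nov 5 09:11:02 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: "
        "Main process exited, code=exited, status=137/n/a",
        "Nov 5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: "
        "Failed with result 'oom-kill'.",
    ]
    for stamp, host, unit in find_oom_events(sample):
        print(stamp, host, unit)
```

To run it against a real file, pass an open file handle instead of the sample list, for example `find_oom_events(open("/var/log/messages"))`; reading that file usually requires elevated privileges.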
sar -B for that day shows a large spike in memory pressure just before the kill. We should probably increase the size of the JVM:
04:10:01 AM 21.85 305.52 1181.69 0.32 774.72 13.08 0.00 26.08 199.39
04:20:01 AM 7.13 242.44 743.95 0.10 494.74 0.00 0.00 0.00 0.00
04:30:01 AM 0.00 259.90 705.72 0.02 448.77 0.00 0.00 0.00 0.00
04:40:01 AM 0.00 270.04 716.01 0.01 473.59 0.00 0.00 0.00 0.00
04:50:01 AM 0.05 274.50 750.02 0.01 495.75 0.00 0.00 0.00 0.00
05:00:02 AM 0.00 283.50 697.86 0.00 447.41 0.00 0.00 0.00 0.00
05:10:01 AM 8.26 406.66 1124.56 0.17 723.52 14.00 0.00 27.79 198.50
05:20:01 AM 14.83 290.17 751.17 0.15 497.97 0.00 0.00 0.00 0.00
05:30:01 AM 0.01 295.74 696.68 0.00 440.01 0.00 0.00 0.00 0.00
05:40:01 AM 8.04 360.86 727.16 0.24 496.09 0.00 0.00 0.00 0.00
05:50:01 AM 0.00 311.60 749.17 0.01 491.28 0.00 0.00 0.00 0.00
06:00:01 AM 0.00 311.91 696.09 0.01 437.29 0.00 0.00 0.00 0.00
06:10:01 AM 15.37 397.26 1146.52 0.22 718.46 17.68 0.00 34.51 195.17
06:20:01 AM 13.99 326.62 763.31 0.16 508.69 0.00 0.00 0.00 0.00
06:30:02 AM 0.00 342.72 760.98 0.00 494.80 0.00 0.00 0.00 0.00
06:40:01 AM 0.04 354.34 725.87 0.01 479.49 0.00 0.00 0.00 0.00
06:50:01 AM 0.08 342.57 744.08 0.02 486.57 0.50 0.00 0.98 195.99
07:00:01 AM 0.01 344.55 707.29 0.02 446.14 0.00 0.00 0.00 0.00
07:10:01 AM 33.48 820.37 1129.95 0.37 991.77 30.50 0.00 60.60 198.71
07:20:01 AM 14.76 364.01 753.44 0.15 495.71 0.00 0.00 0.00 0.00
07:30:01 AM 36.48 601.95 2306.30 1.17 1506.52 4.07 0.00 8.14 200.00
07:40:01 AM 10.92 1039.39 1063.90 0.22 1281.49 16.57 0.00 32.77 197.79
07:50:01 AM 9.78 744.13 768.28 0.17 895.89 8.48 0.00 16.63 196.07
08:00:01 AM 6.74 213.72 699.54 0.12 457.60 7.63 0.00 15.26 199.96
08:10:01 AM 60.91 1223.81 1140.57 0.57 1337.16 42.87 0.00 85.46 199.33
08:20:01 AM 24.44 882.44 749.32 0.16 981.91 0.56 0.00 1.06 188.13
08:20:01 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
08:30:01 AM 4.94 276.38 728.25 0.08 478.57 5.08 0.00 10.12 199.21
08:40:01 AM 1.61 1013.16 755.20 0.05 1020.93 1.89 0.00 3.77 200.00
08:50:01 AM 9.49 347.68 790.04 0.16 604.35 19.77 0.00 39.04 197.47
09:00:01 AM 15.15 322.81 915.90 0.24 6491.57 1825.32 0.69 3617.54 198.11
09:10:01 AM 682.99 1550.75 1533.81 3.89 9180.54 3964.86 0.00 5812.57 146.60
09:20:01 AM 2769.41 389.65 4474.18 24.56 20364.04 63981.22 15891.65 12353.85 15.47
09:30:01 AM 1.42 175.14 722.08 0.11 460.53 0.00 0.00 0.00 0.00
09:40:02 AM 627.41 568.69 5699.68 2.88 4265.69 0.00 0.00 0.00 0.00
09:50:01 AM 6.14 234.41 765.63 0.09 527.54 0.00 0.00 0.00 0.00
10:00:01 AM 3.10 1299.27 2741.44 0.10 1270.06 0.00 0.00 0.00 0.00
10:10:01 AM 243.44 812.89 1179.07 1.32 1066.27 0.00 0.00 0.00 0.00
10:20:02 AM 0.21 929.83 790.10 0.00 963.57 0.00 0.00 0.00 0.00
10:30:01 AM 0.13 268.28 715.15 0.00 456.57 0.00 0.00 0.00 0.00
10:40:01 AM 50.10 503.73 1006.87 0.04 920.68 0.00 0.00 0.00 0.00
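Picking the pressure spikes out of a table like this by eye is error-prone, so here is a rough sketch that parses `sar -B` rows and flags the busy intervals. The column names come from the header row in the output above; the function names, the `scan_threshold` value of 1000 pages/s, and the rule "any direct scanning or heavy kswapd scanning" are assumptions chosen for illustration, not established limits.

```python
# Column order taken from the sar -B header row shown in this thread.
COLUMNS = ["pgpgin", "pgpgout", "fault", "majflt", "pgfree",
           "pgscank", "pgscand", "pgsteal", "vmeff"]


def parse_sar_b(text):
    """Parse 12-hour-clock sar -B rows into (time, {column: value}) tuples."""
    samples = []
    for line in text.splitlines():
        parts = line.split()
        # Expect "HH:MM:SS AM|PM" followed by nine numeric columns.
        if len(parts) != 11 or parts[1] not in ("AM", "PM"):
            continue
        try:
            numbers = [float(p) for p in parts[2:]]
        except ValueError:
            continue  # repeated header row, not data
        samples.append((f"{parts[0]} {parts[1]}", dict(zip(COLUMNS, numbers))))
    return samples


def flag_pressure(samples, scan_threshold=1000.0):
    """Return samples with direct reclaim scanning or heavy kswapd scanning.

    scan_threshold is a hypothetical cut-off, not a recommended value.
    """
    return [
        (time, values) for time, values in samples
        if values["pgscand"] > 0 or values["pgscank"] >= scan_threshold
    ]


if __name__ == "__main__":
    # Rows copied from the table above.
    sample = """\
08:50:01 AM 9.49 347.68 790.04 0.16 604.35 19.77 0.00 39.04 197.47
09:00:01 AM 15.15 322.81 915.90 0.24 6491.57 1825.32 0.69 3617.54 198.11
09:10:01 AM 682.99 1550.75 1533.81 3.89 9180.54 3964.86 0.00 5812.57 146.60
09:20:01 AM 2769.41 389.65 4474.18 24.56 20364.04 63981.22 15891.65 12353.85 15.47
09:30:01 AM 1.42 175.14 722.08 0.11 460.53 0.00 0.00 0.00 0.00
"""
    for time, values in flag_pressure(parse_sar_b(sample)):
        print(time, values["pgscank"], values["pgscand"], values["vmeff"])
```

On the rows above, this flags the three intervals from 09:00 through 09:20, which bracket the OOM kill logged at 09:11.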
The next day, shortly after 11 AM, both servers in uc3-mrt-store-prd started to experience high memory pressure:
uc3-mrtstore-prd01
08:20:01 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
08:30:01 AM 0.10 371.71 707.41 0.01 461.54 0.00 0.00 0.00 0.00
08:40:01 AM 0.00 156.95 724.25 0.00 472.87 0.00 0.00 0.00 0.00
08:50:01 AM 0.03 222.45 716.78 0.01 534.21 0.00 0.00 0.00 0.00
09:00:01 AM 0.00 212.07 752.18 0.01 498.76 0.00 0.00 0.00 0.00
09:10:01 AM 0.49 251.45 1130.91 0.00 661.97 0.00 0.00 0.00 0.00
09:20:01 AM 0.00 273.53 699.13 0.00 663.03 0.00 0.00 0.00 0.00
09:30:01 AM 0.00 246.34 734.71 0.02 475.24 0.00 0.00 0.00 0.00
09:40:02 AM 0.00 247.45 706.90 0.00 455.76 0.00 0.00 0.00 0.00
09:50:01 AM 0.00 319.04 695.09 0.01 501.90 0.00 0.00 0.00 0.00
10:00:01 AM 12.10 335.58 926.27 0.04 598.01 0.00 0.00 0.00 0.00
10:10:01 AM 0.76 346.29 1175.91 0.01 704.64 0.00 0.00 0.00 0.00
10:20:01 AM 0.00 395.93 686.00 0.00 600.94 0.00 0.00 0.00 0.00
10:30:01 AM 0.00 333.91 723.10 0.00 463.54 0.00 0.00 0.00 0.00
10:40:01 AM 0.00 347.24 798.51 0.00 537.17 0.00 0.00 0.00 0.00
10:50:01 AM 0.00 1357.99 749.12 0.00 1604.23 0.00 0.00 0.00 0.00
11:00:01 AM 0.00 373.80 818.28 0.02 3412.85 0.00 0.00 0.00 0.00
11:10:01 AM 0.26 485.01 1333.49 0.00 4689.26 0.00 0.00 0.00 0.00
11:20:01 AM 14.42 1573.81 749.40 0.22 10596.74 1836.49 1.46 2762.82 150.32
11:30:01 AM 91.50 419.01 712.55 0.86 7093.05 1919.30 3.12 2730.13 142.01
11:40:01 AM 137.32 439.89 751.35 0.76 1071.50 306.09 0.11 290.35 94.82
11:50:01 AM 59.36 1757.59 731.41 0.56 1996.14 339.74 1.20 330.95 97.07
12:00:01 PM 22.97 441.08 699.00 0.27 1765.83 334.70 2.91 561.87 166.43
12:10:01 PM 459.05 552.16 1215.42 1.98 2179.92 928.18 0.86 862.47 92.83
12:20:01 PM 36.79 875.53 695.60 0.31 1976.64 413.71 0.66 554.32 133.77
12:30:01 PM 351.27 1081.78 906.42 3.50 33563.01 8925.26 68.25 13905.91 154.62
12:40:01 PM 448.50 385.25 904.50 5.08 29705.49 6903.48 15.81 12134.91 175.38
12:50:01 PM 246.85 1896.48 836.75 3.43 7671.93 1549.36 0.00 2773.90 179.03
01:00:01 PM 370.87 171.35 756.08 1.85 7113.35 1596.33 2.43 2890.61 180.80
01:10:01 PM 530.83 224.84 1185.02 3.16 3737.42 825.06 0.13 1421.52 172.27
01:20:01 PM 47.32 1202.48 745.58 0.89 4581.84 796.25 0.00 1488.93 186.99
uc3-mrtstore-prd02
08:20:01 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
08:30:01 AM 0.00 332.61 684.07 0.00 443.88 0.00 0.00 0.00 0.00
08:40:01 AM 0.00 355.02 767.86 0.01 504.92 0.00 0.00 0.00 0.00
08:50:01 AM 0.00 366.68 724.49 0.01 544.27 0.00 0.00 0.00 0.00
09:00:01 AM 0.00 334.74 694.82 0.01 438.61 0.00 0.00 0.00 0.00
09:10:01 AM 1.56 420.67 1219.78 0.04 750.72 0.00 0.00 0.00 0.00
09:20:01 AM 0.00 424.57 703.84 0.01 500.79 0.00 0.00 0.00 0.00
09:30:01 AM 0.00 375.59 692.90 0.00 437.33 0.00 0.00 0.00 0.00
09:40:01 AM 0.00 406.01 754.53 0.00 491.13 0.00 0.00 0.00 0.00
09:50:01 AM 0.00 477.54 699.61 0.01 505.24 0.00 0.00 0.00 0.00
10:00:01 AM 0.00 391.96 694.90 0.02 440.23 0.00 0.00 0.00 0.00
10:10:01 AM 0.14 482.07 1195.65 0.00 702.09 0.00 0.00 0.00 0.00
10:20:02 AM 0.01 480.95 706.99 0.00 585.06 0.00 0.00 0.00 0.00
10:30:01 AM 0.00 400.31 695.96 0.01 439.19 0.00 0.00 0.00 0.00
10:40:01 AM 0.07 311.29 780.64 0.01 516.92 0.00 0.00 0.00 0.00
10:50:01 AM 0.00 1125.25 730.99 0.00 1162.56 0.00 0.00 0.00 0.00
11:00:01 AM 0.01 177.76 887.44 0.01 2925.52 0.00 0.00 0.00 0.00
11:10:01 AM 205.27 270.54 1457.40 1.49 15234.38 2260.97 3.41 2865.05 126.53
11:20:01 AM 42.14 1454.37 734.88 0.44 10119.97 3024.77 16.22 3762.67 123.73
11:30:01 AM 82.23 229.13 708.42 0.76 6471.38 1849.67 37.54 2462.38 130.48
11:40:01 AM 86.81 281.46 808.00 0.69 1810.04 536.60 1.88 547.02 101.58
11:50:01 AM 233.38 1715.28 772.95 1.19 3235.58 506.84 8.77 864.55 167.67
12:00:01 PM 41.13 294.40 701.36 0.46 1970.41 416.20 0.00 619.85 148.93
12:10:02 PM 406.81 512.95 1189.02 1.72 1123.36 266.49 21.27 325.59 113.15
12:20:01 PM 16.10 795.33 732.20 0.89 6539.06 1321.21 60.51 2406.66 174.18
12:30:01 PM 166.89 327.49 743.93 1.54 8606.56 2165.52 5.49 3382.56 155.81
12:40:01 PM 179.51 469.41 835.30 2.32 15552.43 3670.93 5.43 6175.27 167.97
12:50:01 PM 71.45 1830.86 778.00 1.93 3323.61 522.76 0.00 876.53 167.67
01:00:01 PM 103.94 378.56 859.85 2.01 9621.27 2380.80 0.10 4043.82 169.84
01:10:01 PM 681.31 337.24 1261.62 3.74 3507.01 842.41 0.99 1328.60 157.53
The above messages from the journal logs indicate that the production storage JVM was killed by the OS. The storage logs show no evidence that a JVM shutdown was performed.
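The `status=137` in the journal excerpt is consistent with that reading. By shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 137 - 128 = 9, which is SIGKILL, the signal the OOM killer sends. A small sketch of that arithmetic follows; the helper name `describe_exit_status` is hypothetical, and the 128 + N rule is a convention rather than something systemd guarantees.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status using the 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 137 is the status systemd reported for the mrt-store.service main process.
    print(describe_exit_status(137))
```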