CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

[Storage] JVM killed by operating system #2101

Open mreyescdl opened 2 weeks ago

mreyescdl commented 2 weeks ago
Nov 05 09:11:01 uc3-mrtstore-prd01 bash[1883]: /usr/bin/bash: line 1:  1886 Killed                  /usr/lib/jvm/java/bin/java $JAVA_OPTS $CATAL>

The above message from the journal logs indicates that the production Storage JVM was killed by the OS. The storage logs show no evidence that a JVM shutdown was performed.

ashleygould commented 2 weeks ago

Found the following in /var/log/messages. It was indeed the operating system's OOM killer:

Nov  5 09:11:01 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: A process of this unit has been killed by the OOM killer.
Nov  5 09:11:02 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Main process exited, code=exited, status=137/n/a
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: 05-Nov-2024 09:11:03.632 SEVERE [main] org.apache.catalina.startup.Catalina.stopServer Could not contact [localhost:35120] (base port [35120] and offset [0]). Tomcat may not be running.
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: 05-Nov-2024 09:11:03.635 SEVERE [main] org.apache.catalina.startup.Catalina.stopServer Error stopping Catalina
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011java.net.ConnectException: Connection refused (Connection refused)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.connect(Socket.java:609)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.connect(Socket.java:558)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.<init>(Socket.java:454)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.net.Socket.<init>(Socket.java:231)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:667)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:393)
Nov  5 09:11:03 uc3-mrtstore-prd01 env[611241]: #011#011at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:483)
Nov  5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Control process exited, code=exited, status=1/FAILURE
Nov  5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Failed with result 'oom-kill'.
Nov  5 09:11:03 uc3-mrtstore-prd01 systemd[1]: mrt-store.service: Consumed 8h 9min 53.891s CPU time.

Was there any other such event, or is this the only one?
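
One way to check for other events is to scan the syslog for the same systemd OOM notice. The following is a minimal sketch, assuming the log lines have the same shape as the ones quoted above; the function names `find_oom_events` and `describe_exit_status` are hypothetical helpers, not part of any Merritt tooling. It also decodes the `status=137` seen above using the shell convention that an exit status above 128 means "terminated by signal (status - 128)", which here gives signal 9.

```python
import re
import signal

# Matches systemd's OOM notice in the format quoted above, e.g.
# "Nov  5 09:11:01 host systemd[1]: unit.service: A process of this unit
#  has been killed by the OOM killer."
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+systemd\[\d+\]:\s+"
    r"(?P<unit>\S+): A process of this unit has been killed by the OOM killer\."
)


def find_oom_events(lines, unit=None):
    """Return (timestamp, host, unit) for every OOM-kill notice in lines.

    If unit is given, only notices for that systemd unit are returned.
    """
    events = []
    for line in lines:
        m = OOM_RE.match(line)
        if m and (unit is None or m.group("unit") == unit):
            events.append((m.group("ts"), m.group("host"), m.group("unit")))
    return events


def describe_exit_status(status):
    """Map a shell exit status above 128 to a signal name, else None."""
    if status > 128:
        return signal.Signals(status - 128).name
    return None
```

For example, `find_oom_events(open("/var/log/messages"), unit="mrt-store.service")` would list each matching notice in the current log file; rotated logs would need to be scanned separately.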

ashleygould commented 2 weeks ago

sar -B for that day shows a big spike in memory pressure. We should probably increase the size of the JVM:

             pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
04:10:01 AM     21.85    305.52   1181.69      0.32    774.72     13.08      0.00     26.08    199.39
04:20:01 AM      7.13    242.44    743.95      0.10    494.74      0.00      0.00      0.00      0.00
04:30:01 AM      0.00    259.90    705.72      0.02    448.77      0.00      0.00      0.00      0.00
04:40:01 AM      0.00    270.04    716.01      0.01    473.59      0.00      0.00      0.00      0.00
04:50:01 AM      0.05    274.50    750.02      0.01    495.75      0.00      0.00      0.00      0.00
05:00:02 AM      0.00    283.50    697.86      0.00    447.41      0.00      0.00      0.00      0.00
05:10:01 AM      8.26    406.66   1124.56      0.17    723.52     14.00      0.00     27.79    198.50
05:20:01 AM     14.83    290.17    751.17      0.15    497.97      0.00      0.00      0.00      0.00
05:30:01 AM      0.01    295.74    696.68      0.00    440.01      0.00      0.00      0.00      0.00
05:40:01 AM      8.04    360.86    727.16      0.24    496.09      0.00      0.00      0.00      0.00
05:50:01 AM      0.00    311.60    749.17      0.01    491.28      0.00      0.00      0.00      0.00
06:00:01 AM      0.00    311.91    696.09      0.01    437.29      0.00      0.00      0.00      0.00
06:10:01 AM     15.37    397.26   1146.52      0.22    718.46     17.68      0.00     34.51    195.17
06:20:01 AM     13.99    326.62    763.31      0.16    508.69      0.00      0.00      0.00      0.00
06:30:02 AM      0.00    342.72    760.98      0.00    494.80      0.00      0.00      0.00      0.00
06:40:01 AM      0.04    354.34    725.87      0.01    479.49      0.00      0.00      0.00      0.00
06:50:01 AM      0.08    342.57    744.08      0.02    486.57      0.50      0.00      0.98    195.99
07:00:01 AM      0.01    344.55    707.29      0.02    446.14      0.00      0.00      0.00      0.00
07:10:01 AM     33.48    820.37   1129.95      0.37    991.77     30.50      0.00     60.60    198.71
07:20:01 AM     14.76    364.01    753.44      0.15    495.71      0.00      0.00      0.00      0.00
07:30:01 AM     36.48    601.95   2306.30      1.17   1506.52      4.07      0.00      8.14    200.00
07:40:01 AM     10.92   1039.39   1063.90      0.22   1281.49     16.57      0.00     32.77    197.79
07:50:01 AM      9.78    744.13    768.28      0.17    895.89      8.48      0.00     16.63    196.07
08:00:01 AM      6.74    213.72    699.54      0.12    457.60      7.63      0.00     15.26    199.96
08:10:01 AM     60.91   1223.81   1140.57      0.57   1337.16     42.87      0.00     85.46    199.33
08:20:01 AM     24.44    882.44    749.32      0.16    981.91      0.56      0.00      1.06    188.13

08:20:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
08:30:01 AM      4.94    276.38    728.25      0.08    478.57      5.08      0.00     10.12    199.21
08:40:01 AM      1.61   1013.16    755.20      0.05   1020.93      1.89      0.00      3.77    200.00
08:50:01 AM      9.49    347.68    790.04      0.16    604.35     19.77      0.00     39.04    197.47
09:00:01 AM     15.15    322.81    915.90      0.24   6491.57   1825.32      0.69   3617.54    198.11
09:10:01 AM    682.99   1550.75   1533.81      3.89   9180.54   3964.86      0.00   5812.57    146.60
09:20:01 AM   2769.41    389.65   4474.18     24.56  20364.04  63981.22  15891.65  12353.85     15.47
09:30:01 AM      1.42    175.14    722.08      0.11    460.53      0.00      0.00      0.00      0.00
09:40:02 AM    627.41    568.69   5699.68      2.88   4265.69      0.00      0.00      0.00      0.00
09:50:01 AM      6.14    234.41    765.63      0.09    527.54      0.00      0.00      0.00      0.00
10:00:01 AM      3.10   1299.27   2741.44      0.10   1270.06      0.00      0.00      0.00      0.00
10:10:01 AM    243.44    812.89   1179.07      1.32   1066.27      0.00      0.00      0.00      0.00
10:20:02 AM      0.21    929.83    790.10      0.00    963.57      0.00      0.00      0.00      0.00
10:30:01 AM      0.13    268.28    715.15      0.00    456.57      0.00      0.00      0.00      0.00
10:40:01 AM     50.10    503.73   1006.87      0.04    920.68      0.00      0.00      0.00      0.00

ashleygould commented 2 weeks ago

The next day, shortly after 11 AM, both services in uc3-mrt-store-prd started to experience high memory pressure:

uc3-mrtstore-prd01

08:20:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
08:30:01 AM      0.10    371.71    707.41      0.01    461.54      0.00      0.00      0.00      0.00
08:40:01 AM      0.00    156.95    724.25      0.00    472.87      0.00      0.00      0.00      0.00
08:50:01 AM      0.03    222.45    716.78      0.01    534.21      0.00      0.00      0.00      0.00
09:00:01 AM      0.00    212.07    752.18      0.01    498.76      0.00      0.00      0.00      0.00
09:10:01 AM      0.49    251.45   1130.91      0.00    661.97      0.00      0.00      0.00      0.00
09:20:01 AM      0.00    273.53    699.13      0.00    663.03      0.00      0.00      0.00      0.00
09:30:01 AM      0.00    246.34    734.71      0.02    475.24      0.00      0.00      0.00      0.00
09:40:02 AM      0.00    247.45    706.90      0.00    455.76      0.00      0.00      0.00      0.00
09:50:01 AM      0.00    319.04    695.09      0.01    501.90      0.00      0.00      0.00      0.00
10:00:01 AM     12.10    335.58    926.27      0.04    598.01      0.00      0.00      0.00      0.00
10:10:01 AM      0.76    346.29   1175.91      0.01    704.64      0.00      0.00      0.00      0.00
10:20:01 AM      0.00    395.93    686.00      0.00    600.94      0.00      0.00      0.00      0.00
10:30:01 AM      0.00    333.91    723.10      0.00    463.54      0.00      0.00      0.00      0.00
10:40:01 AM      0.00    347.24    798.51      0.00    537.17      0.00      0.00      0.00      0.00
10:50:01 AM      0.00   1357.99    749.12      0.00   1604.23      0.00      0.00      0.00      0.00
11:00:01 AM      0.00    373.80    818.28      0.02   3412.85      0.00      0.00      0.00      0.00
11:10:01 AM      0.26    485.01   1333.49      0.00   4689.26      0.00      0.00      0.00      0.00
11:20:01 AM     14.42   1573.81    749.40      0.22  10596.74   1836.49      1.46   2762.82    150.32
11:30:01 AM     91.50    419.01    712.55      0.86   7093.05   1919.30      3.12   2730.13    142.01
11:40:01 AM    137.32    439.89    751.35      0.76   1071.50    306.09      0.11    290.35     94.82
11:50:01 AM     59.36   1757.59    731.41      0.56   1996.14    339.74      1.20    330.95     97.07
12:00:01 PM     22.97    441.08    699.00      0.27   1765.83    334.70      2.91    561.87    166.43
12:10:01 PM    459.05    552.16   1215.42      1.98   2179.92    928.18      0.86    862.47     92.83
12:20:01 PM     36.79    875.53    695.60      0.31   1976.64    413.71      0.66    554.32    133.77
12:30:01 PM    351.27   1081.78    906.42      3.50  33563.01   8925.26     68.25  13905.91    154.62
12:40:01 PM    448.50    385.25    904.50      5.08  29705.49   6903.48     15.81  12134.91    175.38
12:50:01 PM    246.85   1896.48    836.75      3.43   7671.93   1549.36      0.00   2773.90    179.03
01:00:01 PM    370.87    171.35    756.08      1.85   7113.35   1596.33      2.43   2890.61    180.80
01:10:01 PM    530.83    224.84   1185.02      3.16   3737.42    825.06      0.13   1421.52    172.27
01:20:01 PM     47.32   1202.48    745.58      0.89   4581.84    796.25      0.00   1488.93    186.99

uc3-mrtstore-prd02

08:20:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
08:30:01 AM      0.00    332.61    684.07      0.00    443.88      0.00      0.00      0.00      0.00
08:40:01 AM      0.00    355.02    767.86      0.01    504.92      0.00      0.00      0.00      0.00
08:50:01 AM      0.00    366.68    724.49      0.01    544.27      0.00      0.00      0.00      0.00
09:00:01 AM      0.00    334.74    694.82      0.01    438.61      0.00      0.00      0.00      0.00
09:10:01 AM      1.56    420.67   1219.78      0.04    750.72      0.00      0.00      0.00      0.00
09:20:01 AM      0.00    424.57    703.84      0.01    500.79      0.00      0.00      0.00      0.00
09:30:01 AM      0.00    375.59    692.90      0.00    437.33      0.00      0.00      0.00      0.00
09:40:01 AM      0.00    406.01    754.53      0.00    491.13      0.00      0.00      0.00      0.00
09:50:01 AM      0.00    477.54    699.61      0.01    505.24      0.00      0.00      0.00      0.00
10:00:01 AM      0.00    391.96    694.90      0.02    440.23      0.00      0.00      0.00      0.00
10:10:01 AM      0.14    482.07   1195.65      0.00    702.09      0.00      0.00      0.00      0.00
10:20:02 AM      0.01    480.95    706.99      0.00    585.06      0.00      0.00      0.00      0.00
10:30:01 AM      0.00    400.31    695.96      0.01    439.19      0.00      0.00      0.00      0.00
10:40:01 AM      0.07    311.29    780.64      0.01    516.92      0.00      0.00      0.00      0.00
10:50:01 AM      0.00   1125.25    730.99      0.00   1162.56      0.00      0.00      0.00      0.00
11:00:01 AM      0.01    177.76    887.44      0.01   2925.52      0.00      0.00      0.00      0.00
11:10:01 AM    205.27    270.54   1457.40      1.49  15234.38   2260.97      3.41   2865.05    126.53
11:20:01 AM     42.14   1454.37    734.88      0.44  10119.97   3024.77     16.22   3762.67    123.73
11:30:01 AM     82.23    229.13    708.42      0.76   6471.38   1849.67     37.54   2462.38    130.48
11:40:01 AM     86.81    281.46    808.00      0.69   1810.04    536.60      1.88    547.02    101.58
11:50:01 AM    233.38   1715.28    772.95      1.19   3235.58    506.84      8.77    864.55    167.67
12:00:01 PM     41.13    294.40    701.36      0.46   1970.41    416.20      0.00    619.85    148.93
12:10:02 PM    406.81    512.95   1189.02      1.72   1123.36    266.49     21.27    325.59    113.15
12:20:01 PM     16.10    795.33    732.20      0.89   6539.06   1321.21     60.51   2406.66    174.18
12:30:01 PM    166.89    327.49    743.93      1.54   8606.56   2165.52      5.49   3382.56    155.81
12:40:01 PM    179.51    469.41    835.30      2.32  15552.43   3670.93      5.43   6175.27    167.97
12:50:01 PM     71.45   1830.86    778.00      1.93   3323.61    522.76      0.00    876.53    167.67
01:00:01 PM    103.94    378.56    859.85      2.01   9621.27   2380.80      0.10   4043.82    169.84
01:10:01 PM    681.31    337.24   1261.62      3.74   3507.01    842.41      0.99   1328.60    157.53
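
To pick out the pressure intervals from tables like the ones above without reading them by eye, the sar -B rows can be parsed and filtered on page-scan activity. This is a rough sketch, assuming the 12-hour timestamp and nine-column layout shown above; `parse_sar_b`, `under_pressure`, and the threshold of 1000 scanned pages per second are illustrative assumptions, not values used by the team.

```python
from typing import NamedTuple


class PagingSample(NamedTuple):
    """One data row of `sar -B` output (all rates are per second)."""
    time: str
    pgpgin: float
    pgpgout: float
    fault: float
    majflt: float
    pgfree: float
    pgscank: float
    pgscand: float
    pgsteal: float
    vmeff: float


def parse_sar_b(text):
    """Parse sar -B rows in the layout shown above, skipping header rows."""
    samples = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: "HH:MM:SS", "AM" or "PM", then nine numbers.
        if len(parts) != 11 or parts[1] not in ("AM", "PM"):
            continue
        try:
            values = [float(p) for p in parts[2:]]
        except ValueError:
            continue  # repeated header row such as "pgpgin/s pgpgout/s ..."
        samples.append(PagingSample(f"{parts[0]} {parts[1]}", *values))
    return samples


def under_pressure(samples, scan_threshold=1000.0):
    """Return samples whose combined kswapd and direct page scanning
    (pgscank/s + pgscand/s) meets the assumed threshold."""
    return [s for s in samples if s.pgscank + s.pgscand >= scan_threshold]
```

Filtering on page scanning rather than pgfree/s is a design choice: scanning only rises when the kernel is actively reclaiming memory, whereas page frees also occur during normal operation.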