corretto / corretto-17

Amazon Corretto 17 is a no-cost, multi-platform, production-ready distribution of OpenJDK 17
GNU General Public License v2.0

occasional SIGSEGV #156

Open zp-stripe opened 10 months ago

zp-stripe commented 10 months ago

Describe the bug

#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=2405, tid=4116

---------------  T H R E A D  ---------------

Current thread (0x0000ffcb10179800):  JavaThread "20231017_192415_17827_dcx6n.2.41.0-24-81" [_thread_in_Java, id=4116, stack(0x0000ffc9ee400000,0x0000ffc9ee600000)]

Stack: [0x0000ffc9ee400000,0x0000ffc9ee600000],  sp=0x0000ffc9ee5fd890,  free space=2038k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  0xfffffffffffffffc
j  java.lang.invoke.LambdaForm$MH+0x000000c002aee000.invoke(Ljava/lang/Object;Ljava/lang/Object;I)Ljava/lang/Object;+42 java.base@17.0.8.1
j  java.lang.invoke.Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;ILjava/lang/Object;)Ljava/lang/Object;+6 java.base@17.0.8.1
j  io.trino.$gen.PageFilter_20231017_192428_32448.filter(Lio/trino/spi/connector/ConnectorSession;Lio/trino/spi/Page;I)Z+119
j  io.trino.$gen.PageFilter_20231017_192428_32448.filter(Lio/trino/spi/connector/ConnectorSession;Lio/trino/spi/Page;)Lio/trino/operator/project/SelectedPositions;+61
J 97113 c2 io.trino.operator.project.PageProcessor.createWorkProcessor(Lio/trino/spi/connector/ConnectorSession;Lio/trino/operator/DriverYieldSignal;Lio/trino/memory/context/LocalMemoryContext;Lio/trino/operator/project/PageProcessorMetrics;Lio/trino/spi/Page;Z)Lio/trino/operator/WorkProcessor; (242 bytes) @ 0x0000ffff87116a2c [0x0000ffff87116500+0x000000000000052c]
J 97114 c2 io.trino.operator.ScanFilterAndProjectOperator$SplitToPages$$Lambda$4237+0x000000c002614000.apply(Ljava/lang/Object;)Ljava/lang/Object; (16 bytes) @ 0x0000ffff87115bb4 [0x0000ffff87115b40+0x0000000000000074]
J 53192 c2 io.trino.operator.WorkProcessorUtils$$Lambda$3603+0x000000c0022f4690.process(Ljava/lang/Object;)Lio/trino/operator/WorkProcessor$TransformationState; (9 bytes) @ 0x0000ffff817e298c [0x0000ffff817e2940+0x000000000000004c]
J 116578 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x0000ffff899b08f0 [0x0000ffff899b06c0+0x0000000000000230]
J 116578 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x0000ffff899b077c [0x0000ffff899b06c0+0x00000000000000bc]
J 116578 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x0000ffff899b077c [0x0000ffff899b06c0+0x00000000000000bc]
J 37980 c2 io.trino.operator.WorkProcessorUtils$BlockingProcess.process()Lio/trino/operator/WorkProcessor$ProcessState; (75 bytes) @ 0x0000ffff7fb39e6c [0x0000ffff7fb39dc0+0x00000000000000ac]
J 31748 c2 io.trino.operator.WorkProcessorUtils$$Lambda$3605+0x000000c0022f5130.process(Ljava/lang/Object;)Lio/trino/operator/WorkProcessor$TransformationState; (8 bytes) @ 0x0000ffff7f35eaa0 [0x0000ffff7f35ea00+0x00000000000000a0]
J 116578 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x0000ffff899b08f0 [0x0000ffff899b06c0+0x0000000000000230]
J 116578 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x0000ffff899b077c [0x0000ffff899b06c0+0x00000000000000bc]
J 40576 c2 io.trino.operator.WorkProcessorUtils$$Lambda$4067+0x000000c0023c8000.process()Lio/trino/operator/WorkProcessor$ProcessState; (12 bytes) @ 0x0000ffff7ffc3064 [0x0000ffff7ffc2fc0+0x00000000000000a4]
J 40603 c2 io.trino.operator.WorkProcessorUtils$$Lambda$4069+0x000000c0023c8450.process()Lio/trino/operator/WorkProcessor$ProcessState; (12 bytes) @ 0x0000ffff7ffe03c0 [0x0000ffff7ffe0300+0x00000000000000c0]
J 40575 c2 io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput()Lio/trino/spi/Page; (41 bytes) @ 0x0000ffff7ffc2264 [0x0000ffff7ffc21c0+0x00000000000000a4]
J 35196 c2 io.trino.operator.Driver.processInternal(Lio/trino/operator/OperationTimer;)Lcom/google/common/util/concurrent/ListenableFuture; (667 bytes) @ 0x0000ffff7f78ca78 [0x0000ffff7f78c640+0x0000000000000438]
J 65796 c2 io.trino.operator.Driver.lambda$process$8(JI)Lcom/google/common/util/concurrent/ListenableFuture; (266 bytes) @ 0x0000ffff82e52cb4 [0x0000ffff82e52880+0x0000000000000434]
J 62786 c2 io.trino.operator.Driver$$Lambda$3696+0x000000c002303a48.get()Ljava/lang/Object; (16 bytes) @ 0x0000ffff81ae3c88 [0x0000ffff81ae3c40+0x0000000000000048]
J 37105 c2 io.trino.operator.Driver.process(Lio/airlift/units/Duration;I)Lcom/google/common/util/concurrent/ListenableFuture; (93 bytes) @ 0x0000ffff7f8f9750 [0x0000ffff7f8f9180+0x00000000000005d0]
J 37263 c2 io.trino.operator.Driver.processForDuration(Lio/airlift/units/Duration;)Lcom/google/common/util/concurrent/ListenableFuture; (9 bytes) @ 0x0000ffff7f9304fc [0x0000ffff7f9304c0+0x000000000000003c]
J 105866 c2 io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(Lio/airlift/units/Duration;)Lcom/google/common/util/concurrent/ListenableFuture; (84 bytes) @ 0x0000ffff87ff0d74 [0x0000ffff87ff0c40+0x0000000000000134]
J 50352 c2 io.trino.execution.executor.PrioritizedSplitRunner.process()Lcom/google/common/util/concurrent/ListenableFuture; (355 bytes) @ 0x0000ffff8058762c [0x0000ffff80586d00+0x000000000000092c]
J 64685% c2 io.trino.execution.executor.TaskExecutor$TaskRunner.run()V (621 bytes) @ 0x0000ffff82bf5fc4 [0x0000ffff82bf5880+0x0000000000000744]
j  io.trino.$gen.Trino_414_stripe_6____20231017_085628_2.run()V+4
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@17.0.8.1
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@17.0.8.1
j  java.lang.Thread.run()V+11 java.base@17.0.8.1
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7c1c74]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x244
V  [libjvm.so+0x7c3280]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x180
V  [libjvm.so+0x874ce0]  thread_entry(JavaThread*, JavaThread*)+0x70
V  [libjvm.so+0xdb0aa8]  JavaThread::thread_main_inner()+0xa8
V  [libjvm.so+0xdb5778]  Thread::call_run()+0xb8
V  [libjvm.so+0xb45734]  thread_native_entry(Thread*)+0xdc
C  [libpthread.so.0+0x7624]  start_thread+0x184

To Reproduce

Unable to reproduce consistently.

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Platform information

OS: Amazon Linux 2
Version: Corretto-17.0.9.8.1

Additional context

Add any other context about the problem here.

For VM crashes, please attach the error report file. By default the file name is hs_err_pid<pid>.log, where <pid> is the process ID of the process.

eastig commented 10 months ago

Hi @zp-stripe, thank you for reporting the issue. Could you please attach the hs_err_*.log files if you have them? In the provided stack trace I see java.base@17.0.8.1, which means Corretto 17.0.8. However, in Platform information the specified version is 17.0.9. Could you please check whether the crash happens on Corretto 17.0.9.8.1?

Thanks
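The version check described above can be done mechanically: HotSpot appends a `module@version` tag to JDK frames in the stack trace, so collecting those tags shows which JDK build the crashing process was actually running. The following is a minimal sketch, not part of the original report; the class name `FrameVersions` and the regular expression are illustrative assumptions, and the sample lines are copied from the trace in this issue.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the JDK versions named in stack-frame lines.
public class FrameVersions {
    // Matches tags such as "java.base@17.0.8.1" (module name, '@', dotted version).
    private static final Pattern MODULE_VERSION =
        Pattern.compile("\\b((?:java|jdk)\\.[\\w.]+)@([0-9][0-9.]*)");

    static Set<String> moduleVersions(List<String> lines) {
        Set<String> versions = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = MODULE_VERSION.matcher(line);
            while (m.find()) {
                versions.add(m.group(2)); // keep only the version part
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        List<String> frames = List.of(
            "j  java.lang.Thread.run()V+11 java.base@17.0.8.1",
            "j  io.trino.$gen.Trino_414_stripe_6____20231017_085628_2.run()V+4");
        // Frames from application code carry no tag and are ignored.
        System.out.println("JDK versions seen in frames: " + moduleVersions(frames));
    }
}
```

If the set contains a version different from the one printed by `java --version` on the host, the trace probably came from a process started on an older installation.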

zp-stripe commented 10 months ago

I got that version from running java --version as mentioned in the prompt:

zp@host:~$ java --version
openjdk 17.0.9 2023-10-17 LTS
OpenJDK Runtime Environment Corretto-17.0.9.8.1 (build 17.0.9+8-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.9.8.1 (build 17.0.9+8-LTS, mixed mode, sharing)

Here's another log snippet that confirms the version:

[2023-11-14 22:07:48.227082] # A fatal error has been detected by the Java Runtime Environment:
[2023-11-14 22:07:48.227090] #
[2023-11-14 22:07:48.227107] # SIGSEGV (0xb) at pc=0x0000000000000000, pid=2398, tid=4130
[2023-11-14 22:07:48.227114] #
[2023-11-14 22:07:48.227127] # JRE version: OpenJDK Runtime Environment Corretto-17.0.9.8.1 (17.0.9+8) (build 17.0.9+8-LTS)
[2023-11-14 22:07:48.227163] # Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.9.8.1 (17.0.9+8-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-aarch64)
[2023-11-14 22:07:48.227172] # Problematic frame:
[2023-11-14 22:07:48.227180] # C 0xfffffffffffffffc

Maybe the minor version changed recently, not sure.

I don't have an hs_err_ file on hand right now because the hosts were replaced, but I can try to get one soon when the problem reoccurs and I will attach it here. Thanks.
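When only console output survives, the header lines quoted above still carry the signal, the faulting pc, the process and thread IDs, the JRE version, and the problematic frame. The sketch below shows one way to pull those fields out of such lines, including the `[timestamp]` prefix seen in this snippet. The class name `CrashHeader`, the map keys, and the patterns are assumptions made for illustration; this is not a tool shipped with the JDK or Corretto.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the "#"-prefixed header of a HotSpot fatal error message.
public class CrashHeader {
    // Optional "[timestamp] " prefix from a log collector, then the leading "#".
    private static final Pattern PREFIX = Pattern.compile("^(\\[[^\\]]*\\]\\s*)?#\\s*");
    private static final Pattern SIGNAL = Pattern.compile(
        "^(SIG\\w+) \\((0x[0-9a-f]+)\\) at pc=(0x[0-9a-f]+), pid=(\\d+), tid=(\\d+)");

    static Map<String, String> parse(List<String> lines) {
        Map<String, String> out = new LinkedHashMap<>();
        boolean expectFrame = false;
        for (String raw : lines) {
            Matcher p = PREFIX.matcher(raw);
            if (!p.find()) {
                continue; // not a header line
            }
            String line = raw.substring(p.end()).trim();
            if (expectFrame) {
                // The line after "Problematic frame:" names the frame itself.
                expectFrame = false;
                if (!line.isEmpty()) {
                    out.put("frame", line);
                    continue;
                }
            }
            Matcher s = SIGNAL.matcher(line);
            if (s.find()) {
                out.put("signal", s.group(1));
                out.put("pc", s.group(3));
                out.put("pid", s.group(4));
                out.put("tid", s.group(5));
            } else if (line.startsWith("JRE version:")) {
                out.put("jre", line.substring("JRE version:".length()).trim());
            } else if (line.startsWith("Problematic frame:")) {
                expectFrame = true;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "[2023-11-14 22:07:48.227107] # SIGSEGV (0xb) at pc=0x0000000000000000, pid=2398, tid=4130",
            "[2023-11-14 22:07:48.227172] # Problematic frame:",
            "[2023-11-14 22:07:48.227180] # C 0xfffffffffffffffc");
        parse(log).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

This only summarizes the header. The full hs_err file is still needed for registers, the complete stack, and loaded libraries.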

zp-stripe commented 10 months ago

hs_err_pid2392.log

Here is the hs_err file

eastig commented 10 months ago

@zp-stripe According to the hs_err file, you are using Trino 414. On https://github.com/trinodb/trino I see the latest version is 433. Could you please check whether the crash also happens on version 433?

simonis commented 10 months ago

The crashes you've reported are all on aarch64. Have you also observed them on x86_64 or are you running exclusively on aarch64?
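To answer the architecture question for a given fleet, the running JVM can report its own platform and build. The following is a small sketch under stated assumptions: on OpenJDK builds for Linux, `os.arch` is typically `aarch64` on ARM64 and `amd64` on x86_64, and `java.vendor.version` is where the four-part Corretto version is expected to appear. The class name `JvmIdentity` is made up for this example.

```java
// Hypothetical diagnostic: prints the properties that identify the running JVM build.
public class JvmIdentity {
    static String describe() {
        return String.join(System.lineSeparator(),
            "os.arch              = " + System.getProperty("os.arch"),
            "java.vm.name         = " + System.getProperty("java.vm.name"),
            "java.runtime.version = " + System.getProperty("java.runtime.version"),
            "java.vendor.version  = " + System.getProperty("java.vendor.version"),
            "Runtime.version()    = " + Runtime.version());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this with the same `java` binary that launches the service avoids mismatches between the shell's `java --version` and the process that actually crashed.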

feser commented 10 months ago

We got a SIGSEGV after upgrading to 17.0.9.8.1. Do you think it is related to this issue, or to the async-profiler issue that was supposed to be fixed in 17.0.9?

Unfortunately, I cannot get the hs_err_pid1.log.


#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f414da20889, pid=1, tid=578
#
# JRE version: OpenJDK Runtime Environment Corretto-17.0.9.8.1 (17.0.9+8) (build 17.0.9+8-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.9.8.1 (17.0.9+8-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x253889] forte_fill_call_trace_given_top(JavaThread*, ASGCT_CallTrace*, int, frame) [clone .isra.20]+0x15d
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# The JFR repository may contain useful JFR files. Location: /tmp/2023_11_30_12_57_36_1
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
# https://github.com/corretto/corretto-17/issues/
#
[error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007f414ee0623b]

benty-amzn commented 10 months ago

Unfortunately, if that's the full output available and we don't have the hs_err, it's nearly impossible to say. That log doesn't specify where the crash occurred