Closed JasonFengJ9 closed 2 years ago
@knn-k fyi
I ran a 10x Grinder job at internal job/Grinder/20821/tapResults/. No failures.
Reproduced in another Grinder job: job/Grinder/20822/consoleText
@mikezhang1234567890 can you take a look at this. We have a 256K stack size for aarch64, can you see if a 512k default size solves this?
I'll rerun the Grinder with -Xmso512K to see if the error still occurs.
Had a stack overflow failure with -Xmso512K, in internal Jenkins: /job/Grinder/20844/consoleText
Looks like the Jenkins job doesn't pass the command line arguments to the perf test. Will retry by creating a new build with default set to 512K.
Using #define J9_OS_STACK_SIZE (1024 * 1024), a 40x Grinder ran internally and had no failures at job/Grinder/20902/
Using 512k did have a stack overflow.
@tajila should I make the change to set J9_OS_STACK_SIZE to 1024K for aarch64?
@knn-k Is there a reason why aarch64 would require larger stack space?
Only z/OS has such a large default size, as I recall it's because the OS provides that size even if we specify a smaller one. https://www.eclipse.org/openj9/docs/openj9_defaults/
@tajila I have no idea on the reason for large stack on AArch64. I don't understand why this failure occurs intermittently, either.
@knn-k Here are the volatile registers we save on x86:
rax
rcx
rdx
//RSI not volatile in C but is a JIT helper argument register
rsi
r8
r9
r10
r11
xmm0
xmm1
xmm2
xmm3
xmm4
xmm5
//total (8*8) + (6*16) = 160 bytes
on aarch64 it's
x0 ... x18
d16 ... d31
//total (19*8) + (16*8) = 280 bytes
So this would warrant a larger native stack than x86, but 1M seems extreme.
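As a quick check of the save-area arithmetic, here is a minimal sketch that sums the register counts from the lists above. The widths are assumptions taken from the totals in this comment (8 bytes per general-purpose register, 16 bytes per xmm register, 8 bytes per d register); the helper name is made up for illustration and is not from the OpenJ9 source.

```python
# Register widths in bytes, as assumed by the totals quoted in the thread.
GPR_BYTES = 8    # 64-bit general-purpose register (rax..r11, x0..x18)
XMM_BYTES = 16   # 128-bit xmm register on x86-64
D_REG_BYTES = 8  # 64-bit "d" view of an AArch64 FP/SIMD register


def save_area_bytes(groups):
    """Sum count * width over (register_count, width_in_bytes) pairs."""
    return sum(count * width for count, width in groups)


# x86-64: rax, rcx, rdx, rsi, r8..r11 (8 GPRs) plus xmm0..xmm5 (6 registers)
x86_64_bytes = save_area_bytes([(8, GPR_BYTES), (6, XMM_BYTES)])

# AArch64: x0..x18 (19 GPRs) plus d16..d31 (16 registers)
aarch64_bytes = save_area_bytes([(19, GPR_BYTES), (16, D_REG_BYTES)])

print(x86_64_bytes, aarch64_bytes, aarch64_bytes - x86_64_bytes)
```

Under these assumptions the AArch64 save area is 120 bytes larger per transition, which is consistent with needing somewhat more native stack but not a 4x increase.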
@mikezhang1234567890 do you have a core file from when it SO's? It would be interesting to see the !stackslots [thread] output for the thread that overflows. Knowing how many JIT or native transitions are performed would help us identify the cause.
No core yet, I'll run another grinder and get one.
@tajila I've attached the output for !stackslots and !j9vmthread on the thread that gets the stack overflow.
Based on that output it looks like a Java stack overflow rather than an OS stack overflow. @mikezhang1234567890 can you try it with a larger -Xss, keeping the original default OS stack size?
Since https://www.eclipse.org/openj9/docs/openj9_defaults/ says the default is 1024K, I tried -Xss2M and didn't get a stack overflow error in 40 runs.
After looking at the diagnostics in https://github.com/eclipse-openj9/openj9/issues/14455#issuecomment-1040474440 it appears that there are a lot of arrays pinned on the java stack. This contributes to a lot of the java stack space usage.
I think someone from the JIT team should look at whether this is happening more frequently on aarch64 than on other platforms.
@0xdaryl
Looking at the jitdump, I found java/io/ObjectOutputStream.writeObject0(Ljava/lang/Object;Z)V had 178 locals and the stack frame size of the method was 1552 bytes. The method was called recursively 489 times according to the javacore.
I am not sure if this method has as many locals on other platforms as it does on AArch64.
AArch64 does not implement locals compaction[1] and allocates an 8-byte slot even for 4-byte locals. It seems that x and z do locals compaction and spend 4 bytes for 4-byte locals. I think Power does the same as AArch64 because we referred to the Power code when implementing the AArch64 codegen.
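To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this thread (1552-byte frame, 489 recursive calls, 178 locals, 1024K default Java stack). The split between 4-byte and 8-byte locals is not known from the jitdump excerpt, so the function below only shows the two extremes; it models just the slot-size effect described above, not the actual OMR compaction algorithm, and the function name is hypothetical.

```python
FRAME_BYTES = 1552               # frame size of writeObject0 from the jitdump
RECURSION_DEPTH = 489            # recursive calls seen in the javacore
NUM_LOCALS = 178                 # locals reported for writeObject0
DEFAULT_XSS_BYTES = 1024 * 1024  # 1024K default quoted from the defaults page

# Java stack consumed by the recursive writeObject0 frames alone.
recursive_bytes = FRAME_BYTES * RECURSION_DEPTH
share_of_default = recursive_bytes / DEFAULT_XSS_BYTES


def locals_area_bytes(four_byte_locals, eight_byte_locals, compact):
    """Bytes needed for locals when a 4-byte local takes 4 (compact) or 8 bytes."""
    small_slot = 4 if compact else 8
    return four_byte_locals * small_slot + eight_byte_locals * 8


# Every local gets an 8-byte slot when 4-byte locals are not packed.
without_compaction = locals_area_bytes(NUM_LOCALS, 0, compact=False)

# Best case only: assumes ALL 178 locals are 4-byte, which is unverified.
best_case_with_compaction = locals_area_bytes(NUM_LOCALS, 0, compact=True)

print(recursive_bytes, f"{share_of_default:.1%}")
print(without_compaction, best_case_with_compaction)
```

The recursive frames alone account for 758928 bytes, a large share of the default Java stack before any other frames are counted, which fits with -Xss2M making the failure go away. The real saving from packing 4-byte locals lies somewhere between zero and the best case shown, depending on how many of the locals are actually 4 bytes wide.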
Locals compaction has been implemented and enabled on AArch64 by https://github.com/eclipse/omr/pull/6387 and related PRs. Thanks to @Akira1Saitoh
Locals compaction has been implemented and enabled on AArch64
I'd like this change to soak in master for a while. Bugs here can be kind of subtle. I think it can wait for 0.33.
Are all the changes merged in the head stream? The OMR change is promoted, not sure what the other PRs are.
All the changes have been merged in the head stream.
Does locals compaction resolve this issue?
50x Grinder run passed with the personal build of the head stream. (/job/Grinder/21693/)
Closing since the problem is resolved.
Failure link
From an internal build job/Test_openjdk8_j9_extended.perf_aarch64_linux/44/ (cent7-aarch64-6):
Rerun in Grinder - Change TARGET to run only the failed test targets.
Optional info
Failure output (captured from console output)
This is the JDK8 head stream; it appears to be aarch64 specific, unlike https://github.com/eclipse-openj9/openj9/issues/12219. fyi @tajila