ThePrez / ServiceCommander-IBMi

Service Commander for IBM i
Apache License 2.0
40 stars 12 forks source link

[BUG] Service Commander crashes (Java dump) after move to Power10 #211

Open chrjorgensen opened 1 year ago

chrjorgensen commented 1 year ago

Describe the bug

We've just finished moving some of our LPAR's from a Power9 server to Power10, and now Service Commander creates a Java dump.

When running the command sc list we get the following output:

→ sc list
Unhandled exception
Type=Illegal instruction vmState=0x00000000
J9Generic_Signal_Number=00000048 Signal_Number=00000004 Error_Value=00000000 Signal_Code=0000001e
Handler1=08001000A003AC08 Handler2=08001000A0011ED0
R0=0000000000000028 R1=000000018005B7E0 R2=0000000000000010 R3=00000007FFF00258
R4=00000007FFF078D8 R5=0000000000000038 R6=0000000000000000 R7=7FFFFFFF00000000
R8=0700000025BE78E8 R9=0000000008000000 R10=0000000000000001 R11=0000000030027230
R12=0000000024008842 R13=0000000180067800 R14=0000000030027700 R15=0000000030027B00
R16=0000000000000000 R17=FFFFFFFFFFFFFFFF R18=090000000AD23A70 R19=000000018005BC08
R20=000000018005BA40 R21=09001000A095AB10 R22=090000000AA9CD04 R23=09001000A095AB10
R24=000000018005BB00 R25=09001000A095AB10 R26=00000001835DD658 R27=000000018005BBC8
R28=00000001800DEC10 R29=0000000000000000 R30=00000007FFF078D8 R31=000000000000000A
IAR=0000000183D2A910 LR=0000000183B2C8BC MSR=800000002288F032 CTR=0000000183B2C6CC
CR=0000000004000000 FPSCR=8200000000000000 XER=0400000082000000
FPR0 0000000082000000 (f: 2181038080.000000, d: 1.077576e-314)
FPR1 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
FPR2 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR5 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
FPR6 bfe0000000000000 (f: 0.000000, d: -5.000000e-01)
FPR7 412e848000000000 (f: 0.000000, d: 1.000000e+06)
FPR8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR9 4530000000000000 (f: 0.000000, d: 1.934281e+25)
FPR10 412e848000000000 (f: 0.000000, d: 1.000000e+06)
FPR11 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
FPR12 4530000000000000 (f: 0.000000, d: 1.934281e+25)
FPR13 0000000000000001 (f: 1.000000, d: 4.940656e-324)
FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)

Compiled_method=unknown (In JIT code segment 0000000183A27938 but no method found)
Target=2_90_20220209_53 (OS400 7.3)
CPU=ppc64 (48 logical CPUs) (0x12100000000 RAM)
----------- Stack Backtrace -----------
initializeAttachedThreadImpl+0x3d4 (0x090000000927B1D8 [libj9vm29.so+0x6b1d8])
initializeAttachedThread+0x98 (0x0900000009281D9C [libj9vm29.so+0x71d9c])
standardInit+0x714 (0x090000000B1522F8 [libjclse29.so+0x462f8])
J9VMDllMain+0x33c (0x090000000B19F560 [libjclse29.so+0x93560])
runJ9VMDllMain@AF84_52+0x108 (0x0900000009227A6C [libj9vm29.so+0x17a6c])
pool_do+0x108 (0x090000000923698C [libj9vm29.so+0x2698c])
IPRA.$runInitializationStage+0xd8 (0x09000000092246DC [libj9vm29.so+0x146dc])
protectedInitializeJavaVM+0xe60 (0x090000000921AEE4 [libj9vm29.so+0xaee4])
omrsig_protect+0x4a0 (0x0900000009522584 [libj9prt29.so+0x57584])
initializeJavaVM+0x21c (0x0900000009219AA0 [libj9vm29.so+0x9aa0])
J9_CreateJavaVM+0xb8 (0x090000000923C33C [libj9vm29.so+0x2c33c])
JNI_CreateJavaVM_impl+0x948 (0x09000000091D200C [libjvm.so+0x2800c])
JNI_CreateJavaVM+0x1c4 (0x0900000009197E88 [libjvm.so+0x1ce88])
JNI_CreateJavaVM+0x1c4 (0x0900000009168E88 [libjvm.so+0x1ce88])
JavaMain+0xec (0x00000001000067D0 [java+0x67d0])
ThreadJavaMain+0xc (0x000000010000D750 [java+0xd750])
_pthread_body+0x100 (0x0900000000328E24 [libpthreads.a+0x3e24])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2023/03/18 18:23:14 - please wait.
JVMDUMP032I JVM requested System dump using '/home/<user>/core.20230318.182314.862.0001.dmp' in response to an event
Note: "Enable full CORE dump" in smit is set to FALSE and as a result there will be limited threading information in core file.
JVMDUMP010I System dump written to /home/<user>/core.20230318.182314.862.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/<user>/javacore.20230318.182314.862.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/<user>/javacore.20230318.182314.862.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/<user>/Snap.20230318.182314.862.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/<user>/Snap.20230318.182314.862.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/<user>/jitdump.20230318.182314.862.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'main' thread 0x0000000030027B00
JVMDUMP010I JIT dump written to /home/<user>/jitdump.20230318.182314.862.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".

Uninstalling the OpenJDK 11 EA from yum was not an option, since this also removed Service Commander, which has OpenJDK as a requirement.

The server move was done by creating a VM on Power10 to use the same storage as the VM on Power9. So the storage was not moved, only the CPU (and instruction set) was changed.

Please advice - I've never seen this before!

To Reproduce

Expected behavior

Verbose output Run the command with the -v option and share the output here.

Additional context Add any other context about the problem here.

ThePrez commented 1 year ago

Will take a look. Might be a missing PTF. What IBM i release are you running?

chrjorgensen commented 1 year ago

It's IBM i 7.3.

Java 11 is installed as licensed program. Is it required to have both - and why?

Latest Java PTF group is installed.

ThePrez commented 1 year ago

Service Commander requires OpenJDK 11's RPM distribution, which is standalone from JV1. JV1 Java 11 is not required, and Java PTF levels do not affect the RPM distribution. The "why?" seems a common question. I have the answer documented in the code here but we should probably also put this type of information in the mainline doc somewhere.

Are you up-to-date with the latest cumulative PTF packages?

I suspect one or both of the following is needed:

I'll aim to have more details in the next few days. Otherwise, we may need you to send in the resulting dumps from the Java crashes.

chrjorgensen commented 1 year ago

Thanks for explaining and looking into this... I'm fully up-to-date with CUM and group PTF packages (like to be on the frontline 😄), so I don't miss any group PTF or yum updates - for 7.3, haven't tested this on 7.5, but will tomorrow.

ThePrez commented 1 year ago

I'm fully up-to-date with CUM and group PTF packages (like to be on the frontline 😄)

I expected nothing less 🙂

There are certain instructions that are not supported on i 7.3. I believe you need a newer OpenJDK build to handle this properly.

In the meantime, environment variable JAVA_TOOL_OPTIONS=-Xint should provide a temporary workaround by disabling the JIT compiler.

chrjorgensen commented 1 year ago

I expected nothing less 🙂

You know me 😉

I can confirm that setting environment variable JAVA_TOOL_OPTIONS=-Xint indeed does work. I have set the variable at the system level and can now use Service Commander with this workaround. 👌

I have also installed Service Commander and OpenJDK 11 on my IBM i 7.5 sandbox, and it works here without problems.

ThePrez commented 1 year ago

Oops, closed wrong issue

Please remove your workaround and try

yum clean expire-cache && yum upgrade openjdk-11
chrjorgensen commented 1 year ago

I have local repositories, mirrored from the IBM repositories, so my version of your yum commands is this:

yum clean expire-cache  --enablerepo="ibmi*" --disablerepo="m02*" && yum upgrade openjdk-11 --enablerepo="ibmi*" --disablerepo="m02*"

But I get no updates...

Loaded plugins: changelog
Cleaning repos: ibmi-base ibmi-base-unsecure ibmi-release ibmi-release-unsecure the-i-doctor
5 metadata files removed
Loaded plugins: changelog
ibmi-base                                                                                                                                | 3.6 kB  00:00:00
ibmi-base-unsecure                                                                                                                       | 3.6 kB  00:00:00
ibmi-release                                                                                                                             | 2.9 kB  00:00:00
ibmi-release-unsecure                                                                                                                    | 2.9 kB  00:00:00
the-i-doctor                                                                                                                             | 2.9 kB  00:00:00
Setting up Upgrade Process
No Packages marked for Update

I then tried reinstall instead, which forced it to download and install:

→ yum clean expire-cache  --enablerepo="ibmi*" --disablerepo="m02*" && yum reinstall openjdk-11 --enablerepo="ibmi*" --disablerepo="m02*"
Loaded plugins: changelog
Cleaning repos: ibmi-base ibmi-base-unsecure ibmi-release ibmi-release-unsecure the-i-doctor
5 metadata files removed
Loaded plugins: changelog
Setting up Reinstall Process
ibmi-base                                                                                                                                | 3.6 kB  00:00:00
ibmi-base-unsecure                                                                                                                       | 3.6 kB  00:00:00
ibmi-release                                                                                                                             | 2.9 kB  00:00:00
ibmi-release-unsecure                                                                                                                    | 2.9 kB  00:00:00
the-i-doctor                                                                                                                             | 2.9 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package openjdk-11-ea.ppc64 0:11.0.14.9-1 will be reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================
 Package                                  Arch                             Version                                   Repository                            Size
================================================================================================================================================================
Reinstalling:
 openjdk-11-ea                            ppc64                            11.0.14.9-1                               ibmi-base                            138 M

Transaction Summary
================================================================================================================================================================
Reinstall     1 Package

Total download size: 138 M
Installed size: 467 M
Is this ok [y/N]: y
Downloading Packages:
openjdk-11-ea-11.0.14.9-1.ibmi7.3.ppc64.rpm                                                                                              | 138 MB  00:00:29
Running Transaction Check
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : openjdk-11-ea-11.0.14.9-1.ppc64                                                                                                              1/1

Installed:
  openjdk-11-ea.ppc64 0:11.0.14.9-1

Complete!

But it is still the old version - Java version output:

→ /QOpenSys/pkgs/lib/jvm/openjdk-11/bin/java -version
openjdk version "11.0.14-ea" 2022-01-18
OpenJDK Runtime Environment (build 11.0.14-ea+9)
Eclipse OpenJ9 VM (build openj9-0.30.0, JRE 11 OS400 ppc64-64-Bit Compressed References 20220209_53 (JIT disabled, AOT disabled)
OpenJ9   - 9dccbe0
OMR      - dac962a
JCL      - 01f8665 based on jdk-11.0.14+9)
ThePrez commented 1 year ago

Oops!! Sorry. Please try again in 6 hours :D you should install 11.0.15.10

chrjorgensen commented 1 year ago

Much better!

→ /QOpenSys/pkgs/lib/jvm/openjdk-11/bin/java -version
openjdk version "11.0.15-ea" 2022-04-19
OpenJDK Runtime Environment (build 11.0.15-ea+10)
Eclipse OpenJ9 VM (build openj9-0.32.0, JRE 11 OS400 ppc64-64-Bit Compressed References 20230320_202 (JIT enabled, AOT enabled)
OpenJ9   - 9a84ec3
OMR      - ab24b66
JCL      - b7b5b42 based on jdk-11.0.15+10)

And Service Commander now works without JIT disabled:

→ sc list
bifmon (Bifrost Monitors)
crond (Cron daemon)
ibmidash (IBM i Dashboard)

And the new Java RPM package is being picked by my mirrors of the IBM repositories and will be deployed to the other IBM i 7.3 instances - nice!

Thanks for quick response!

Btw, the current Java version 11.0.15.10 is in the base repository but the previous version 11.0.14.9 is still in the release repositories - I assume they will eventually be removed (even though yum will find the correct version) to avoid any confusion?

billede

rcronin commented 1 week ago

I'm seeing this same issue on 7.5. JDK 11 is installed as the same as above and still experiencing the segmentation vault. In fact, seeing the segmentation fault when doing a java --version.

Any advice?

chrjorgensen commented 1 week ago

@rcronin Here is the output from Java11 on our IBM i 7.5 (Power9) server:

→ /QOpenSys/pkgs/lib/jvm/openjdk-11/bin/java --version
openjdk 11.0.15-ea 2022-04-19
OpenJDK Runtime Environment (build 11.0.15-ea+10)
Eclipse OpenJ9 VM (build openj9-0.32.0, JRE 11 OS400 ppc64-64-Bit Compressed References 20230320_202 (JIT enabled, AOT enabled)
OpenJ9   - 9a84ec3
OMR      - ab24b66
JCL      - b7b5b42 based on jdk-11.0.15+10)

Are you up to date on PTF groups?

rcronin commented 1 week ago

We are on SF99959.

Here's what I'm getting:

bash-5.2$ /QOpenSys/pkgs/lib/jvm/openjdk-11/bin/java --version
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000033
Handler1=08001000A003AD10 Handler2=08001000A00113B8
R0=090000001042EA00 R1=00000001800A9DE0 R2=09001000A0E50ED0 R3=09001000A0E50440
R4=0000000000017F54 R5=000000000000001C R6=09001000A0E50770 R7=000000007B31F51F
R8=0000000000000000 R9=00C6417E28711C40 R10=0000000319012190 R11=0000000000000000
R12=09001000A00830F0 R13=00000001800B7800 R14=09001000A0DB60E0 R15=0000000180128790
R16=0000000000400000 R17=0000000000000011 R18=08001000A00180B8 R19=0000000000000000
R20=00000001800AB500 R21=00000001802F8238 R22=00000001800A9FD0 R23=0000000000000000
R24=0000000000000016 R25=00000001800AB208 R26=000000000000001C R27=09001000A0E50440
R28=00000001800A9FD0 R29=0000000000007D45 R30=0000000000007D45 R31=00000001800A9FD0
IAR=090000001042D01C LR=090000001042CBC8 MSR=800000000000F032 CTR=090000001042EA00
CR=240082440000000C FPSCR=8202000000000000 XER=0000000C82020000
FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR1 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
FPR2 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR5 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
FPR6 43e5886dcc4beba0 (f: 3427527680.000000, d: 1.241289e+19)
FPR7 412e848000000000 (f: 0.000000, d: 1.000000e+06)
FPR8 408c180000000000 (f: 0.000000, d: 8.990000e+02)
FPR9 4530000000000000 (f: 0.000000, d: 1.934281e+25)
FPR10 412e848000000000 (f: 0.000000, d: 1.000000e+06)
FPR11 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
FPR12 4530000000000000 (f: 0.000000, d: 1.934281e+25)
FPR13 412e848000000000 (f: 0.000000, d: 1.000000e+06)
FPR14 0000000082000000 (f: 2181038080.000000, d: 1.077576e-314)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/QOpenSys/pkgs/lib/jvm/openjdk-11/lib/libjimage.so
Module_base_address=090000001042A000
Target=2_90_20230320_202 (OS400 7.5)
CPU=ppc64 (40 logical CPUs) (0xc300000000 RAM)
----------- Stack Backtrace -----------
find_location_index__15ImageFileReaderCFPCcPUL+0x84 (0x090000001042CBC8 [libjimage.so+0x2bc8])
IPRA.$JIMAGE_FindResource@AF7_4+0x8c (0x09000000104302F0 [libjimage.so+0x62f0])
JIMAGE_FindResource+0xac (0x090000001042FEB0 [libjimage.so+0x5eb0])
jimageFindResource+0x198 (0x090000000E5D795C [libj9vm29.so+0x17f95c])
IPRA.$readFileFromJImage+0x10c (0x090000000E5D6B10 [libj9vm29.so+0x17eb10])
IPRA.$searchClassInCPEntry+0x2a8 (0x090000000E5D5C8C [libj9vm29.so+0x17dc8c])
IPRA.$searchClassInModule+0xa4 (0x090000000E5D54E8 [libj9vm29.so+0x17d4e8])
findLocallyDefinedClass+0x1d4 (0x090000000E5D4BD8 [libj9vm29.so+0x17cbd8])
IPRA.$callFindLocallyDefinedClass+0x1d4 (0x090000000E49CEB8 [libj9vm29.so+0x44eb8])
IPRA.$attemptDynamicClassLoad+0xc8 (0x090000000E49D14C [libj9vm29.so+0x4514c])
internalFindClassInModule+0x2eb8 (0x090000000E49B79C [libj9vm29.so+0x4379c])
internalFindKnownClass+0xc4 (0x090000000E49D588 [libj9vm29.so+0x45588])
initializeRequiredClasses+0x1bc (0x09000000103A5700 [libjclse29.so+0x43700])
standardInit+0x5d0 (0x09000000103A6CB4 [libjclse29.so+0x44cb4])
J9VMDllMain+0x33c (0x09000000103F44E0 [libjclse29.so+0x924e0])
runJ9VMDllMain@AF84_52+0x108 (0x090000000E46FC8C [libj9vm29.so+0x17c8c])
pool_do+0x108 (0x090000000E47EB8C [libj9vm29.so+0x26b8c])
IPRA.$runInitializationStage+0xd8 (0x090000000E46C8FC [libj9vm29.so+0x148fc])
protectedInitializeJavaVM+0xe60 (0x090000000E462EE4 [libj9vm29.so+0xaee4])
omrsig_protect+0x4a0 (0x090000000E768C84 [libj9prt29.so+0x56c84])
initializeJavaVM+0x21c (0x090000000E461AA0 [libj9vm29.so+0x9aa0])
J9_CreateJavaVM+0xb8 (0x090000000E48453C [libj9vm29.so+0x2c53c])
JNI_CreateJavaVM_impl+0x960 (0x090000000E41A044 [libjvm.so+0x28044])
JNI_CreateJavaVM+0x1c4 (0x090000000E3DFE88 [libjvm.so+0x1ce88])
JNI_CreateJavaVM+0x1c4 (0x090000000E3B0E88 [libjvm.so+0x1ce88])
JavaMain+0x100 (0x0000000100006704 [java+0x6704])
ThreadJavaMain+0xc (0x000000010000D510 [java+0xd510])
_pthread_body+0xf8 (0x090000000032805C [libpthreads.a+0x405c])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2024/09/03 16:56:14 - please wait.
JVMDUMP032I JVM requested System dump using '/home/RCRONIN/core.20240903.165614.27495.0001.dmp' in response to an event
Note: "Enable full CORE dump" in smit is set to FALSE and as a result there will be limited threading information in core file.
JVMDUMP010I System dump written to /home/RCRONIN/core.20240903.165614.27495.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/RCRONIN/javacore.20240903.165614.27495.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/RCRONIN/javacore.20240903.165614.27495.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/RCRONIN/Snap.20240903.165614.27495.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/RCRONIN/Snap.20240903.165614.27495.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/RCRONIN/jitdump.20240903.165614.27495.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in '(unnamed thread)' thread 0x0000000030021600
JVMDUMP010I JIT dump written to /home/RCRONIN/jitdump.20240903.165614.27495.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".