eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

java -Xshareclasses -version fails with JVMSHRC829E if HOME is not set with OpenJ9 0.38.0 #17471

Closed avermeer closed 1 year ago

avermeer commented 1 year ago

I found a surprising difference between Java 17.0.6 (OpenJ9 0.36.0) and Java 17.0.7 (OpenJ9 0.38.0) when a startup shell script launches a Java process with the -Xshareclasses option while the HOME environment variable is not set.

It looks like a small regression that is easy to bypass (just export HOME=), but I'd be interested in the OpenJ9 developers' view on it.

I made a small test case to reproduce it with this shell script in /tmp/test.sh:

#!/bin/bash 
unset HOME 
java -Xshareclasses -version 

On acme1 machine (Linux CentOS 7.9 x64 with IBM Semeru JDK 17.0.6 installed), everything runs fine:

[root@acme1 ~]# /tmp/test.sh
openjdk version "17.0.6" 2023-01-17
IBM Semeru Runtime Open Edition 17.0.6.0 (build 17.0.6+10)
Eclipse OpenJ9 VM 17.0.6.0 (build openj9-0.36.0, JRE 17 Linux amd64-64-Bit Compressed References 20230117_397 (JIT enabled, AOT enabled)
OpenJ9   - e68fb241f
OMR      - f491bbf6f
JCL      - 927b34f84c8 based on jdk-17.0.6+10)

On acme2 machine (RockyLinux 8.7 x64, with IBM Semeru JDK 17.0.7 installed), the same sample shell fails:

[root@acme2 ~]# /tmp/test.sh 
JVMSHRC829E Failed to use user's home as the default shared cache directory. Cannot get home directory. Please set another directory via environment variable "HOME" or command line option "cacheDir=", or fix the home directory in the password file entry. 
JVMSHRC840E Failed to start up the shared cache. 
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed 
Error: Could not create the Java Virtual Machine. 
Error: A fatal exception has occurred. Program will exit. 

Note that on the acme2 machine, we have:

[root@acme2 ~]# java -version 
openjdk version "17.0.7" 2023-04-18 
IBM Semeru Runtime Open Edition 17.0.7.0 (build 17.0.7+7) 
Eclipse OpenJ9 VM 17.0.7.0 (build openj9-0.38.0, JRE 17 Linux amd64-64-Bit Compressed References 20230418_480 (JIT enabled, AOT enabled) 
OpenJ9 - d57d05932 
OMR - 855813495 
JCL - 9d7a231edbc based on jdk-17.0.7+7) 

Any thoughts on why HOME suddenly became mandatory?

Thanks, Alex

pshipton commented 1 year ago

There was a change in 0.36: https://www.eclipse.org/openj9/docs/version0.36/#changes-to-the-location-of-the-default-directory-for-the-shared-cache-and-snapshot. I don't know why anything would have changed between 0.36 and 0.38.

@hangshao0 pls take a look to see if there is anything to be done.

pshipton commented 1 year ago

Is there a configuration problem on the machine? The error message does suggest to "fix the home directory in the password file entry".

hangshao0 commented 1 year ago

I am not aware of any change between 0.36 and 0.38 related to this. I see you are testing on 2 machines. Could you also try IBM Semeru JDK 17.0.7 on the acme1 machine?

avermeer commented 1 year ago

It's not OS/machine-related: on the acme1 machine running with CentOS 7.9, if I download IBM Semeru JDK 17.0.7 and use it to execute the test, I get the same failure as on acme2 machine running with Rocky Linux 8.7, see:

[root@acme1 ~]# cd /mnt
[root@acme1 mnt]# wget https://github.com/ibmruntimes/semeru17-binaries/releases/download/jdk-17.0.7%2B7_openj9-0.38.0/ibm-semeru-open-jdk_x64_linux_17.0.7_7_openj9-0.38.0.tar.gz
[root@acme1 mnt]# tar zxvf ibm-semeru-open-jdk_x64_linux_17.0.7_7_openj9-0.38.0.tar.gz
[root@acme1 mnt]# export PATH=/mnt/jdk-17.0.7+7/bin:$PATH
[root@acme1 mnt]# /tmp/test.sh

This leads to the same failure output as on the acme2 machine:

JVMSHRC829E Failed to use user's home as the default shared cache directory. Cannot get home directory. Please set another directory via environment variable "HOME" or command line option "cacheDir=", or fix the home directory in the password file entry.
JVMSHRC840E Failed to start up the shared cache.
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

It really looks like the -Xshareclasses option, which did not previously require the HOME environment variable to be set, now makes it mandatory with OpenJ9 0.38.0.

Not a big deal, but probably worth a line in the release notes so people upgrading won't be surprised. (Before opening this issue, I searched for the JVMJ9VM009E error code using various Internet search engines and found zero hits, so hopefully this page will help anyone falling into the same trap.)
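For anyone hitting this on 0.38.0, the JVMSHRC829E message itself names two ways out: set HOME, or pass a cache directory with cacheDir=. The following is a minimal sketch of a startup-script helper that picks between them. The function name shareclasses_option and the CACHE_DIR path are hypothetical and chosen only for illustration; the sketch also treats an empty HOME like an unset one, which is an assumption and not something established in this thread.

```shell
#!/bin/bash
# Hypothetical helper for startup scripts that may run without HOME.
# CACHE_DIR is an illustrative path, not an OpenJ9 default.
CACHE_DIR="${CACHE_DIR:-/tmp/javasharedresources-example}"

# Print the -Xshareclasses option to pass to java.
# If HOME is usable, let the JVM pick its default cache directory;
# otherwise name a directory explicitly via the cacheDir= suboption.
shareclasses_option() {
    if [ -n "${HOME:-}" ]; then
        printf '%s\n' "-Xshareclasses"
    else
        printf '%s\n' "-Xshareclasses:cacheDir=${CACHE_DIR}"
    fi
}

# Example use, left commented out so the block has no side effects:
# java "$(shareclasses_option)" -version
```

Naming the directory explicitly keeps the script independent of how a given OpenJ9 release resolves the home directory, at the cost of choosing and maintaining that path yourself.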

avermeer commented 1 year ago

Edit: I did find hits for JVMJ9VM009E, but they are unrelated to HOME becoming required, for example: https://www.ibm.com/support/pages/profile-server-fails-jvmj9ti001e-error

hangshao0 commented 1 year ago

On v0.38:

unset HOME 
java -Xshareclasses -Xtrace:print={j9prt} -version
14:00:22.257*0x17000           j9prt.787      > j9shmem_getDir: Entering
14:00:22.257 0x17000           j9prt.1637   * - j9shmem_getDir: omrsysinfo_get_env() failed to get environment variable HOME
14:00:22.257 0x17000           j9prt.1648   * - j9shmem_getDir: not CRIU final restore, skip getpwuid(), and homeDir is NULL.
14:00:22.257 0x17000           j9prt.788      < j9shmem_getDir: Exiting with buffer=

On v0.36, we try using getpwuid() to get the home directory if "HOME" is not set. However, after this change, I see getpwuid() is skipped because of CRIU:

https://github.com/eclipse-openj9/openj9/commit/21dda75a18f75636fda996e53aaf4c0316527ab9
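The 0.36 lookup order described above (HOME first, then the password file entry) can be approximated in shell for diagnosing a machine. This is only a sketch of that order, not OpenJ9's implementation: resolve_home is a hypothetical name, getent passwd stands in for the C getpwuid() call, and an empty HOME is treated like an unset one as a simplifying assumption.

```shell
#!/bin/bash
# Hypothetical helper: mimic the fallback order described in the thread.
resolve_home() {
    # 1. Prefer the HOME environment variable when it has a value.
    if [ -n "${HOME:-}" ]; then
        printf '%s\n' "$HOME"
        return 0
    fi
    # 2. Otherwise consult the password database for the current user,
    #    which is roughly what getpwuid() does in C.
    local entry
    entry=$(getent passwd "$(id -u)") || return 1
    # Field 6 of a passwd entry is the home directory.
    printf '%s\n' "$entry" | cut -d: -f6
}

# Example use, left commented out so the block has no side effects:
# unset HOME; resolve_home || echo "no usable home directory" >&2
```

If the second step also fails on a given machine, that would point to the password file problem the error message mentions instead of the regression discussed here.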

@JasonFengJ9

JasonFengJ9 commented 1 year ago

getpwuid() should only be skipped if CRIU is enabled and it is not the final restore. I will open a PR for it.

avermeer commented 1 year ago

Okay, this issue is marked as closed, but its conclusion is not clear to me: is the new behavior which I mentioned in 0.38 fixed to behave like 0.36, or is the fix for something else?

tajila commented 1 year ago

is the new behavior which I mentioned in 0.38 fixed to behave like 0.36, or is the fix for something else?

The new behaviour is a regression. It has been changed back to the 0.36 behaviour. This will be released in 0.40.