Closed takiguc closed 5 years ago
Will investigate to see what happens.
I already reproduced the issue with OpenJDK11/OpenJ9 on LinuxLite (downloaded from https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.1%2B13/OpenJDK11U-jdk_x64_linux_openj9_jdk-11.0.1_13_openj9-0.11.0_11.0.1_13.tar.gz) jdk-11.0.1_13/bin/java -jar jdk-11.0.1_13/demo/jfc/SwingSet2/SwingSet2.jar
and It works good with OpenJDK11/Hotspot on LinuxLite (downloaded from https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.1%2B13/OpenJDK11U-jdk_x64_linux_hotspot_11.0.1_13.tar.gz) jdk-11.0.1_13_hotspot/bin/java -jar jdk-11.0.1_13/demo/jfc/SwingSet2/SwingSet2.jar
So it looks like there is something wrong happening in our code which needs further analysis.
Does it still occur with -Xint?
Yes, it ended up with pretty much the same result with JIT off. jdk-11.0.1_13/bin/java -Xint -jar jdk-11.0.1_13/demo/jfc/SwingSet2/SwingSet2.jar
I tried to isolate the problem by writing two test cases to with different APIs to display the GIF.
[1] Test1:
File picture = new File("resources/images/misc/toast.gif");
Image image=ImageIO.read(picture); <---------------
ImageIcon icon = new ImageIcon(image);
The picture shows up correctly when loading with ImageIO.read()
[2] Test2:
ImageIcon icon = new ImageIcon("resources/images/misc/toast.gif");
The test ends up with an interlaced picture.
Considering that the loading mechanism ImageIcon(fileName) (loading image data on the background) is different from ImageIO.read() (get all data loaded with one thread), need to investigate what happened to ImageIcon(fileName) to figure out whether the image data got messed up during loading.
Investigation shows the interlaceflag in Test2 (the original code in SwingSet2) was set to false during reading the image via readImage() at src/java.desktop/share/classes/sun/awt/image/ImageDecoder.java (OpenJDK11)
private native boolean parseImage(int x, int y, int width, int height,
boolean interlace, int initCodeSize,
byte block[], byte rasline[],
IndexColorModel model);
/**
* Read Image data
*/
private boolean readImage(boolean first, int disposal_method, int delay)
throws IOException
{
boolean interlace = (block[8] & INTERLACEMASK) != 0;
System.out.println("GifImageDecoder.readImage: interlace = " + interlace);
...
boolean ret = parseImage(x, y, width, height, interlace, initCodeSize, block, rasline, model);
...
The debugging messages are as follows:
java.lang.Exception
at java.desktop/sun.awt.image.GifImageDecoder.readImage(GifImageDecoder.java:503)
at java.desktop/sun.awt.image.GifImageDecoder.produceImage(GifImageDecoder.java:213)
at java.desktop/sun.awt.image.InputStreamImageSource.doFetch(InputStreamImageSource.java:269)
at java.desktop/sun.awt.image.ImageFetcher.fetchloop(ImageFetcher.java:212)
at java.desktop/sun.awt.image.ImageFetcher.run(ImageFetcher.java:176)
Reading a 310 by 192 non-interlaced image...
GifImageDecoder.readImage: interlace = false
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at sun/awt/image/GifImageDecoder.parseImage(Native Method)
4XESTACKTRACE at sun/awt/image/GifImageDecoder.readImage(GifImageDecoder.java:600)
4XESTACKTRACE at sun/awt/image/GifImageDecoder.produceImage(GifImageDecoder.java:213)
4XESTACKTRACE at sun/awt/image/InputStreamImageSource.doFetch(InputStreamImageSource.java:269)
4XESTACKTRACE at sun/awt/image/ImageFetcher.fetchloop(ImageFetcher.java:212)
4XESTACKTRACE at sun/awt/image/ImageFetcher.run(ImageFetcher.java:176)
However, the value of the passed-in interlace flag was messed up when calling the native code GifImageDecoder.parseImage() at src/java.desktop/share/native/libawt/awt/image/gif/gifdecoder.c
JNIEXPORT jboolean JNICALL
Java_sun_awt_image_GifImageDecoder_parseImage(JNIEnv *env,
jobject this,
jint relx, jint rely,
jint width, jint height,
jint interlace, <-------------- the passed-in interlace flag
jint initCodeSize,
jbyteArray blockh,
jbyteArray raslineh,
jobject cmh)
...
int passinc = interlace ? 8 : 1;
int passht = passinc;
int len;
printf("\nJava_sun_awt_image_GifImageDecoder_parseImage: interlace = %d\n", interlace);
printf("\nJava_sun_awt_image_GifImageDecoder_parseImage: passinc = %d\n", passinc);
...
Java_sun_awt_image_GifImageDecoder_parseImage: interlace = 1476422656 <---------------------
Java_sun_awt_image_GifImageDecoder_parseImage: passinc = 8
So I tried to hack the native code Java_sun_awt_image_GifImageDecoder_parseImage() by forcing passinc to 1 (assuming interlace is false) and picture was correctly generated as follows:
Based on the investigation above, It seems there is some issue with manipulation on boolean value in JNI & bytecode interpreter in our code which needs to be fixed (Specifically the problem happens when converting from boolean (Java level) to jint (native) via JNI)
FYI: @DanHeidinga, @tajila
Just talked to Tobi, it seems there was no change with the passed-in arguments that got merged.
Given that the problem happened in the JNI native, need to double-check the calling path from java to native (runJNINative, callCFunction, e.g) to figure out how the passe-in arguments were handled.
The root cause of the issue is due to the inconsistency of the argument type between the declaration in java code and the corresponding implementation in the native code in OpenJDK11 as follows:
src/java.desktop/share/classes/sun/awt/image/GifImageDecoder.java
private native boolean parseImage(
int x, int y,
int width, int height,
boolean interlace, <-------- the boolean type in java code
int initCodeSize,
byte block[], byte rasline[],
IndexColorModel model);
src/java.desktop/share/native/libawt/awt/image/gif/gifdecoder.c
JNIEXPORT jboolean JNICALL
Java_sun_awt_image_GifImageDecoder_parseImage(JNIEnv *env, jobject this,
jint relx, jint rely,
jint width, jint height,
jint interlace, <--- wrongly cast to the int type in the native code
jint initCodeSize,
jbyteArray blockh,
jbyteArray raslineh,
jobject cmh)
{
Technically, this issue has nothing to do with VM (from cJNICallout) or FFI in JNI operation as the FFI code depends on the size of type in the declaration to copy the value from the java stack to the native.
Thus, changing "jint interlace" to "jboolean interlace" fixes the issue and the GIF image will be generated correctly as as result.
@DanHeidinga , I wondering whether we can request Oracle/OpenJDK to update the incorrect argument type for Java_sun_awt_image_GifImageDecoder_parseImage() in Java11 as this makes more sense from JNI perspective.
@andrew-m-leonard can you propose a patch at OpenJDK addressing the mismatch between the java & native signatures? Bonus points if it can be backported to JDK11
@ChengJin01 , could you explain why Linux ppc64le, s390x and AIX build don't have this issue ? I assume the situation is same as Linux amd64 build.
@takiguc, this entirely depends on how the target platform extends or casts from boolean (1byte) to int (e.g. 4bytes) on the native stack (this also depends on whether it is big-endian, in which case we will help to extend in VM). Technically the argument types should be identical on both java and native from JNI perspective. Relying on the automatic cast/conversion on platforms is dangerous in such case.
I am still double-checking the related VM & FFI code on X86 (UNIX-based) to see whether we can figure out a way around the issue against the VM Spec as to the manipulation on the boolean type.
Created openjdk bug : https://bugs.openjdk.java.net/browse/JDK-8217735
Further investigation on the FFI code (X86) shows the problem was caused by the aligned memory without initialization before copying arguments the native stack as follows:
/runtime/libffi/x86/ffi64.c
void
ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
{
...
stack = alloca (sizeof (struct register_args) + cif->bytes + 4*8);
reg_args = (struct register_args *) stack;
argp = stack + sizeof (struct register_args); <------ argp points to the allocate native stack for arguments.
...
for (i = 0; i < avn; ++i)
{
...
n = examine_argument (arg_types[i], classes, 0, &ngpr, &nsse);
if (n == 0
|| gprcount + ngpr > MAX_GPR_REGS
|| ssecount + nsse > MAX_SSE_REGS)
{
long align = arg_types[i]->alignment;
/* Stack arguments are *always* at least 8 byte aligned. */
if (align < 8)
align = 8;
/* Pass this argument in memory. */
argp = (void *) ALIGN (argp, align); <---------- mostly aligned by 8 bytes
memcpy (argp, avalue[i], size);
argp += size; <---- the next address still moves forward by 8bytes even though it skips the current type size.
}
else
{
/* The argument is passed entirely in registers. */
... ...
So it means all arguments on the stack are aligned at 8 bytes (already verify the address of argp)
e.g. if the current stack pointer argp stores the boolean value (1 byte) at 0x28117640, the next value to be copied starts at 0x28117648.
0x28117640 | argument : boolean ..... | <--- 8 bytes (1 byte value + 7 bytes garbage data after memory allocation.
0x28117648 | argument: int ..... | <---- 8 bytes (4 bytes value + 4 bytes garbage data)
...
In such case, casting the boolean type (1 byte ) to the int type (4 bytes) inevitably introduces the garbage data (random value) existing in the allocated memory.
Technically there are two solutions to the issue on our side. [1] initialize the stack memory before copying the arguments
/runtime/libffi/x86/ffi64.c
void
ffi_call (ffi_cif *cif, void (*fn)(void), void *rvalue, void **avalue)
{
size_t stack_size;
...
stack_size = sizeof (struct register_args) + cif->bytes + 4*8;
stack = alloca(stack_size);
memset(stack, 0, stack_size); <------- initialized with 0 for the allocated memory
argp = stack + sizeof (struct register_args);
The idea is to ensure there is no garbage data introduced in casting short-size types to long-size types (e.g. from boolean to int)
[2] replace the FFI type of boolean (1 byte) to with the FFI type of int (4 bytes)
The idea is based on the VM Spec which says:
2.3.4 The boolean Type
Although the Java Virtual Machine defines a boolean type,
it only provides very limited support for it.
There are no Java Virtual Machine instructions solely dedicated
to operations on boolean values. Instead, expressions in the Java
programming language that operate on boolean values are
compiled to use values of the Java Virtual Machine int data type.
From VM Spec perspective, the int type for boolean in the native is acceptable at this point. If so, we can change the FFI type of boolean as follows to ensure it can be safely & correctly casted to both jint (4types) and jboolean (1byte), whatever the target platform is.
/runtime/vm/BytecodeInterpreter.hpp
static const VMINLINE ffi_type *
getFFIType(U_8 j9ntc) {
static const ffi_type * const J9NtcToFFI[] = {
&ffi_type_void, /* J9NtcVoid */
&ffi_type_sint32, /* J9NtcBoolean */ <------- replace &ffi_type_uint8 with &ffi_type_sint32
....
&ffi_type_sint32, /* J9NtcInt */
};
In such case, the boolean type always takes 4 bytes on the java stack for arguments (actually the initialization has been done by its own assignment rather than in FFI code)
The only difference is that solution [1] is specific to X86 on which the issue was originally detected and has no impact on other platform, while solution [2] affects all platforms (theoretically solution [2] should work on all platforms)
So we can choose either of these two solutions above to fix the issue on our side if Oracle insists on the jint type.
@DanHeidinga, any preference on the solutions above?
@ChengJin01 there's been recent changes to return bytecodes, putfield
, Unsafe, and JNI returns to ensure that booleans are only returned as 1 or 0. Can you write some tests to validate whether booleans should also be converted to 0 / 1 when calling jni?
I've created a simple test to verify the assumption above to see how it goes with a passed-in int value in the case of the boolean argument in the native (the .class file was generated via ASM as javac refuses to convert int to boolean in the code level) jnibooltest.zip snippet of JniBoolArgTest.class
public static native void printBoolMethod(boolean);
descriptor: (Z)V
flags: (0x0109) ACC_PUBLIC, ACC_STATIC, ACC_NATIVE
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=2, args_size=1
0: bipush 126 <---------- 0x7e
2: istore_1
3: iload_1 <---------- load 0x7e for the JNI native printBoolMethod()
4: invokestatic #15 // Method printBoolMethod:(Z)V
7: return
JniBoolArgTest.c
JNIEXPORT void JNICALL Java_JniBoolArgTest_printBoolMethod(JNIEnv * env, jclass clazz, jboolean arg)
{
printf("\nJava_JniBoolArgTest_printBoolMethod: arg = 0x%x\n", arg);
return;
}
[1] result on OpenJ9
root@JCHTORLNXVM041:~/jchau/temp/#jdk11_openj9/bin/java -Djava.library.path=".../jnitest" JniBoolArgTest
Java_JniBoolArgTest_printBoolMethod: arg = 0x7e
[2] result on Hotspot
root@JCHTORLNXVM041:~/jchau/temp/#jdk_11.0.1_13_hotspot/bin/java -Djava.library.path=".../jnitest" JniBoolArgTest
Java_JniBoolArgTest_printBoolMethod: arg = 0x7e
So Both OpenJ9 and Hotspot ended up with the same incorrect result as the conversion won't happen automatically in the case of boolean.
The conversion can be done on the java stack before copying the arguments to the native e.g.
VMINLINE FFI_Return
cJNICallout(REGISTER_ARGS_LIST, ...)
{
...
for (U_8 i = 0; i < argCount; i++) {
...
if (J9NtcBoolean == argTypes[i]) {
javaArgs[offset] = javaArgs[offset] & 1;
}
The only concern is the fix changes the original passed-in value on the java stack; otherwise we have to address the conversion after the copy in the FFI code specific to all platforms, which seems complicated.
indicates that JNI code processing booleans should only ever read a char's worth of data so uint8 is correct.
Given this bug though, I'm leaning towards the solution in https://github.com/eclipse/openj9/issues/4193#issuecomment-457356547 but using ffi_type_uint32
rather than a signed one.
@ChengJin01 Can you confirm that passes the test?
@DanHeidinga , Changing to ffi_type_uint32 for boolean only fixes the case of int type in the native (the issue with GIF image) but can't address the case of a random int value (1 byte) to be passed as boolean to the native. The reason is that it assumes the original value is 1 type (the same as solution [1]) , in which case there is no conversion from 1 byte int value (e.g. 0x7e) to 0 or 1.
Already confirmed ffi_type_uint32 fixes the issue with GIF images ../jdk/bin/java -jar SwingSet2.jar
But it still failed at the JniBoolArgTest as follows:
root@JCHTORLNXVM041:~/jchau/temp# ../jdk/bin/java -Djava.library.path=".../jnitest" JniBoolArgTest
Java_JniBoolArgTest_printBoolMethod: arg = 0x7e
The results above are correct as the issue with GIF image (whether a boolean value can be safely cast to 1 or 4 bytes without considering whether the value is valid) and the boolean conversion in JniBoolArgTest (whether the passed-in 1 byte value is 0 or 1 for boolean) belong to two different situations in the native which need to be addressed separately.
Some JNI native tests (detected in personal build) with boolean return type shows changing to ffi_type_uint32 affects the return value for boolean. Need to check what happened in such case.
The failing JNI native test can be simplified as follows: JniBoolArgTest.java
public class JniBoolArgTest {
public static native boolean printBoolMethod1(String arg);
public static native boolean printBoolMethod2(String arg);
static {
System.loadLibrary("jniboolarg");
}
public static void main(String[] args) {
boolean result = printBoolMethod1("ABCDEFG");
System.out.println("result = " + result); <---- returned false
boolean result = printBoolMethod2("ABCDEFG");
System.out.println("result = " + result); <----- unexpected returned true
}
JniBoolArgTest.c
jobject str;
JNIEXPORT jboolean JNICALL Java_JniBoolArgTest_printBoolMethod1(JNIEnv * env, jclass clazz, jstring arg)
{
str = JNI_ENV_PTR(env)->NewGlobalRef(JNI_ENV_ARG(env,arg)); <---- str is non-NULL
jboolean result = (str == NULL);
return result; <------------- returned false;
}
JNIEXPORT jboolean JNICALL Java_JniBoolArgTest_printBoolMethod2(JNIEnv * env, jclass clazz, jstring arg)
{
str = JNI_ENV_PTR(env)->NewGlobalRef(JNI_ENV_ARG(env,arg)); <---- str is non-NULL
return (str == NULL); <------ was supposed to return false but detected true in the java code
}
The failing native test in printBoolMethod2() should ended up with the same result (false) as printBoolMethod1() but unexpected return true.
Looking at the corresponding assembly code generated by objdump as follows:
<Java_JniBoolArgTest_printBoolMethod1>:
685: push %rbp
686: mov %rsp,%rbp
...
6c8: mov (%rax),%rax
6cb: test %rax,%rax
6ce: sete %al <------------ set the result (false) to the lowest byte of EAX
6d1: mov %al,-0x1(%rbp)
6d4: movzbl -0x1(%rbp),%eax <---- zero-extended the result to EAX (so EAX = 0x00 00 00 00)
6d8: leaveq
6d9: retq
<Java_JniBoolArgTest_printBoolMethod2>:
685: push %rbp
686: mov %rsp,%rbp
...
...
6c8: mov (%rax),%rax
6cb: test %rax,%rax
6ce: sete %al <---- only set the result (false) to the lowest byte of EAX and returned after that
so EAX = 0xXX XX XX 00, X means random value)
6d1: leaveq
6d2: retq
According to the assembly code above, the high 3 bytes of EAX didn't get initialized on return and came with random values when changing the boolean type to uint32 (4 bytes), which led to incorrect return value in the case of boolean.
To fix the issue, convertJNIReturnValue() in /runtime/oti/VMHelpers.hpp needs to be updated to only check the lowest one byte on the return value as follows:
case J9NtcBoolean:
*returnStorage = ((UDATA)(U_32)*returnStorage) & 1;
Will check to see whether there is other issue in the personal build with all changes on Java 8 & 11.
@ChengJin01 As part of https://github.com/eclipse/openj9/issues/2049 , it was determined that boolean returns from JNI should check for != 0
to determine true or false to match the C conventions.
This also matches the RI's behaviour.
@DanHeidinga , != 0
is correct only when there is only 1 byte for boolean on return.
To match the RI's behaviour, we can change the code to
case J9NtcBoolean:
*returnStorage = ((UDATA)(U_32)(0 != (*returnStorage & 0xFF)));
given that the actual return value is stored in the lowest byte after extending to 4 bytes.
In addition, Initializing returnStorage with 0 before calling native doesn't work (already verified) because it is always overwritten by EAX on return in the case of uint32.
If the code in inconvertJNIReturnValue() remains unchanged to match the RI's behaviour, then the only way around is to ensure it returns the correct value at the assembly level in FFI code. e.g. on X86 change the code for uint32 in /runtime/libffi/x86/unix64.S
.Lst_uint32:
movl %eax, %eax
movq %rax, (%rdi)
to
.Lst_uint32:
movzbq %al, %rax <---------------- zero-extended the result to rax
movq %rax, (%rdi)
ret
Under such circumstances, we have to address the similar assembly code on all platforms to accommodate the requirement. In addition, uint32 must be only specific to the boolean type.
@ChengJin01 I like your solution. Is it more clear to read that as a U_8*
rather than & 0xFF
?
ie: (UDATA)(0 != (*(U_8 *)returnAddress));
@DanHeidinga , I already verified/updated to (UDATA)(0 != *(U_8*)returnStorage)
as it makes more sense to us.
Just detected a bunch of failing test cases (without direct connection with my changes) on Linux PPC/390 and zOS in the internal personal build (Java 8) with the latest fix. Need to narrow down the code to figure out which part of the fix caused these issues.
The problem was caused by (UDATA)(0 != *(U_8*)returnStorage)
as it fetches the lowest byte (the 1st byte) of return data, which only applies to the little-endian based systems (x86). However, the return value on big-endian based systems (Linux PPC/AIX/zOS390) is stored on the highest byte (the 4th byte) rather than the lowest byte.
Byte 1 2 3 4
little-endian 0x01 0xAA 0xBB 0xCC
big-endian 0xCC 0xBB 0xAA 0x01
To get the correct return value, the code should change to fetch the 1st byte for the little-endian based systems and the 4th byte for the big-endian based systems as follows:
case J9NtcBoolean:
{
U_32 returnValue = (U_32)*returnStorage;
U_8 * returnAddress = (U_8 *)&returnValue;
#ifdef J9VM_ENV_LITTLE_ENDIAN
*returnStorage = (UDATA)(0 != returnAddress[0]);
#else
*returnStorage = (UDATA)(0 != returnAddress[3]);
#endif /*J9VM_ENV_LITTLE_ENDIAN */
}
break;
The failing test cases passed with the fix above. Need to launch personal builds to double-check.
Created openjdk bug : https://bugs.openjdk.java.net/browse/JDK-8217735
@andrew-m-leonard Can you put the fix for this into the Extensions repos? I'd like to ensure we have the right behaviour while we finish reviewing this PR
Tagged against the 0.13 milestone as a reminder to get the extensions change into this release.
I've raised the question on the awt-dev maillist to get Oracle's position... https://mail.openjdk.java.net/pipermail/awt-dev/2019-March/015029.html
A fix to the JNI declaration has been merged to jdk12. https://github.com/ibmruntimes/openj9-openjdk-jdk12/pull/20 https://github.com/ibmruntimes/openj9-openjdk-jdk12/pull/22
Merged into openjdk http://hg.openjdk.java.net/jdk/client/rev/62171da145f9 Should get merged into jdk"head" from client within a week.
Hello.
I used OpenJDK JDK11 + OpenJ9 for Linux amd64. When I executed SwingSet2 demo, GIF images were not displayed properly. And the situation was random, 4 GIF images were displayed on JInternalFrame Demo. 2 or 3 images were broken.
Version Information: RHEL 7.5
Test instruction:
This issue only happens on Linux amd64. Linux ppc64le and s390x builds were OK. It worked fine on following AdoptOpenJDK build.
It seemed GIF image treated as interlace GIF. But I checked interlace flag on GIF.
I checked the latest IBM Java8 for Linux amd64, but I could not recreate this issue. I assume IBM Java8 applied a fix against class library on GIF image decoder related code. But it seems register or memory area was not clean up properly.