Open ben-walsh opened 5 years ago
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/946 - showed a 17% failure rate.
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/947 - showed a 34% failure rate.
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/949 - showed a 8% failure rate.
@ben-walsh does RI
pass this test without failures?
@JasonFengJ9 This oldest available test run against Hotspot - https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk11_hs_openjdktest_x86-64_linux/150/testReport/java_net_Socket_asyncClose_AsyncClose/java/AsyncClose passes without such problems, as does each of the subsequent runs.
@JasonFengJ9 As that could be luck given the lowish failure rates observed against OpenJ9, to raise confidence, I have kicked off a 100x run against Hotspot latest nightly - https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/963
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/963 shows a 100/100 pass rate, so we can be confident that Hotspot does not have this problem. @JasonFengJ9
The problem can be reproduced locally with JIT on/off as follows:
../jdk_11.0.2_8_openj9/bin/java -Xint AsyncClose
******************************
Test: Socket.getInputStream().read()
Failed: Socket.getInputStream().read() wasn't preempted
******************************
Test: Socket.getInputStream().read() (with timeout)
Failed: Socket.getInputStream().read() wasn't preempted
******************************
Test: Socket.getOutputStream().write()
Passed.
******************************
Test: DatagramSocket.receive(DatagramPacket)
Passed.
******************************
Test: DatagramSocket.receive(DatagramPacket) (timeout specified)
Passed.
******************************
Test: ServerSocket.accept()
Passed.
******************************
Test: ServerSocket.accept() (with timeout)
Passed.
Exception in thread "main" java.lang.Exception: 2 sub-tests failed - see log.
at AsyncClose.main(AsyncClose.java:73)
Need to figure out how this test works and how it is related to VM.
Looking at the test code:
test/jdk/java/net/Socket/asyncClose/Socket_getInputStream_read.java
public void run() {
try {
InputStream in = s.getInputStream();
if (timeout > 0) {
s.setSoTimeout(timeout);
}
latch.countDown();
int n = in.read();
failed("Socket.getInputStream().read() returned unexpectly!!");
} catch (SocketException se) {
se.printStackTrace();
if (latch.getCount() != 1) {
closed(); <--------------------- set to true if SocketException is catpure when calling in.read()
}
}
public AsyncCloseTest go() {
try {
ServerSocket ss = new ServerSocket(0);
InetAddress lh = InetAddress.getLocalHost();
s.connect( new InetSocketAddress(lh, ss.getLocalPort()) );
Socket s2 = ss.accept();
Thread thr = new Thread(this);
thr.start();
latch.await();
Thread.sleep(5000); //sleep, so Socket.getInputStream().read() can block
s.close();
thr.join();
if (isClosed()) { <------------- passed if closed
return passed();
} else {
return failed("Socket.getInputStream().read() wasn't preempted");
}
The expected result is to throw out SocketException as Socket.getInputStream().read() gets blocked
NET_Read: ret = -1, errno = 9 (from Java_java_net_SocketInputStream_socketRead0 at /openj9-openjdk-jdk11/src/java.base/unix/native/libnet/SocketInputStream.c)
java.net.SocketException: Socket closed
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:172)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:208)
at Socket_getInputStream_read.run(Socket_getInputStream_read.java:62)
at java.base/java.lang.Thread.run(Thread.java:825)
******************************
Test: Socket.getInputStream().read()
Passed.
However, the test failed because Socket.getInputStream().read() still kept working till reached the end of inputstream during Thread.sleep(5000), which means sometimes Thread.sleep() didn't work correctly to block the read operation.
NET_Read: ret = 0, errno = 0
******************************
Test: Socket.getInputStream().read()
Failed: Socket.getInputStream().read() wasn't preempted
@babsingh
@ChengJin01
1) did you verify using a micro-test that Thread.sleep()
malfunctions?
2) did you verify that in.read()
blocks before s.close()
is invoked?
3) does s.close()
consistently generate a SocketException
for the blocked in.read()
?
4) closed()
is only invoked if latch.getCount()
returns 0. did you verify that the CountDownLatch
works properly: getCount()
, countDown()
and await()
?
@babsingh
The interesting thing I found is: [1] in the passed test, Thread.sleep() happened before NET_Read (in.read()) [2] in the failing test, NET_Read (in.read()) already finished before calling Thread.sleep(), which means there is no way to block it in such case.
in passed case, s.close() should be invoked first when in.read() is blocked
SocketException is generated by in.read() due to the closed socket
there is no problem with these methods as OpenJDK11/Hotspot works good (tried many times).
@ChengJin01
in the passed test, Thread.sleep() happened before NET_Read (in.read())
there is going to be a race condition between Thread.sleep() and in.read() since they are going to be invoked simultaneously. sleeping for 5 seconds should be sufficient in order to invoke in.read() before s.close().
in the failing test, NET_Read (in.read()) already finished before calling Thread.sleep()
in.read() should always block since no input data is available. So, in.read() shouldn't finish before Thread.sleep(). it should block until s.close() is invoked.
if in.read() finishes (doesn't block and returns) before calling thread.sleep(), then the issue is in the implementation of in.read().
More specific details are needed in order to pin-point the cause. Can you print timestamps around Thread.sleep() to verify its functionality? Also, print a timestamp before in.read() is invoked; this will tell us if in.read() is executed before s.close()?
import java.sql.Timestamp;
...
Timestamp timestamp = new Timestamp(System.currentTimeMillis());
System.out.println(timestamp);
Thread.sleep(5000); //sleep, so Socket.getInputStream().read() can block
timestamp = new Timestamp(System.currentTimeMillis());
System.out.println(timestamp);
s.close();
Documentation for InputStream.read: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html#read() This method blocks until input data is available.
Documentation for Socket.close: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/net/Socket.html#close() Any thread currently blocked in an I/O operation upon this socket will throw a SocketException.
Full test source code: https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/openj9/test/jdk/java/net/Socket/asyncClose/Socket_getInputStream_read.java
@babsingh ,
The race condition might exist but I doubt they should share the same lock with Thread.sleep() as follows:
/openj9-openjdk-jdk11/src/java.base/linux/native/libnet/linux_close.c
#define BLOCKING_IO_RETURN_INT(FD, FUNC) { \
int ret; \
threadEntry_t self; \
fdEntry_t *fdEntry = getFdEntry(FD); \
if (fdEntry == NULL) { \
errno = EBADF; \
return -1; \
} \
do { \
startOp(fdEntry, &self); \ <------
ret = FUNC; \
endOp(fdEntry, &self); \
} while (ret == -1 && errno == EINTR); \
return ret; \
}
int NET_Read(int s, void* buf, size_t len) {
BLOCKING_IO_RETURN_INT( s, recv(s, buf, len, 0) );
...
static inline void startOp(fdEntry_t *fdEntry, threadEntry_t *self)
{
self->thr = pthread_self();
self->intr = 0;
pthread_mutex_lock(&(fdEntry->lock)); <-------
{
self->next = fdEntry->threads;
fdEntry->threads = self;
}
pthread_mutex_unlock(&(fdEntry->lock));
In addition, I already confirmed in.read() was executed before Thread.sleep() and s.close() according the printing messages.
What is the test doing?
While reading a SocketInputStream
, in.read()
should block since no input data is available.
/* Thread 1 */
try {
private final Socket s = new Socket();
InputStream in = s.getInputStream();
int n = in.read();
} catch (SocketException se) {
/* PASS */
While Thread 1 is blocked by in.read()
, Thread 2 closes the socket s.close()
, which should generate SocketException
in Thread 1 in.read()
. The test passes when Thread 1 successfully catches SocketException
.
/* Thread 2 */
s.close();
Why is the test failing?
SocketInputStream
in.read()
doesn't block, and Thread 1 terminates before the socket is closed in Thread 2. When the socket is closed in Thread 2, SocketException
isn't generated in Thread 1 in.read()
, which causes the test to fail.
Where
SocketInputStream
in.read()
blocks in native code?
In /openj9-openjdk-jdk11/src/java.base/linux/native/libnet/linux_close.c
, while executing FUNC == recv
.
recv
man page: http://man7.org/linux/man-pages/man2/recv.2.html
#define BLOCKING_IO_RETURN_INT(FD, FUNC) { \
int ret; \
threadEntry_t self; \
fdEntry_t *fdEntry = getFdEntry(FD); \
if (fdEntry == NULL) { \
errno = EBADF; \
return -1; \
} \
do { \
startOp(fdEntry, &self); \ <------
ret = FUNC; \
endOp(fdEntry, &self); \
} while (ret == -1 && errno == EINTR); \
return ret; \
}
When
SocketInputStream
in.read()
fails to block?
SocketInputStream
in.read()
fails to block when the GC policy is set to gencon
(default on Linux). For the other GC policies (balanced
, metronome
, nogc
, optavgpause
and optthruput
), the test successfully passes.
Test output with debug code and -Xgcpolicy:gencon
:
Below, FUNC == recv
doesn't block and immediately returns in the failing case.
inputstream class: java.net.SocketInputStream
in.read()---------------------------
in.read start: 2019-02-14 14:34:06.91
BLOCKING_IO_RETURN_INT enter FUNC ret: 0, errno: 2
BLOCKING_IO_RETURN_INT exit FUNC ret: 0, errno: 2
NET_Read fd: 6, bufP: , len: 1, nread: 0, errno: 2
SocketInputStream.read([b,I,I,I) socketRead: fd = java.io.FileDescriptor@b416428 n = 0
n = -1 <--- recv didn't block; in.read returned -1.
finally block - no SocketException thrown; count = 0
thread sleep start: 2019-02-14 14:34:06.91
thread sleep end: 2019-02-14 14:34:11.928 <--- Thread slept for 5 seconds
waking up---------------------------
s.close() ------------ done -----------
thr.join() ------------ done -----------
before pass/fail latch count = 0
******************************
Test: Socket.getInputStream().read()
Failed: Socket.getInputStream().read() wasn't preempted
Exception in thread "main" java.lang.Exception: 1 sub-tests failed - see log.
at AsyncClose.main(AsyncClose.java:73)
Test output with debug code and -Xgcpolicy:metronome|balanced|nogc|optavgpause|optthruput
:
Below, FUNC == recv
blocks until s.close()
is invoked. After s.close()
finishes, FUNC
resumes code execution, and in.read() / NET_Read
catches SocketException
.
inputstream class: java.net.SocketInputStream
in.read()---------------------------
sleep 5000 begin---------------------------
in.read start: 2019-02-14 14:37:49.308
thread sleep start: 2019-02-14 14:37:49.308
BLOCKING_IO_RETURN_INT enter FUNC ret: 0, errno: 2
thread sleep end: 2019-02-14 14:37:54.323 <--- Thread slept for 5 seconds
waking up---------------------------
s.close() ------------ done -----------
BLOCKING_IO_RETURN_INT exit FUNC ret: -1, errno: 9
NET_Read fd: 6, bufP: , len: 1, nread: -1, errno: 9
java.net.SocketException: Socket closed
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:169)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:203)
at Socket_getInputStream_read.run(Socket_getInputStream_read.java:67)
at java.base/java.lang.Thread.run(Thread.java:825)
setting closed(): count = 0
finally, no SocketException thrown; count = 0
thr.join() ------------ done -----------
before pass/fail latch count = 0
******************************
Test: Socket.getInputStream().read()
Passed.
errno
is set to 2 (ENOENT
No such file or directory) while executingSocketInputStread
in.read()
.
Tracked errno
in gdb: watch *__errno_location()
. errno
changed at two places in GC code:
errno
changes to 2
Thread 2 "java" hit Watchpoint 3: *__errno_location()
Old value = 0 New value = 2 0x00007fffef0584db in MM_CopyScanCacheChunk::initialize(MM_EnvironmentBase, unsigned long, MM_CopyScanCacheChunk, unsigned long, MM_CopyScanCacheStandard**) () from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so (gdb) where
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
2. Location 2 where `errno` changes to 2
Thread 2 "java" hit Watchpoint 4: *__errno_location()
Old value = 0 New value = 2 0x00007ffff78fd6c0 in __mmap (addr=addr@entry=0x0, len=len@entry=1622016, prot=prot@entry=3, flags=flags@entry=34, fd=fd@entry=-1, offset=offset@entry=0) at ../sysdeps/unix/sysv/linux/wordsize-64/mmap.c:34 34 ../sysdeps/unix/sysv/linux/wordsize-64/mmap.c: No such file or directory. (gdb) where
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
from /root/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9gc29.so
The test passes with OpenJ9 JDK8, and `errno` is always set to 0 in OpenJ9 JDK8.
The test fails with OpenJ9 JDK11, and `errno` changes to 2 while running GC code. The failure is only seen with the `gencon` GC policy.
OpenJ9 doesn't implement Java/native code for `java.net.Socket`, `SocketInputStream` and its dependencies. `openj9-openjdk-jdk11` has the Java/native implementation for `java.net.*`.
fyi - @ChengJin01. Adding GC team members to investigate further: @dmitripivkine @amicic.
I don't see how this relates to GC code. Requested memory size is reasonable. One of this points is in Port Library. Another one relates to initialization of allocated memory and again size is correct so there is no out-of-bound writes
A similar bug has been seen on macosx on jdk8.
https://github.com/eclipse/openj9/issues/5139
Expanding the excludes in jdk8 and 11 to include all platforms.
Intermittent failure(s) ...
... of test https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/openj9/test/jdk/java/net/Socket/asyncClose/AsyncClose.java observed on both Linux and Windows ...
https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk11_j9_openjdktest_x86-64_linux/266/testReport/junit/java_net_Socket_asyncClose_AsyncClose/java/AsyncClose https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk11_j9_openjdktest_x86-64_windows/122/testReport/junit/java_net_Socket_asyncClose_AsyncClose/java/AsyncClose
I have kicked off a 100x (https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/946) and 100x with -Xint (https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/947) on Linux.
I have also kicked off a 100x with -Xint (https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/949) on Windows.