eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.24k stars 714 forks source link

javaagent not started after attachment #7712

Open htreu opened 4 years ago

htreu commented 4 years ago

Java -version output

openjdk version "11.0.4" 2019-07-16 
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11) 
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.15.1, JRE 11 Linux amd64-64-Bit Compressed References 20190717_286 (JIT enabled, AOT enabled) 
OpenJ9 - 0f66c6431 
OMR - ec782f26 
JCL - fa49279450 based on jdk-11.0.4+11)

There are OpenJ9 1.8 JVMs with the same behaviour, so it seems not to be version specific.

Summary of problem

Attaching a javaagent to the running JVM via com.sun.tools.attach.VirtualMachine results in ATTACH_ACK response but the javaagent does not get loaded/started. A heap dump from the affected JVM does not reveal any of the classes provided by the javaagent's jar file. Restarting the JVM resolves the issue. From the outside everything looks as expected when attaching the javaagent.

Diagnostic files

There is no output to the applications stdout/stderr whatsoever. We now added -Dcom.ibm.tools.attach.logging=yes and are waiting for the next occurrence. I will add the log when available.

pshipton commented 4 years ago

@JasonFengJ9 can you pls take a look.

JasonFengJ9 commented 4 years ago

@htreu a standalone testcase will help the investigation.

htreu commented 4 years ago

Unfortunately this happens randomly, we are now "waiting" for it to happen again with the -Dcom.ibm.tools.attach.logging=yes option applied. One additional information: JVMs are running inside docker containers, PID is always 1.

htreu commented 4 years ago

We have one containerized JVM which shows the described behaviour. Following is the attach log:

1573202034475 1: 14 [Attach API initializer]: AttachHandler initialized
1573202034574 1: 14 [Attach API initializer]: IPC Directory=/tmp/.com_ibm_tools_attach
1573202034574 1: 14 [Attach API initializer]: createDirectoryAndSemaphore /tmp/.com_ibm_tools_attach
1573202034576 1: 14 [Attach API initializer]: non-blocking locking file /tmp/.com_ibm_tools_attach/_master
1573202034577 1: 14 [Attach API initializer]: deleteStaleDirectories checking _master
1573202034577 1: 14 [Attach API initializer]: AttachHandler obtained master lock
1573202034768 1: 14 [Attach API initializer]: locking file /tmp/.com_ibm_tools_attach/_attachlock
1573202034770 1: 14 [Attach API initializer]: createAdvertisementFile /tmp/.com_ibm_tools_attach/1/attachInfo
1573202034866 1: 14 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_attachlock
1573202034867 1: 14 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_master
1573202034966 1: 16 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entering
1573202034966 1: 16 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entered
1573202034967 1: 16 [Attach API wait loop]: iteration 0 waitForNotification starting wait
1573204837270 1: 16 [Attach API wait loop]: iteration 0 waitForNotification ended wait
1573204837367 1: 16 [Attach API wait loop]: connectToAttacher 0 waitForNotification no reply file
1573204837368 1: 16 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 0 waitForNotification obtainLock
1573204837368 1: 16 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204837466 1: 16 [Attach API wait loop]: iteration 0 checkReplyAndCreateAttachment releaseLock
1573204837567 1: 16 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204838676 1: 16 [Attach API wait loop]: iteration 1 waitForNotification ignoreNotification entering
1573204838677 1: 16 [Attach API wait loop]: iteration 1 waitForNotification ignoreNotification entered
1573204838766 1: 16 [Attach API wait loop]: iteration 1 waitForNotification starting wait
1573204838767 1: 16 [Attach API wait loop]: iteration 1 waitForNotification ended wait
1573204838769 1: 16 [Attach API wait loop]: connectToAttacher 1 waitForNotification no reply file
1573204838770 1: 16 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 1 waitForNotification obtainLock
1573204838770 1: 16 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204838771 1: 16 [Attach API wait loop]: iteration 1 checkReplyAndCreateAttachment releaseLock
1573204838967 1: 16 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204839969 1: 16 [Attach API wait loop]: iteration 2 waitForNotification ignoreNotification entering
1573204839969 1: 16 [Attach API wait loop]: iteration 2 waitForNotification ignoreNotification entered
1573204839970 1: 16 [Attach API wait loop]: iteration 2 waitForNotification starting wait
1573204839970 1: 16 [Attach API wait loop]: iteration 2 waitForNotification ended wait
1573204839971 1: 16 [Attach API wait loop]: connectToAttacher 2 waitForNotification no reply file
1573204839972 1: 16 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 2 waitForNotification obtainLock
1573204839972 1: 16 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204839973 1: 16 [Attach API wait loop]: iteration 2 checkReplyAndCreateAttachment releaseLock
1573204839973 1: 16 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync

It goes on like this since the attach is retried every 10min.

We also looked into each file in /tmp/.com_ibm_tools_attach/1/ and additionally touched a file in this folder to see if writing the reply file would fail from a filesystem pov:

$ ls -l /tmp/.com_ibm_tools_attach/1
total 4
-rw-------. 1 generic root 311 Nov  8 08:33 attachInfo
-rw-rw-rw-. 1 generic root   0 Nov  8 08:33 attachNotificationSync
$

$ cd /tmp/.com_ibm_tools_attach/1
$

$ cat attachInfo
version=0.1
userId=generic
userUid=9999
processId=1
vmId=1
displayName=/deployments/apps-sync-user-account-autocomplete-service-2.0.5-SNAPSHOT.jar
notifier=_notifier
globalSemaphore=true
replyFile=/tmp/.com_ibm_tools_attach/1/replyInfo
attachNotificationSync=/tmp/.com_ibm_tools_attach/1/attachNotificationSync
$

$ cat attachNotificationSync
$

$ touch /tmp/.com_ibm_tools_attach/1/test
$

$ ls -l /tmp/.com_ibm_tools_attach/1
total 4
-rw-------. 1 generic root 311 Nov  8 08:33 attachInfo
-rw-rw-rw-. 1 generic root   0 Nov  8 08:33 attachNotificationSync
-rw-r--r--. 1 generic root   0 Nov 13 11:16 test
$
JasonFengJ9 commented 4 years ago

Is this a self attachment? if so, was -Djdk.attach.allowAttachSelf=true passed to the command line?

CodingFabian commented 4 years ago

@JasonFengJ9 no it is using the same java binary, but it is an attach from a separate process (e.g. docker exec -it invoked)

htreu commented 4 years ago

I will also add -Dcom.ibm.tools.attach.logging=yes to the attacher, just saw in the j9 code it will also log its progress.

JasonFengJ9 commented 4 years ago

but it is an attach from a separate process (e.g. docker exec -it invoked)

Does this attacher process run as root and have PID 1?

add -Dcom.ibm.tools.attach.logging=yes to the attacher

Please provide log for the attacher as well.

In addition, could you run jps in that separate docker -exec it and report its output? It should show the target process VMID (it appears 1 as https://github.com/eclipse/openj9/issues/7712#issuecomment-553374195), run jcmd 1 Thread.print and please provide its output.