Open htreu opened 4 years ago
@JasonFengJ9 can you pls take a look.
@htreu a standalone testcase will help the investigation.
Unfortunately this happens randomly, we are now "waiting" for it to happen again with the -Dcom.ibm.tools.attach.logging=yes
option applied.
One additional information: JVMs are running inside docker containers, PID is always 1.
We have one containerized JVM which shows the described behaviour. Following is the attach log:
1573202034475 1: 14 [Attach API initializer]: AttachHandler initialized
1573202034574 1: 14 [Attach API initializer]: IPC Directory=/tmp/.com_ibm_tools_attach
1573202034574 1: 14 [Attach API initializer]: createDirectoryAndSemaphore /tmp/.com_ibm_tools_attach
1573202034576 1: 14 [Attach API initializer]: non-blocking locking file /tmp/.com_ibm_tools_attach/_master
1573202034577 1: 14 [Attach API initializer]: deleteStaleDirectories checking _master
1573202034577 1: 14 [Attach API initializer]: AttachHandler obtained master lock
1573202034768 1: 14 [Attach API initializer]: locking file /tmp/.com_ibm_tools_attach/_attachlock
1573202034770 1: 14 [Attach API initializer]: createAdvertisementFile /tmp/.com_ibm_tools_attach/1/attachInfo
1573202034866 1: 14 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_attachlock
1573202034867 1: 14 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_master
1573202034966 1: 16 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entering
1573202034966 1: 16 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entered
1573202034967 1: 16 [Attach API wait loop]: iteration 0 waitForNotification starting wait
1573204837270 1: 16 [Attach API wait loop]: iteration 0 waitForNotification ended wait
1573204837367 1: 16 [Attach API wait loop]: connectToAttacher 0 waitForNotification no reply file
1573204837368 1: 16 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 0 waitForNotification obtainLock
1573204837368 1: 16 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204837466 1: 16 [Attach API wait loop]: iteration 0 checkReplyAndCreateAttachment releaseLock
1573204837567 1: 16 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204838676 1: 16 [Attach API wait loop]: iteration 1 waitForNotification ignoreNotification entering
1573204838677 1: 16 [Attach API wait loop]: iteration 1 waitForNotification ignoreNotification entered
1573204838766 1: 16 [Attach API wait loop]: iteration 1 waitForNotification starting wait
1573204838767 1: 16 [Attach API wait loop]: iteration 1 waitForNotification ended wait
1573204838769 1: 16 [Attach API wait loop]: connectToAttacher 1 waitForNotification no reply file
1573204838770 1: 16 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 1 waitForNotification obtainLock
1573204838770 1: 16 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204838771 1: 16 [Attach API wait loop]: iteration 1 checkReplyAndCreateAttachment releaseLock
1573204838967 1: 16 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204839969 1: 16 [Attach API wait loop]: iteration 2 waitForNotification ignoreNotification entering
1573204839969 1: 16 [Attach API wait loop]: iteration 2 waitForNotification ignoreNotification entered
1573204839970 1: 16 [Attach API wait loop]: iteration 2 waitForNotification starting wait
1573204839970 1: 16 [Attach API wait loop]: iteration 2 waitForNotification ended wait
1573204839971 1: 16 [Attach API wait loop]: connectToAttacher 2 waitForNotification no reply file
1573204839972 1: 16 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 2 waitForNotification obtainLock
1573204839972 1: 16 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
1573204839973 1: 16 [Attach API wait loop]: iteration 2 checkReplyAndCreateAttachment releaseLock
1573204839973 1: 16 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/1/attachNotificationSync
It goes on like this since the attach is retried every 10min.
We also looked into each file in /tmp/.com_ibm_tools_attach/1/
and additionally touched a file in this folder to see if writing the reply file would fail from a filesystem pov:
$ ls -l /tmp/.com_ibm_tools_attach/1
total 4
-rw-------. 1 generic root 311 Nov 8 08:33 attachInfo
-rw-rw-rw-. 1 generic root 0 Nov 8 08:33 attachNotificationSync
$
$ cd /tmp/.com_ibm_tools_attach/1
$
$ cat attachInfo
version=0.1
userId=generic
userUid=9999
processId=1
vmId=1
displayName=/deployments/apps-sync-user-account-autocomplete-service-2.0.5-SNAPSHOT.jar
notifier=_notifier
globalSemaphore=true
replyFile=/tmp/.com_ibm_tools_attach/1/replyInfo
attachNotificationSync=/tmp/.com_ibm_tools_attach/1/attachNotificationSync
$
$ cat attachNotificationSync
$
$ touch /tmp/.com_ibm_tools_attach/1/test
$
$ ls -l /tmp/.com_ibm_tools_attach/1
total 4
-rw-------. 1 generic root 311 Nov 8 08:33 attachInfo
-rw-rw-rw-. 1 generic root 0 Nov 8 08:33 attachNotificationSync
-rw-r--r--. 1 generic root 0 Nov 13 11:16 test
$
Is this a self attachment? if so, was -Djdk.attach.allowAttachSelf=true
passed to the command line?
@JasonFengJ9 no it is using the same java binary, but it is an attach from a separate process (e.g. docker exec -it invoked)
I will also add -Dcom.ibm.tools.attach.logging=yes
to the attacher, just saw in the j9 code it will also log its progress.
but it is an attach from a separate process (e.g. docker exec -it invoked)
Does this attacher process run as root and have PID 1
?
add -Dcom.ibm.tools.attach.logging=yes to the attacher
Please provide log for the attacher as well.
In addition, could you run jps
in that separate docker -exec it
and report its output? It should show the target process VMID (it appears 1
as https://github.com/eclipse/openj9/issues/7712#issuecomment-553374195), run jcmd 1 Thread.print
and please provide its output.
Java -version output
There are OpenJ9 1.8 JVMs with the same behaviour, so it seems not to be version specific.
Summary of problem
Attaching a javaagent to the running JVM via
com.sun.tools.attach.VirtualMachine
results inATTACH_ACK
response but the javaagent does not get loaded/started. A heap dump from the affected JVM does not reveal any of the classes provided by the javaagent's jar file. Restarting the JVM resolves the issue. From the outside everything looks as expected when attaching the javaagent.Diagnostic files
There is no output to the applications stdout/stderr whatsoever. We now added
-Dcom.ibm.tools.attach.logging=yes
and are waiting for the next occurrence. I will add the log when available.