eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Attach API does not work as expected in docker container #1704

Open kubaseai opened 6 years ago

kubaseai commented 6 years ago

root@user-Aspire-ES1-431:~# docker run -d -p 8080:8080 -e DOCKER_HOST_IP=192.168.100.132 -e IBM_JAVA_OPTIONS="-Dcom.ibm.tools.attach.logging=yes" --ipc=host --net=host -v /tmp:/tmp kubaseai/bw-time

With these options I'm able to connect with jconsole using the Attach API (shared /tmp directory + semaphore + TCP/IP).

When I remove --net=host, the JVM inside the container has a different notion of localhost than jconsole on the host. I guess that something like this should help:

diff --git a/jdk/src/jdk.management.agent/share/classes/sun/management/jmxremote/ConnectorBootstrap.java b/jdk/src/jdk.management.agent/share/classes/sun/management/jmxremote/ConnectorBootstrap.java
index d161593401..cc83a2b8c4 100644
--- a/jdk/src/jdk.management.agent/share/classes/sun/management/jmxremote/ConnectorBootstrap.java
+++ b/jdk/src/jdk.management.agent/share/classes/sun/management/jmxremote/ConnectorBootstrap.java
@@ -535,7 +535,12 @@ public final class ConnectorBootstrap {

         MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
         try {
-            JMXServiceURL url = new JMXServiceURL("rmi", localhost, 0);
+           int port = 0;
+           try {
+               port = Integer.valueOf(System.getProperty("java.rmi.server.port"));
+           }
+           catch (Exception exc) {}
+            JMXServiceURL url = new JMXServiceURL("rmi", localhost, port);
             // Do we accept connections from local interfaces only?
             Properties props = Agent.getManagementProperties();
             if (props ==  null) {

diff --git a/jcl/src/java.base/share/classes/com/ibm/tools/attach/target/Attachment.java b/jcl/src/java.base/share/classes/com/ibm/tools/attach/target/Attachment.java
index 82325437..1c494176 100644
--- a/jcl/src/java.base/share/classes/com/ibm/tools/attach/target/Attachment.java
+++ b/jcl/src/java.base/share/classes/com/ibm/tools/attach/target/Attachment.java
@@ -61,6 +61,7 @@ final class Attachment extends Thread implements Response {
        private final String key;
        private static final String START_REMOTE_MANAGEMENT_AGENT = "startRemoteManagementAgent"; //$NON-NLS-1$
        private static final String START_LOCAL_MANAGEMENT_AGENT = "startLocalManagementAgent"; //$NON-NLS-1$
+       private static final String HOSTNAME_OVERRIDE_PROPERTY = "com.ibm.tools.attach.target.hostname"; //$NON-NLS-1$

        private static final class MethodRefsHolder {
                static Method startLocalManagementAgentMethod = null;
@@ -109,6 +110,19 @@ final class Attachment extends Thread implements Response {
                setDaemon(true);
        }

+       static String getOverridenLocalHostName() {
+               /** In case we are crossing (Docker) container boundaries
+                 * we have two different localhosts - the one we
+                 * are attaching from into container is 'external' localhost.
+                 * Let's allow for accessing it with overriden hostname */
+
+               /** from JavaDoc InetAddress.getByName: If the host is null then an InetAddress
+                 * representing an address of the loopback interface is returned */
+
+               return System.getProperty(HOSTNAME_OVERRIDE_PROPERTY);
+       }
+       
+           
        /**
         * Create an attachment with a socket connection to the attacher
         * 
@@ -118,7 +132,7 @@ final class Attachment extends Thread implements Response {
         */
        boolean connectToAttacher(int portNum) {
                try {
-                       InetAddress localHost = InetAddress.getLoopbackAddress();
+                       InetAddress localHost = InetAddress.getByName(getOverridenLocalHostName());
                        attacherSocket = new Socket(localHost, portNum);
                        IPC.logMessage("connectToAttacher localPort=",  attacherSocket.getLocalPort(), " remotePort=", Integer.toString(attacherSocket.getPort())); //$NON-NLS-1$//$NON-NLS-2$
                        responseStream = attacherSocket.getOutputStream();

root@user-Aspire-ES1-431:~# docker run -d -p 8080:8080 -p 10200:10200 -e DOCKER_HOST_IP=192.168.100.132 -e IBM_JAVA_OPTIONS="-Dcom.ibm.tools.attach.logging=yes -Djava.rmi.server.port=10200 -Dcom.sun.management.jmxremote.local.only=false" --ipc=host -v /tmp:/tmp -v /root/tmp/openj9-openjdk-jdk9/build/linux-x86_64-normal-server-release/images/jdk:/opt/java/openjdk/jdk-9 kubaseai/bw-time
3a6e21ca59194231cc5e31d2b291e6f6b03ce3e6b950a12d65c83394d4bd029e

root@user-Aspire-ES1-431:~# docker logs 3a6
Starting Tibco EAI stack with projects/BW-HTTP-Time
JAVA_HOME is /opt/java/openjdk/jdk-9
Using property file /opt/tibco/project.tra
Using work space directory /opt/tibco/bw/5.13/bin/working/3a6e21ca5919
Creating trace file /opt/tibco/bw/5.13/bin/logs/3a6e21ca5919.log
Using XMLReader org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser
2018 Apr 18 18:29:46:038 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300001 Process Engine version 5.13.0, build V24, 2015-8-11 
2018 Apr 18 18:29:46:057 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.13.0, build V24, 2015-8-11 
2018 Apr 18 18:29:46:067 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300010 XML Support: TIBCOXML Version 5.60.0.003 
2018 Apr 18 18:29:46:068 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300011 Java version: Eclipse OpenJ9 VM master-4c8cdb8 
2018 Apr 18 18:29:46:069 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 4.15.0-13-generic 
2018 Apr 18 18:29:48:550 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300013 Tibrv string encoding: UTF-8 
2018 Apr 18 18:29:48:834 GMT +0000 BW.3a6e21ca5919 Warn [BW_Core]  Duplicate message map entry for BW-HTTP-100118 
2018 Apr 18 18:29:48:834 GMT +0000 BW.3a6e21ca5919 Warn [BW_Core]  Duplicate message map entry for BW-HTTP-100206 
2018 Apr 18 18:29:49:914 GMT +0000 BW.3a6e21ca5919 Info [BW_Plugin] BW-HTTP-100500 Using the following HTTP minProcessors/maxProcessors [host=localhost  port=8080]: 10/75 
creating file: /opt/tibco/bw/5.13/bin/working/3a6e21ca5919/internal/nextJobidBlock
2018 Apr 18 18:29:50:784 GMT +0000 BW.3a6e21ca5919 Info [BW-Core] BWENGINE-300002 Engine 3a6e21ca5919 started 
Using JMX MBean name [com.tibco.bw:key=engine,name="3a6e21ca5919"]
Create web.xml file structure in: /opt/tibco/bw/5.13/bin/working/3a6e21ca5919
Creating Host for: /opt/tibco/bw/5.13/bin/working/3a6e21ca5919/tomcat/webapps
root@user-Aspire-ES1-431:~# 

root@user-Aspire-ES1-431:~# docker container exec 3a6 cat /opt/tibco/bw/5.13/bin/25.log
1524076184204 25: 23 [Attach API initializer]: AttachHandler initialize
1524076184211 25: 23 [Attach API initializer]: IPC Directory=/tmp/.com_ibm_tools_attach
1524076184212 25: 23 [Attach API initializer]: createDirectoryAndSemaphore /tmp/.com_ibm_tools_attach
1524076184214 25: 23 [Attach API initializer]: non-blocking locking file /tmp/.com_ibm_tools_attach/_master
1524076184216 25: 23 [Attach API initializer]: deleteStaleDirectories checking _master
1524076184217 25: 23 [Attach API initializer]: AttachHandler obtained master lock
1524076184221 25: 23 [Attach API initializer]: locking file /tmp/.com_ibm_tools_attach/_attachlock
1524076184224 25: 23 [Attach API initializer]: createAdvertisementFile /tmp/.com_ibm_tools_attach/25/attachInfo
1524076184225 25: 23 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_attachlock
1524076184226 25: 23 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_master
1524076184228 25: 25 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entering
1524076184229 25: 25 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entered
1524076184230 25: 25 [Attach API wait loop]: iteration 0 waitForNotification starting wait
1524076335212 25: 25 [Attach API wait loop]: iteration 0 waitForNotification ended wait
1524076335214 25: 25 [Attach API wait loop]: 0 connectToAttacher reply on port 35543
1524076335217 25: 25 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 0 waitForNotification obtainLock
1524076335218 25: 25 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076335219 25: 93 [Attachment 35543]: Attachment run
1524076335220 25: 25 [Attach API wait loop]: lock failed, trying blocking lock
1524076335229 25: 93 [Attachment 35543]: connectToAttacher localPort=42672 remotePort=35543
1524076335232 25: 93 [Attachment 35543]: streamSend ATTACH_CONNECTED 3ee8072a 
1524076335236 25: 25 [Attach API wait loop]: Blocking lock succeeded
1524076335236 25: 25 [Attach API wait loop]: iteration 0 checkReplyAndCreateAttachment releaseLock
1524076335237 25: 25 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076335237 25: 25 [Attach API wait loop]: closing /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076335237 25: 93 [Attachment 35543]: doCommand ATTACH_GETAGENTPROPERTIES
1524076335344 25: 93 [Attachment 35543]: doCommand ATTACH_DETACH
1524076335346 25: 93 [Attachment 35543]: streamSend ATTACH_DETACHED
1524076336239 25: 25 [Attach API wait loop]: iteration 1 waitForNotification ignoreNotification entering
1524076336240 25: 25 [Attach API wait loop]: iteration 1 waitForNotification ignoreNotification entered
1524076336240 25: 25 [Attach API wait loop]: iteration 1 waitForNotification starting wait
1524076336240 25: 25 [Attach API wait loop]: iteration 1 waitForNotification ended wait
1524076336241 25: 25 [Attach API wait loop]: 1 connectToAttacher reply on port 40657
1524076336242 25: 25 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 1 waitForNotification obtainLock
1524076336242 25: 95 [Attachment 40657]: Attachment run
1524076336245 25: 95 [Attachment 40657]: connectToAttacher localPort=54314 remotePort=40657
1524076336246 25: 95 [Attachment 40657]: streamSend ATTACH_CONNECTED 6ba9c1cb 
1524076336248 25: 25 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076336249 25: 25 [Attach API wait loop]: lock failed, trying blocking lock
1524076336250 25: 25 [Attach API wait loop]: Blocking lock succeeded
1524076336250 25: 25 [Attach API wait loop]: iteration 1 checkReplyAndCreateAttachment releaseLock
1524076336251 25: 25 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076336252 25: 95 [Attachment 40657]: doCommand ATTACH_GETAGENTPROPERTIES
1524076336252 25: 25 [Attach API wait loop]: closing /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076336340 25: 95 [Attachment 40657]: doCommand ATTACH_DETACH
1524076336341 25: 95 [Attachment 40657]: streamSend ATTACH_DETACHED
1524076337253 25: 25 [Attach API wait loop]: iteration 2 waitForNotification ignoreNotification entering
1524076337255 25: 25 [Attach API wait loop]: iteration 2 waitForNotification ignoreNotification entered
1524076337257 25: 25 [Attach API wait loop]: iteration 2 waitForNotification starting wait
1524076341925 25: 25 [Attach API wait loop]: iteration 2 waitForNotification ended wait
1524076341926 25: 25 [Attach API wait loop]: 2 connectToAttacher reply on port 33093
1524076341927 25: 96 [Attachment 33093]: Attachment run
1524076341927 25: 25 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 2 waitForNotification obtainLock
1524076341928 25: 25 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076341928 25: 96 [Attachment 33093]: connectToAttacher localPort=39448 remotePort=33093
1524076341929 25: 96 [Attachment 33093]: streamSend ATTACH_CONNECTED 472cf783 
1524076341931 25: 25 [Attach API wait loop]: lock failed, trying blocking lock
1524076341936 25: 25 [Attach API wait loop]: Blocking lock succeeded
1524076341936 25: 25 [Attach API wait loop]: iteration 2 checkReplyAndCreateAttachment releaseLock
1524076341937 25: 25 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076341937 25: 25 [Attach API wait loop]: closing /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076341938 25: 96 [Attachment 33093]: doCommand ATTACH_GETAGENTPROPERTIES
1524076342024 25: 96 [Attachment 33093]: doCommand ATTACH_DETACH
1524076342026 25: 96 [Attachment 33093]: streamSend ATTACH_DETACHED
1524076342938 25: 25 [Attach API wait loop]: iteration 3 waitForNotification ignoreNotification entering
1524076342938 25: 25 [Attach API wait loop]: iteration 3 waitForNotification ignoreNotification entered
1524076342939 25: 25 [Attach API wait loop]: iteration 3 waitForNotification starting wait
1524076342939 25: 25 [Attach API wait loop]: iteration 3 waitForNotification ended wait
1524076342940 25: 25 [Attach API wait loop]: 3 connectToAttacher reply on port 46631
1524076342940 25: 97 [Attachment 46631]: Attachment run
1524076342941 25: 25 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 3 waitForNotification obtainLock
1524076342941 25: 25 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076342941 25: 97 [Attachment 46631]: connectToAttacher localPort=51716 remotePort=46631
1524076342942 25: 25 [Attach API wait loop]: lock failed, trying blocking lock
1524076342942 25: 97 [Attachment 46631]: streamSend ATTACH_CONNECTED e2fbae18 
1524076342947 25: 25 [Attach API wait loop]: Blocking lock succeeded
1524076342947 25: 25 [Attach API wait loop]: iteration 3 checkReplyAndCreateAttachment releaseLock
1524076342947 25: 25 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076342948 25: 25 [Attach API wait loop]: closing /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076342953 25: 97 [Attachment 46631]: doCommand ATTACH_START_LOCAL_MANAGEMENT_AGENT
1524076342953 25: 97 [Attachment 46631]: startLocalAgent
1524076342956 25: 97 [Attachment 46631]: com.sun.management.jmxremote.localConnectorAddress=service:jmx:rmi://127.0.0.1:10200/stub/rO0ABXNyAC5qYXZheC5tYW5hZ2VtZW50LnJlbW90ZS5ybWkuUk1JU2VydmVySW1wbF9TdHViAAAAAAAAAAICAAB4cgAaamF2YS5ybWkuc2VydmVyLlJlbW90ZVN0dWLp/tzJi+FlGgIAAHhyABxqYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN002G0kQxhMx4DAAB4cHczAApVbmljYXN0UmVmAAoxNzIuMTcuMC4yAAAn2N1ZdpGTkTMNe2nJCwAAAWLaBQHYgAIAeA==
1524076342956 25: 97 [Attachment 46631]: streamSend ATTACH_RESULT=service:jmx:rmi://192.168.100.132:10200/stub/rO0ABXNyAC5qYXZheC5tYW5hZ2VtZW50LnJlbW90ZS5ybWkuUk1JU2VydmVySW1wbF9TdHViAAAAAAAAAAICAAB4cgAaamF2YS5ybWkuc2VydmVyLlJlbW90ZVN0dWLp/tzJi+FlGgIAAHhyABxqYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN002G0kQxhMx4DAAB4cHczAApVbmljYXN0UmVmAAoxNzIuMTcuMC4yAAAn2N1ZdpGTkTMNe2nJCwAAAWLaBQHYgAIAeA==
1524076343043 25: 97 [Attachment 46631]: doCommand ATTACH_GETAGENTPROPERTIES
1524076343132 25: 97 [Attachment 46631]: doCommand ATTACH_DETACH
1524076343132 25: 97 [Attachment 46631]: streamSend ATTACH_DETACHED
1524076343949 25: 25 [Attach API wait loop]: iteration 4 waitForNotification ignoreNotification entering
1524076343951 25: 25 [Attach API wait loop]: iteration 4 waitForNotification ignoreNotification entered
1524076343952 25: 25 [Attach API wait loop]: iteration 4 waitForNotification starting wait
1524076447926 25: 25 [Attach API wait loop]: iteration 4 waitForNotification ended wait
1524076447927 25: 25 [Attach API wait loop]: connectToAttacher 4 waitForNotification no reply file
1524076447927 25: 25 [Attach API wait loop]: checkReplyAndCreateAttachment iteration 4 waitForNotification obtainLock
1524076447928 25: 25 [Attach API wait loop]: locking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076447928 25: 25 [Attach API wait loop]: iteration 4 checkReplyAndCreateAttachment releaseLock
1524076447929 25: 25 [Attach API wait loop]: unlocking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076447929 25: 25 [Attach API wait loop]: closing /tmp/.com_ibm_tools_attach/25/attachNotificationSync
1524076447930 25: 25 [Attach API wait loop]: IOException unlocking file /tmp/.com_ibm_tools_attach/25/attachNotificationSync
java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileLockImpl.release(java.base@9.0.4-internal/FileLockImpl.java:58)
    at com.ibm.tools.attach.target.FileLock.unlockFile(java.base@9.0.4-internal/FileLock.java:130)
    at com.ibm.tools.attach.target.WaitLoop.checkReplyAndCreateAttachment(java.base@9.0.4-internal/WaitLoop.java:135)
    at com.ibm.tools.attach.target.WaitLoop.waitForNotification(java.base@9.0.4-internal/WaitLoop.java:115)
    at com.ibm.tools.attach.target.WaitLoop.run(java.base@9.0.4-internal/WaitLoop.java:152)
1524076448932 25: 25 [Attach API wait loop]: iteration 5 waitForNotification ignoreNotification entering
1524076448933 25: 25 [Attach API wait loop]: iteration 5 waitForNotification ignoreNotification entered
1524076448934 25: 25 [Attach API wait loop]: iteration 5 waitForNotification starting wait
root@user-Aspire-ES1-431:~# 

Returned RMI address is of stub form: service:jmx:rmi://127.0.0.1/stub/rO0ABXNyAC5qYXZheC5tYW5hZ2VtZW50LnJlbW90ZS5ybWkuUk1JU2VydmVySW1wbF9TdHViAAAAAAAAAAICAAB4cgAaamF2YS5ybWkuc2VydmVyLlJlbW90ZVN0dWLp/tzJi+FlGgIAAHhyABxqYXZhLnJtaS5zZXJ2ZXIuUmVtb3RlT2JqZWN002G0kQxhMx4DAAB4cHc4AApVbmljYXN0UmVmAA8xOTIuMTY4LjEwMC4xMzIAAKjrq6UxzCqdRssPFdVZAAABYtC1HbGAAgB4

After base64 -d: ��sr.javax.management.remote.rmi.RMIServerImpl_Stubxrjava.rmi.server.RemoteStub���ɋ�exrjava.rmi.server.RemoteObject�a�� a3xpw8 UnicastRef192.168.100.132�뫥1�*�F��Ybе��x
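The decoding step above can also be done in Java, which avoids the unprintable bytes that base64 -d writes to the terminal. This is a minimal sketch: the class name StubAddressDecoder and both helper methods are made up for illustration, and it only pulls printable strings out of the serialized stub rather than deserializing it.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class StubAddressDecoder {
    private static final String STUB_MARKER = "/stub/";

    /** Returns the raw bytes of the serialized stub embedded in a JMX service URL. */
    static byte[] decodeStub(String serviceUrl) {
        int idx = serviceUrl.indexOf(STUB_MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("no /stub/ part in " + serviceUrl);
        }
        return Base64.getDecoder().decode(serviceUrl.substring(idx + STUB_MARKER.length()));
    }

    /** Collects runs of at least minLen printable ASCII characters, similar to strings(1). */
    static List<String> printableStrings(byte[] data, int minLen) {
        List<String> result = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b < 0x7f) {
                run.append((char) b);
            } else {
                if (run.length() >= minLen) {
                    result.add(run.toString());
                }
                run.setLength(0);
            }
        }
        if (run.length() >= minLen) {
            result.add(run.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: StubAddressDecoder <service:jmx:rmi://.../stub/...>");
            return;
        }
        for (String s : printableStrings(decodeStub(args[0]), 4)) {
            System.out.println(s);
        }
    }
}
```

Run against a stub URL, the output should include the class names and the host address embedded in the UnicastRef, which is the address the client will actually try to reach.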

Would it be possible to pass an object that implements a specific access mechanism instead?

pdbain-ibm commented 6 years ago

The basic attach mechanism works. I verified by attaching and getting the system properties. This may be an issue with RMI.

pdbain-ibm commented 6 years ago

@kubaseai Have you tested this using OpenJDK with Hotspot? Also, are you running jconsole in the docker container?

kubaseai commented 6 years ago

Hello Peter, I would like to describe a specific use case: 100 docker containers with Tibco BusinessWorks 5.13 (the process server of an Enterprise Service Bus) running on OpenJ9. BW exposes its own bean via JMX to monitor running processes. With more than 100 containers running with replicas, the standard way of enabling JMX would require a separate TCP/IP port per container instance. With the patch from my first comment this is possible, but it is not very efficient to administer. So I thought about a specialized RMISocketFactory using files in the /tmp directory.

Now back to the question. The Sun Attach API implementation uses kill(pid) (as strace shows) while connecting to the target JVM. That doesn't work from the docker host into a docker container, because the container's pid doesn't exist in the host's process space (or refers to a totally different process) due to LXC isolation. The IBM implementation uses semaphores, so --ipc=host gives us a workaround. The only remaining issue is the definition of localhost that the JVM in the container hands to jconsole.

OpenJ9 needs some small tweaks to be accessible through the Attach API from the docker host without the need for TCP/IP port mapping. I'm very close to having a working PoC.

kubaseai commented 6 years ago

JConsole sees stub: ��sr.javax.management.remote.rmi.RMIServerImpl_Stubxrjava.rmi.server.RemoteStub���ɋ�exrjava.rmi.server.RemoteObject�a�� a3xpw UnicastRef2 172.17.0.3sr-sun.management.jmxremote.FileRMISocketFactoryL acceptDirtLjava/lang/String;Lfbcst7Lsun/management/jmxremote/FileBasedCommunicationServer;xpt/tmp/.jmx/25@b336bb3cdfe7pwAga�w����b���[�x

pshipton commented 6 years ago

Note that OpenJ9 can only accept changes to OpenJ9 code. RMI issues should be addressed in OpenJDK, and OpenJ9 builds will pick up any OpenJDK updates.

kubaseai commented 6 years ago

Thanks for the info. I will put the current patches on my GitHub, but for OpenJDK we need a brand new implementation of the Attach API and RMI that works with docker shared volumes. With more than 100 docker containers, TCP/IP is hard to accept for JMX monitoring, especially with replicas.

pdbain-ibm commented 6 years ago

Please note also that attach API is not intended to allow attachments from foreign systems, such as other containers, virtual machines, or hosts.

pshipton commented 6 years ago

attach API is not intended to allow attachments from foreign systems, such as other containers, virtual machines, or hosts.

@pdbain-ibm This issue is a proposal to improve on this. Do you have any comments about the getOverridenLocalHostName() proposal, or alternative suggestions?

pdbain-ibm commented 6 years ago

I will take a look.

kubaseai commented 6 years ago

sun_management_jmxremote.zip


I've got a working FileRMISocketFactory and I'm able to connect with jconsole using files in /tmp (a volume). When many containers are started from the same image, it is very likely that the JVM process gets the same pid in every container. I tried to set com.ibm.tools.attach.id per JVM, but when I set -Dcom.ibm.tools.attach.id=bw_time_1, jconsole doesn't see this VirtualMachine.

pdbain-ibm commented 6 years ago

Nice work.

There are some complications.

  1. The JVM cleans up the attach directory on launch by checking if each advertisement file corresponds to an active process. If there is no active process, it removes the associated file artifacts. VMs on foreign docker images won't be visible and will have their files erased. There is a process to re-create them but it is limited.

  2. The VM ID generation scheme is based on process IDs. The algorithm for resolving collisions is designed for handling files left over from dead processes.

If you list the VMs using the attached code, what do you see? listvms.zip
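For readers without the attachment, listing the visible VMs takes only a few lines with the standard com.sun.tools.attach API. The following is a sketch and not the content of listvms.zip: the class name ListVms and the printed format are assumptions, and it needs a JDK that ships the jdk.attach module.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.List;

public class ListVms {
    /** Returns the descriptors of all VMs that the installed attach providers can see. */
    static List<VirtualMachineDescriptor> visibleVms() {
        return VirtualMachine.list();
    }

    public static void main(String[] args) {
        System.out.println("looking for attach targets");
        for (VirtualMachineDescriptor vmd : visibleVms()) {
            // id() is the VM id the attach provider advertises; displayName() is its label.
            System.out.println("id: " + vmd.id() + " name: " + vmd.displayName());
        }
    }
}
```

A VM that advertises itself in the common directory but is missing from this list points at the discovery step rather than at the connection step.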

I suggest we think about the intended use case and work from there. Do you want to be able to attach from a host machine to VMs in docker images? I think we can develop an algorithm which handles that case reliably.

kubaseai commented 6 years ago

Thanks for the compiled class. I set the ID with -Dcom.ibm.tools.attach.id=container_123456789_bw_time. The check for an active process, IPC.processExists(pid), can be extended to IPC.exists(pid, directoryEntryName), where directoryEntryName would be container_123456789_bw_time. A docker container id can be extracted from this string.
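As an illustration of the extraction step, here is a minimal sketch. It assumes the naming convention container_<containerId>_<label> seen in the example ID; the class ContainerVmId and the method containerIdOf are hypothetical names and are not taken from the attached IPC.zip.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContainerVmId {
    // Assumed convention: "container_" + container id (no underscores) + "_" + free-form label.
    private static final Pattern CONTAINER_ID = Pattern.compile("^container_([^_]+)_.+$");

    /** Returns the container id encoded in a directory entry name, if the name follows the convention. */
    static Optional<String> containerIdOf(String directoryEntryName) {
        Matcher m = CONTAINER_ID.matcher(directoryEntryName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(containerIdOf("container_123456789_bw_time"));
        System.out.println(containerIdOf("16217"));
    }
}
```

An entry that does not follow the convention, such as a plain numeric PID, yields an empty result, so the caller can fall back to the existing process check.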

IPC.zip

java -Dcom.ibm.tools.attach.logging=yes -Dcom.ibm.tools.attach.containerExistsCmd="echo 1" listvms
looking for attach targets
id: container_123456789_bw_time name: container_123456789_bw_time
id: 16217 name: listvms
listVms:exit

1525803415444 16217: 17 [Attach API initializer]: AttachHandler initialize
1525803415450 16217: 17 [Attach API initializer]: IPC Directory=/tmp/.com_ibm_tools_attach
1525803415452 16217: 17 [Attach API initializer]: createDirectoryAndSemaphore /tmp/.com_ibm_tools_attach
1525803415455 16217: 17 [Attach API initializer]: non-blocking locking file /tmp/.com_ibm_tools_attach/_master
1525803415460 16217: 17 [Attach API initializer]: deleteStaleDirectories checking container_123456789_bw_time
1525803415467 16217: 17 [Attach API initializer]: getPidFromFile pid = 27container_123456789_bw_time
1525803415468 16217: 17 [Attach API initializer]: getPidFromFile uid = 1000
1525803415468 16217: 17 [Attach API initializer]: deleteStaleDirectories checking _master
1525803415470 16217: 17 [Attach API initializer]: deleteStaleDirectories checking _notifier
1525803415470 16217: 17 [Attach API initializer]: deleteStaleDirectories checking _attachlock
1525803415471 16217: 17 [Attach API initializer]: AttachHandler obtained master lock
1525803415476 16217: 17 [Attach API initializer]: locking file /tmp/.com_ibm_tools_attach/_attachlock
1525803415481 16217: 17 [Attach API initializer]: createAdvertisementFile /tmp/.com_ibm_tools_attach/16217/attachInfo
1525803415482 16217: 17 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_attachlock
1525803415483 16217: 17 [Attach API initializer]: unlocking file /tmp/.com_ibm_tools_attach/_master
1525803415485 16217: 19 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entering
1525803415486 16217: 19 [Attach API wait loop]: iteration 0 waitForNotification ignoreNotification entered
1525803415487 16217: 19 [Attach API wait loop]: iteration 0 waitForNotification starting wait
1525803415530 16217: 1 [main]: locking file /tmp/.com_ibm_tools_attach/_master
1525803415622 16217: 1 [main]: containerExists 123456789=true
1525803415624 16217: 1 [main]: unlocking file /tmp/.com_ibm_tools_attach/_master
1525803415643 16217: 18 [Attach API teardown]: shutting down attach API
1525803415644 16217: 18 [Attach API teardown]: AttachHandler terminate: Attach API is being shut down
1525803415645 16217: 18 [Attach API teardown]: AttachHandler terminate removing contents of directory : /tmp/.com_ibm_tools_attach/16217
1525803415646 16217: 18 [Attach API teardown]: deleting my files
1525803415649 16217: 18 [Attach API teardown]: non-blocking locking file /tmp/.com_ibm_tools_attach/_master
1525803415650 16217: 18 [Attach API teardown]: AttachHandler terminate obtained master lock
1525803415651 16217: 18 [Attach API teardown]: notifyVm 3 targets
1525803415652 16217: 18 [Attach API teardown]: unlocking file /tmp/.com_ibm_tools_attach/_master
1525803415653 16217: 18 [Attach API teardown]: AttachHandler terminate released master lock
1525803415653 16217: 19 [Attach API wait loop]: iteration 0 waitForNotification ended wait
1525803415654 16217: 19 [Attach API wait loop]: iteration 0 waitForNotification cancelNotify
1525803415655 16217: 18 [Attach API teardown]: deleting my directory
1525803415656 16217: 18 [Attach API teardown]: AttachHandler closed semaphore

I need to check the JConsole code to see why it doesn't accept my virtual machine with a customized ID.

I want to be able to attach from the host to VMs in docker containers, and maybe also from one dedicated container, with an extended range of exposed ports, to other containers. Currently we have ServerSocket(0), which picks an arbitrary free port, and we can't expose all 64K ports to the host.
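One way to avoid the arbitrary port chosen by ServerSocket(0) is to probe a small, fixed range that can be published with docker's -p option. This is only a sketch of that idea: PortRangeBinder and bindInRange are hypothetical names, and nothing like this exists in the attach API today.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBinder {
    /**
     * Binds a server socket to the first free port in [low, high].
     * Unlike new ServerSocket(0), the result is guaranteed to fall inside a range
     * that can be published from a container.
     */
    static ServerSocket bindInRange(int low, int high) throws IOException {
        IOException last = null;
        for (int port = low; port <= high; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port is busy or not permitted; try the next one
            }
        }
        throw new IOException("no free port in range " + low + "-" + high, last);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bindInRange(10200, 10210)) {
            System.out.println("listening on port " + socket.getLocalPort());
        }
    }
}
```

The trade-off is that the range limits the number of simultaneous listeners, so it has to be sized for the expected number of concurrent attach sessions.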

kubaseai commented 6 years ago

LocalVirtualMachine.zip

Sun assumed that the vmid must be a pid, so JConsole doesn't support a string ID. Fixed. So I've got the first docker-friendly JDK.

kubaseai commented 6 years ago

https://dzone.com/articles/codetalk-red-hat-cto-on-jakarta-ee-cloud-native-ku I think we are waiting for the big players to implement docker/container features. If someone wants to patch OpenJ9 on their own, the concept is described here: https://medium.com/@jakub.jozwicki/docker-friendly-enterprise-java-51cac8417af8.

ashu-mehra commented 6 years ago

I recently tried an OpenJDK build for Java 11, and it seems you can connect from the host to a JVM running inside a container using the attach API. There are a couple of issues related to that in OpenJDK: https://bugs.openjdk.java.net/browse/JDK-8179498 https://bugs.openjdk.java.net/browse/JDK-8193710

I think it would be good to have this kind of support in OpenJ9 as well, as it would help in JVM monitoring in cloud environments.

Currently, for a single docker container running an OpenJ9 JVM, we can use the attach API if we start the container with the --network=host --ipc=host options and bind mount the host's /tmp directory. I think the --ipc=host option would be required anyway, to let the container use the host system's IPC namespace for the semaphore used by the attach API. But --network=host is considered insecure as per https://docs.docker.com/engine/reference/run/#network-settings:

Note: --network="host" gives the container full access to local system services such as D-bus and is therefore considered insecure.

I think providing complete attach API support for JVMs in containers would require taking a fresh look at three aspects: advertisement, discovery, and communication.

ashu-mehra commented 6 years ago

I did a code review of the attach API and looked at the problems that can arise when connecting from the host to a JVM running in a container. @kubaseai already covered many of these in the comments above. Summarizing the issues here:

1) The current discovery mechanism relies on a common directory (by default /tmp/.com_ibm_tools_attach) accessible to both the target JVM and the client JVM. During startup, the target JVM creates a directory named after its PID under this common directory and advertises its details in the file /tmp/.com_ibm_tools_attach/PID/attachInfo. The client iterates through all the entries in the common directory and reads each attachInfo file to get the list of target JVMs. This does not work with containers because a container's filesystem is different from the host's, unless the containers bind mount a host directory and the JVMs use that as the common directory. Even with a bind mount, there is another problem: the VM id, which names the directory holding the advertisement file, is the PID. Multiple containers may be running on the host, in which case JVMs inside different containers may have the same PID and hence the same VM id. TargetDirectory.createMyDirectory() actually takes care of this situation by appending a counter to the PID to generate a unique VM id (of the form PID_) if a directory with the same name as the JVM's PID already exists. But there is a limit of 100 for this counter, which essentially restricts the number of containers that can run on the same host with attach API support.

2) The next problem comes from the client side. While discovering all the VMs in the common directory (as in AttachProvider.listVirtualMachines()), the client gets the PID of the target from the advertisement file and checks whether that process exists. Because of the different PID namespaces, the PID of the JVM inside the container is not the same as its PID on the host. We need some mechanism to map the PID in the container to the PID on the host.

3) The next problem is closely related to the previous one. To attach to a specific JVM, the client is given the VM id of the target JVM. The client first gets the list of all JVMs and then uses this VM id to pick out the target JVM. As stated before, by default the VM id is the PID of the JVM. Now, if the client uses the target JVM's PID on the host as the VM id, it won't be able to locate that JVM in the list, since its VM id is its PID in the container.

4) By default, the host and the container also use different network namespaces. So using InetAddress.getLoopbackAddress() to talk to a JVM in a container would not work.

5) The attach API uses semaphores to notify the target JVM(s) when a client wants to connect to them. Again, by default the container and the host have different IPC namespaces, so this mechanism won't work between a JVM running in a container and one running on the host.
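The collision handling described in problem 1 can be illustrated with a short sketch. It is not the code of TargetDirectory.createMyDirectory(): the class VmIdAllocator, the method createUniqueVmDirectory, and the exact suffix format are assumptions made for illustration. It shows how a counter limit caps the number of JVMs that share a PID.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class VmIdAllocator {
    /**
     * Creates a directory for this VM under commonDir and returns its name (the VM id).
     * Tries the bare PID first, then PID_0, PID_1, ... up to maxSuffix suffixes.
     */
    static String createUniqueVmDirectory(Path commonDir, long pid, int maxSuffix) throws IOException {
        String candidate = Long.toString(pid);
        for (int counter = 0; ; counter++) {
            try {
                // createDirectory is atomic: it fails if another JVM already owns the name.
                Files.createDirectory(commonDir.resolve(candidate));
                return candidate;
            } catch (FileAlreadyExistsException e) {
                if (counter >= maxSuffix) {
                    throw new IOException("no free VM id for pid " + pid + " after " + maxSuffix + " suffixes", e);
                }
                candidate = pid + "_" + counter;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path commonDir = Files.createTempDirectory("attach_demo");
        // Three JVMs in different containers that all see themselves as PID 25.
        for (int i = 0; i < 3; i++) {
            System.out.println(createUniqueVmDirectory(commonDir, 25, 100));
        }
    }
}
```

With a limit of 100, the 102nd JVM that reports the same PID would fail to advertise itself in this sketch, which is the scaling concern raised for hosts running many containers from the same image.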

Workarounds/Solutions

There are some workarounds for these problems, such as the --network=host docker option for problem 4, or bind mounts, but they don't address problems 2 and 3.
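For the PID mapping that problems 2 and 3 call for, one Linux facility that might help is the NSpid line in /proc/PID/status. To my knowledge it is available since Linux 4.1 and lists the process's PID in each nested PID namespace, outermost first. The sketch below only parses that line. The class NamespacePids is a hypothetical helper, and using NSpid here is a suggestion added for illustration, not something proposed in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NamespacePids {
    private static final String NSPID_PREFIX = "NSpid:";

    /**
     * Parses the NSpid line from the lines of /proc/PID/status.
     * The first value is the PID as seen by the reader's procfs, the last one is the PID
     * in the innermost namespace. Returns an empty list if the line is absent.
     */
    static List<Long> parseNsPids(List<String> statusLines) {
        List<Long> pids = new ArrayList<>();
        for (String line : statusLines) {
            if (line.startsWith(NSPID_PREFIX)) {
                for (String token : line.substring(NSPID_PREFIX.length()).trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        pids.add(Long.parseLong(token));
                    }
                }
                break;
            }
        }
        return pids;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NamespacePids <host pid>");
            return;
        }
        Path status = Paths.get("/proc", args[0], "status");
        System.out.println(parseNsPids(Files.readAllLines(status)));
    }
}
```

A client on the host could compare the last value with the PID recorded in the advertisement file to decide which host process a given target corresponds to.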

There is actually a way to look into a container's file system: /proc/PID/root, a symbolic link to the process's root directory. So one way to handle these problems is to scan the entries in the /proc file system and use /proc/PID/root/ instead of / as the root of the common directory when discovering JVMs running inside a container. This way no bind mounting is required. Also, the presence of an entry in /proc should be enough to conclude that the process is active and running, so the client wouldn't need to check explicitly whether the target process exists.
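The /proc scan can be sketched as follows. This is an illustration under assumptions, not a proposed patch: ProcRootScanner and findAdvertisements are made-up names, the proc directory is a parameter so the code can be tried on a fake tree, and reading /proc/PID/root of another user's process normally needs elevated privileges.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ProcRootScanner {
    /**
     * Looks for attachInfo files under <procDir>/<pid>/root/<commonDir>/<vmId>/.
     * commonDir is relative to the process root, for example "tmp/.com_ibm_tools_attach".
     */
    static List<Path> findAdvertisements(Path procDir, String commonDir) throws IOException {
        List<Path> found = new ArrayList<>();
        // Entries whose names start with a digit are the per-process directories.
        try (DirectoryStream<Path> procEntries = Files.newDirectoryStream(procDir, "[0-9]*")) {
            for (Path procEntry : procEntries) {
                Path attachDir = procEntry.resolve("root").resolve(commonDir);
                if (!Files.isDirectory(attachDir)) {
                    continue; // no attach directory visible through this process's root
                }
                try (DirectoryStream<Path> targets = Files.newDirectoryStream(attachDir)) {
                    for (Path target : targets) {
                        Path info = target.resolve("attachInfo");
                        if (Files.isRegularFile(info)) {
                            found.add(info);
                        }
                    }
                } catch (IOException | DirectoryIteratorException e) {
                    // The process exited or access was denied while scanning; skip it.
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (Path info : findAdvertisements(Paths.get("/proc"), "tmp/.com_ibm_tools_attach")) {
            System.out.println(info);
        }
    }
}
```

Processes that share a root directory expose the same advertisement files through every one of their /proc entries, so a real implementation would still need to remove duplicates.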

To handle problem 4, I think the changes in the first comment, which add a new property com.ibm.tools.attach.target.hostname and use InetAddress.getByName(), make sense.
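The behaviour this relies on can be checked in isolation: InetAddress.getByName(null) returns a loopback address, so an unset property keeps today's behaviour. The sketch below uses the property name from the patch in the first comment; the class AttacherAddress and the method resolveAttacherHost are illustrative names only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AttacherAddress {
    // Property name taken from the proposed patch.
    static final String HOSTNAME_OVERRIDE_PROPERTY = "com.ibm.tools.attach.target.hostname";

    /**
     * Resolves the address the target JVM should connect back to.
     * If the property is unset, getProperty returns null and getByName(null)
     * yields a loopback address, which matches the current behaviour.
     */
    static InetAddress resolveAttacherHost() throws UnknownHostException {
        return InetAddress.getByName(System.getProperty(HOSTNAME_OVERRIDE_PROPERTY));
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("attacher host: " + resolveAttacherHost().getHostAddress());
    }
}
```

Starting the target with -Dcom.ibm.tools.attach.target.hostname set to the host's address, such as the DOCKER_HOST_IP value used earlier in this thread, would then redirect the connection out of the container's network namespace.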

For problem 5, the user would have to start the docker container using --ipc=host to share the IPC namespace between the host and the container, which should not be much of a concern.