Make it easy to use PyCharm to remotely debug Python code inside the Deephaven grpc-api container

jmao-denver commented 3 years ago

It has been a pain in the neck to debug a Python script invoked inside a DH Python session. Developers have to rely on the logs and Python's print() function to troubleshoot a faulty script. If we expect users to write more sophisticated processing logic, we will need to make it easier to debug Python code running inside DH.

Because the DHCC server runs inside a container, the ability to remotely debug a running Python script is critical to our productivity as we wrap more and more DH Java code in Python, which makes DH's powerful features easier to access not only for developers but also for data scientists and data engineers.

This feature could be equally important to DH support engineers or anyone who needs to develop new capabilities in Python.

part of #1263

jmao-denver commented 3 years ago

@chipkent please add your comments.

chipkent commented 3 years ago

Ideally, support an IDE debugger from VS Code or PyCharm. If that is too difficult, support for pdb would be helpful.

VS Code supports remote development and debugging in containers. It may be possible to support our use case via that mechanism.

jmao-denver commented 3 years ago

JetBrains' IntelliJ is currently the standard Java IDE for DHC. PyCharm comes from the same company and shares the same look and feel, feature set, and keyboard shortcuts, so I have decided to focus on PyCharm, since I expect to use whatever solution we end up with myself. There isn't an obvious or easy way to do this (otherwise we would have done it a long time ago), and a lot depends on how PyCharm supports Python remote debugging, so this has become more of an exploratory task. After struggling for a couple of days, I have arrived at a manual solution that isn't ready for users outside DH just yet. Disappointing as that is, it is worthwhile to document what I have done here and to keep our eyes open for future developments in this area that could lead to a better solution.

I mostly used this page https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html in my research and experiments. As you can see, there are two ways to debug a Python script in a remote Python host environment. (BTW, the 'remote debugging capability' is only available in PyCharm Pro, which requires a paid subscription for most people.)

If the remote host supports SSH login, we can configure a remote SSH Python interpreter in PyCharm. The configured SSH Python interpreter points to a Python installation on the remote host. Then, with some helper programs (most likely wrappers around pdb that accept SSH connections and redirect terminal input/output to them), PyCharm lets you debug a locally written Python script in the remote SSH Python interpreter, probably by first SFTP-ing it to the remote server and then running it through the helper module it installed when the SSH Python interpreter was configured.

In short, you need only configure the remote SSH interpreter and PyCharm takes care of the rest, and it allows you to debug Python code just as you would with a locally configured interpreter.

The other option is more manual: it involves configuring a 'Python Debug Server' in PyCharm. The Python Debug Server runs locally and listens on a given port for connection requests from the remote host. To use this feature:
- configure the Python Debug Server locally and launch it so that it waits for a connection request from the remote host
- modify your Python script to import the Python Debug Server package and use it to connect to the local IDE, then make the script available on the remote host, either through remote copy or Docker volume mapping
- get onto the remote machine and launch the target Python script from a terminal; when it reaches the code that attempts the connection to the Python Debug Server ...
- in the local PyCharm, the debug window will show that the connection is established, and you can now debug in PyCharm the code that is actually executing remotely.
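For reference, the script-side change in the second step might look roughly like the sketch below. It assumes the pydevd-pycharm package (matching the local PyCharm version) is installed in the remote Python environment. The environment variable names PYCHARM_DEBUG_HOST and PYCHARM_DEBUG_PORT and the default port 5678 are made up for illustration; the port has to match whatever is set in the Python Debug Server run configuration. From inside a container, the host would typically be the address of the machine running PyCharm (for example host.docker.internal on Docker Desktop), which is also an assumption about the setup.

```python
import importlib
import os


def connect_to_debug_server(default_port=5678):
    """Attach to a waiting PyCharm Python Debug Server, if one is configured.

    Returns True when the attach call was made, False when debugging is
    disabled or the pydevd-pycharm package is not installed.
    """
    # Hypothetical variable names, used here only to keep the script runnable
    # when no debug server is listening.
    host = os.environ.get("PYCHARM_DEBUG_HOST")
    if not host:
        return False
    port = int(os.environ.get("PYCHARM_DEBUG_PORT", default_port))

    try:
        pydevd_pycharm = importlib.import_module("pydevd_pycharm")
    except ImportError:
        print("pydevd-pycharm is not installed; running without a debugger")
        return False

    # Connects out to the IDE; execution is suspended here until you resume
    # from the PyCharm debug window.
    pydevd_pycharm.settrace(host, port=port,
                            stdoutToServer=True, stderrToServer=True)
    return True


attached = connect_to_debug_server()
print("debugger attached:", attached)
```

Guarding the call behind an environment variable means the same script can run unchanged when no one is debugging it.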

So far, the remote debugging feature in PyCharm Pro doesn't look so bad to use. But what about remotely debugging an embedded Python interpreter, as in the case of DH, which uses the JPY bridge to let its Java-based server run a Python script? After spending quite some time trying different things, I have come to the unfortunate realization that it simply isn't possible with PyCharm Pro. Specifically, for method 1, the Python script needs to be launched directly by the remote SSH interpreter; for method 2, PyCharm requires a Python script file. It doesn't know what to do with Python code executed as a script string and complains about not being able to find the file.
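A plausible explanation for the 'cannot find the file' complaint, shown in plain Python rather than in DH's or PyCharm's actual code: a debugger maps breakpoints to code through the file name recorded in each code object, and code executed from a string carries a placeholder name instead of a real path. The SOURCE snippet and the helper function name below are made up for illustration.

```python
import os
import tempfile

SOURCE = "def add(a, b):\n    return a + b\n"


def filename_recorded_for(source, filename="<string>"):
    """Execute source and return the file name stored in its code object."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["add"].__code__.co_filename


# Code run as a bare string: the recorded name is a placeholder, not a path.
anonymous = filename_recorded_for(SOURCE)
print(anonymous, os.path.exists(anonymous))

# The same code compiled against a real file: the recorded name is a path
# that a debugger could match to a source file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(SOURCE)
    script_path = handle.name

named = filename_recorded_for(SOURCE, script_path)
print(named, os.path.exists(named))

os.remove(script_path)
```

Whether handing the embedded interpreter a real path would be enough for PyCharm to map breakpoints in the DH case is untested here.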

In short, the effort to come up with an easy and seamless solution for debugging a Python script inside the DH Java server seems premature. There is a workaround that follows what we have done to enable running Python integration tests in the Docker environment, but it requires a new Dockerfile that installs an SSH server in the grpc-api image and a bootstrap script that initializes JPY and creates a Python script session directly, without running the DH server. At this moment, without actual customer demand, committing these changes and possibly automating them doesn't seem like a worthwhile investment, so I will simply attach the files here for future reference.

Note: I use these to set up my own server-side Python project dev environment. I will need to write a lot of Python wrappers and test cases, so such a one-time effort is totally worthwhile for me.

jmao-denver commented 3 years ago

bootstrap.py

import os
from deephaven import start_jvm, jpy

def build_py_session():
    if not jpy.has_jvm():
        DEFAULT_DEVROOT = os.environ.get('DEEPHAVEN_DEVROOT', "/tmp/pyintegration")
        DEFAULT_WORKSPACE = os.environ.get('DEEPHAVEN_WORKSPACE', "/tmp")
        DEFAULT_PROPFILE = os.environ.get('DEEPHAVEN_PROPFILE',  'dh-defaults.prop')
        DEFAULT_CLASSPATH = os.environ.get('DEEPHAVEN_CLASSPATH', "/app/classes/*:/app/libs/*")
        os.environ['JAVA_VERSION'] = '1.8'
        os.environ['JDK_HOME'] = '/usr/lib/jvm/zulu8/jre/'

        # we will try to initialize the jvm
        kwargs = {
            'workspace': DEFAULT_WORKSPACE,
            'devroot': DEFAULT_DEVROOT,
            'verbose': False,
            'propfile': DEFAULT_PROPFILE,
            'java_home': os.environ.get('JDK_HOME', None),
            'jvm_properties': {},
            'jvm_options': {'-Djava.awt.headless=true',
                            '-Xms1g',
                            '-Xmn512m',
                            # '-verbose:gc', '-XX:+PrintGCDetails',
                            },
            'jvm_maxmem': '1g',
            'jvm_classpath': DEFAULT_CLASSPATH,
            'skip_default_classpath': True
        }
        # initialize the jvm
        start_jvm(**kwargs)

        # set up a Deephaven Python session
        py_scope_jpy = jpy.get_type("io.deephaven.db.util.PythonScopeJpyImpl").ofMainGlobals()
        py_dh_session = jpy.get_type("io.deephaven.db.util.PythonDeephavenSession")(py_scope_jpy)
        jpy.get_type("io.deephaven.db.tables.select.QueryScope").setScope(py_dh_session.newQueryScope())

jmao-denver commented 3 years ago

Dockerfile4dbg

FROM deephaven/runtime-base:local-build
LABEL maintainer="Devin Smith \"devinsmith@deephaven.io\""
WORKDIR /app
COPY libs libs/
COPY resources resources/
COPY classes classes/
#ENTRYPOINT ["java", "-server", "-XX:+UseG1GC", "-XX:MaxGCPauseMillis=100", "-XX:+UseStringDeduplication", "-XX:InitialRAMPercentage=25.0", "-XX:MinRAMPercentage=70.0", "-XX:MaxRAMPercentage=80.0", "-XshowSettings:vm", "-cp", "/app/resources:/app/classes:/app/libs/*", "io.deephaven.grpc_api.runner.Main"]
EXPOSE 8080
COPY licenses/ /
VOLUME ["/data"]
VOLUME ["/cache"]

# set up SSH in the container to enable remote debugging in PyCharm
RUN apt update && apt install  openssh-server sudo -y
RUN useradd -rm -d /home/test -s /bin/bash -g root -G sudo -u 1000 test
RUN echo 'test:test' | chpasswd
RUN echo "Defaults        lecture = never" > /etc/sudoers.d/privacy
RUN service ssh start
EXPOSE 22
COPY grpc-api-dbg-entry.sh /grpc-api-dbg-entry.sh
RUN chmod +x /grpc-api-dbg-entry.sh
#CMD ./grpc-api-dbg-entry.sh
ENTRYPOINT ["/grpc-api-dbg-entry.sh"]

jmao-denver commented 3 years ago

grpc-api-dbg-entry.sh

#!/bin/bash
java -server -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+UseStringDeduplication -XX:InitialRAMPercentage=25.0 -XX:MinRAMPercentage=70.0 -XX:MaxRAMPercentage=80.0 -XshowSettings:vm -cp /app/resources:/app/classes:/app/libs/* io.deephaven.grpc_api.runner.Main &

/usr/sbin/sshd -D

jmao-denver commented 3 years ago

When configuring the SSH interpreter in PyCharm, the connection should be 'localhost', port 22. The login and password are 'test' and 'test', but you can change the Dockerfile to create whatever login and password work for you. The mapped path is '/tmp/pyintegration', as dictated by bootstrap.py, but it can also be changed to fit your own situation.
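Before pointing PyCharm at the container, it can help to confirm that the mapped SSH port is actually reachable from the host. The snippet below is a small sanity check using only the standard library. The function name is made up for illustration, and it only checks that something accepts TCP connections on the port, not that the 'test' login works.

```python
import socket


def ssh_port_is_open(host="localhost", port=22, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


print("SSH port reachable:", ssh_port_is_open())
```

If this prints False while the container is up, the '22:22' port mapping in the compose file is the first thing to check.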

jmao-denver commented 3 years ago

docker-compose-common.yml

version: "3.4"

services:
  grpc-api:
    image: deephaven/grpc-api:local-build

    environment:
      # https://bugs.openjdk.java.net/browse/JDK-8230305
      # cgroups v2 resource reservations only work w/ java 15+ ATM, so it's best for our java processes to be explicit
      # with max memory.
      #
      # To turn on debug logging, add: -Dlogback.configurationFile=logback-debug.xml
      - JAVA_TOOL_OPTIONS=-Xmx4g -Ddeephaven.console.type=${DEEPHAVEN_CONSOLE_TYPE} -Ddeephaven.application.dir=${DEEPHAVEN_APPLICATION_DIR}

    expose:
      - '8080'
    ports:
#      - '5005:5005'        # For remote debugging (change if using different port)
      - '22:22'
    # Note: using old-style volume mounts, so that the directories get created if they don't exist
    # See https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior
    volumes:
      - ./data:/data

    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4500M
        reservations:
          memory: 1000M

    # Allows the querying of this process jinfo/jmap
    # docker-compose exec grpc-api jmap -heap 1
    # docker-compose exec grpc-api jinfo 1
    #
    # Add NET_ADMIN to allow throttling network speeds
    # $ docker exec -it core_grpc-api_1 apt-get install iproute2
    # $ docker exec core_grpc-api_1 tc qdisc add dev eth0 root netem delay 10ms
    cap_add:
      - SYS_PTRACE

  web:
    image: deephaven/web:local-build
    expose:
      - "80"
    volumes:
      - ./data:/data
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M

  # Should only be used for non-production deployments, see grpc-proxy/README.md for more info
  grpc-proxy:
    image: deephaven/grpc-proxy:local-build
    environment:
      - BACKEND_ADDR=grpc-api:8080
    expose:
      - '8080'
#      - '8443' #unused
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M

  envoy:
    # A reverse proxy configured for no SSL on localhost. It fronts the requests
    # for the static content and the websocket proxy.
    image: deephaven/envoy:local-build
    ports:
      - "${PORT}:10000"
#      - '9090:9090' #envoy admin
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M

deephaven / deephaven-core

Make it easy to use PyCharm to remotely debug Python code inside the Deephaven grpc-api container #1302