Closed jmao-denver closed 3 years ago
@chipkent please add your comments.
Ideally, support an IDE debugger from VS Code or PyCharm. If that is too difficult, support for pdb
would be helpful.
VS Code supports remote development and debugging in containers. It may be possible to support our use case via that mechanism.
JetBrains' IntelliJ is currently the standard Java IDE for DHC. PyCharm comes from the same company and shares the same look and feel, feature set, and keyboard shortcuts, so I decided to focus on PyCharm, since I expect to use whatever solution we end up with myself. There isn't an obvious or easy way to do this (otherwise we would have done it a long time ago), and a lot depends on how PyCharm supports Python remote debugging, so this has become more of an exploratory task. After struggling for a couple of days, I have arrived at a manual solution that isn't ready for users outside DH just yet. Disappointing as that is, it is worthwhile to document what I have done here and to keep our eyes open for future developments in this area that could lead to a better solution.
I mostly relied on this page, https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html, in my research and experiments. As you can see, there are two ways to debug a Python script on a remote Python host: (1) through a remote SSH interpreter and (2) through a Python debug server. (BTW, the remote debugging capability is only available in PyCharm Pro, which requires a paid subscription for most people.)
In short, for method 1 you need only configure the remote SSH interpreter and PyCharm takes care of the rest; it lets you debug Python code just as you would with a locally configured interpreter.
So far, using the remote debugging feature in PyCharm Pro doesn't look so bad. But what about remote debugging an embedded Python interpreter, as in the case of DH, which uses the JPY bridge to let its Java-based server run a Python script? After spending quite some time trying different things, I have come to the unfortunate realization that it simply isn't possible with PyCharm Pro. Specifically, for method 1, the Python script needs to be launched directly by the remote SSH interpreter; for method 2, PyCharm requires a Python script file. It doesn't know what to do with Python code executed as a script string and complains that it cannot find the file.
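The "script string" problem can be seen in plain Python, independent of DH. The sketch below is only an illustration: how the DH server actually hands code to JPY is not shown in this thread, so the `<string>` file name and both helper functions are assumptions. It compares a code object compiled from a bare string with one compiled under a real file path. Whether PyCharm would accept the file-backed variant in the embedded case is untested.

```python
import os
import tempfile


def compile_from_string(source):
    """Compile a script string with a placeholder file name (assumed: "<string>")."""
    return compile(source, "<string>", "exec")


def compile_via_file(source, directory=None):
    """Write the script string to a real .py file and compile it under that path."""
    fd, path = tempfile.mkstemp(suffix=".py", dir=directory, text=True)
    with os.fdopen(fd, "w") as handle:
        handle.write(source)
    return compile(source, path, "exec"), path


if __name__ == "__main__":
    script = "x = 1\ny = x + 1\n"

    anonymous = compile_from_string(script)
    backed, path = compile_via_file(script)

    # A debugger maps breakpoints to code through co_filename.
    print(anonymous.co_filename)       # placeholder name, no file on disk
    print(backed.co_filename == path)  # points at a file that exists

    namespace = {}
    exec(backed, namespace)            # runs the same code, now file-backed
    os.remove(path)
```

The only difference between the two code objects is `co_filename`, which is the piece of information a file-based debugger needs in order to locate the source.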
In short, it seems premature to try to come up with an easy and seamless solution for debugging Python scripts inside the DH Java server. There is a workaround that follows what we have done to enable running Python integration tests in the Docker environment, but it requires creating a new Dockerfile that installs an SSH server in the grpc-api image, plus a bootstrap script that initializes JPY and creates a Python script session directly, without running the DH server. At this moment, without actual customer demand, committing these changes and possibly automating them doesn't seem like a worthwhile investment. So I will simply attach the files here for future reference.
Note: I use these to set up my own server-side Python project dev environment. Since I will need to write a lot of Python wrappers and test cases, the one-time effort is totally worthwhile for me.
bootstrap.py
import os

from deephaven import start_jvm, jpy


def build_py_session():
    if not jpy.has_jvm():
        DEFAULT_DEVROOT = os.environ.get('DEEPHAVEN_DEVROOT', "/tmp/pyintegration")
        DEFAULT_WORKSPACE = os.environ.get('DEEPHAVEN_WORKSPACE', "/tmp")
        DEFAULT_PROPFILE = os.environ.get('DEEPHAVEN_PROPFILE', 'dh-defaults.prop')
        DEFAULT_CLASSPATH = os.environ.get('DEEPHAVEN_CLASSPATH', "/app/classes/*:/app/libs/*")
        os.environ['JAVA_VERSION'] = '1.8'
        os.environ['JDK_HOME'] = '/usr/lib/jvm/zulu8/jre/'

        # we will try to initialize the jvm
        kwargs = {
            'workspace': DEFAULT_WORKSPACE,
            'devroot': DEFAULT_DEVROOT,
            'verbose': False,
            'propfile': DEFAULT_PROPFILE,
            'java_home': os.environ.get('JDK_HOME', None),
            'jvm_properties': {},
            'jvm_options': {'-Djava.awt.headless=true',
                            '-Xms1g',
                            '-Xmn512m',
                            # '-verbose:gc', '-XX:+PrintGCDetails',
                            },
            'jvm_maxmem': '1g',
            'jvm_classpath': DEFAULT_CLASSPATH,
            'skip_default_classpath': True
        }
        # initialize the jvm
        start_jvm(**kwargs)

        # set up a Deephaven Python session
        py_scope_jpy = jpy.get_type("io.deephaven.db.util.PythonScopeJpyImpl").ofMainGlobals()
        py_dh_session = jpy.get_type("io.deephaven.db.util.PythonDeephavenSession")(py_scope_jpy)
        jpy.get_type("io.deephaven.db.tables.select.QueryScope").setScope(py_dh_session.newQueryScope())
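A script debugged through the remote SSH interpreter would presumably call `build_py_session()` before touching any DH API. Here is a minimal sketch, assuming the bootstrap.py above is importable inside the container. The file name `debug_me.py` and the placeholder computation are hypothetical, and the `ImportError` fallback exists only so the sketch can also run outside the container.

```python
# debug_me.py (hypothetical): run or debug this with the remote SSH interpreter.
try:
    from bootstrap import build_py_session  # bootstrap.py as attached above
except ImportError:
    build_py_session = None  # assumption: lets the sketch run without Deephaven


def main():
    # Start the JVM and the DH Python session once, if bootstrap is available.
    if build_py_session is not None:
        build_py_session()

    # Placeholder for the code under test; set a breakpoint here in PyCharm.
    values = [x * x for x in range(5)]
    return sum(values)


if __name__ == "__main__":
    print(main())
```

Because `build_py_session()` checks `jpy.has_jvm()` first, calling it at the top of every debug target should be harmless if the JVM is already running.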
Dockerfile4dbg
FROM deephaven/runtime-base:local-build
LABEL maintainer="Devin Smith \"devinsmith@deephaven.io\""
WORKDIR /app
COPY libs libs/
COPY resources resources/
COPY classes classes/
#ENTRYPOINT ["java", "-server", "-XX:+UseG1GC", "-XX:MaxGCPauseMillis=100", "-XX:+UseStringDeduplication", "-XX:InitialRAMPercentage=25.0", "-XX:MinRAMPercentage=70.0", "-XX:MaxRAMPercentage=80.0", "-XshowSettings:vm", "-cp", "/app/resources:/app/classes:/app/libs/*", "io.deephaven.grpc_api.runner.Main"]
EXPOSE 8080
COPY licenses/ /
VOLUME ["/data"]
VOLUME ["/cache"]
# set up SSH in the container to enable remote debugging in PyCharm
RUN apt update && apt install openssh-server sudo -y
RUN useradd -rm -d /home/test -s /bin/bash -g root -G sudo -u 1000 test
RUN echo 'test:test' | chpasswd
RUN echo "Defaults lecture = never" > /etc/sudoers.d/privacy
RUN service ssh start
EXPOSE 22
COPY grpc-api-dbg-entry.sh /grpc-api-dbg-entry.sh
RUN chmod +x /grpc-api-dbg-entry.sh
#CMD ./grpc-api-dbg-entry.sh
ENTRYPOINT ["/grpc-api-dbg-entry.sh"]
grpc-api-dbg-entry.sh
#!/bin/bash
java -server -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+UseStringDeduplication -XX:InitialRAMPercentage=25.0 -XX:MinRAMPercentage=70.0 -XX:MaxRAMPercentage=80.0 -XshowSettings:vm -cp /app/resources:/app/classes:/app/libs/* io.deephaven.grpc_api.runner.Main &
/usr/sbin/sshd -D
When configuring the SSH interpreter in PyCharm, the host should be 'localhost' and the port 22. The login and password are 'test' and 'test', but you can change the Dockerfile to create whatever login and password work for you. The mapped path is '/tmp/pyintegration', as dictated by bootstrap.py, but it too can be changed to fit your own situation.
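Before pointing PyCharm at the container, it may help to confirm that the mapped SSH port is reachable from the host. This is a small sketch using only the standard library. The helper name `ssh_port_open` is hypothetical, and the defaults simply mirror the 'localhost' and port 22 values above. It checks only that something accepts TCP connections on the port, not that the login works.

```python
import socket


def ssh_port_open(host="localhost", port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if ssh_port_open():
        print("SSH port is reachable; configure the PyCharm SSH interpreter next.")
    else:
        print("Nothing is listening on localhost:22; is the container running?")
```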
docker-compose-common.yml
version: "3.4"
services:
  grpc-api:
    image: deephaven/grpc-api:local-build
    environment:
      # https://bugs.openjdk.java.net/browse/JDK-8230305
      # cgroups v2 resource reservations only work w/ java 15+ ATM, so it's best for our java processes to be explicit
      # with max memory.
      #
      # To turn on debug logging, add: -Dlogback.configurationFile=logback-debug.xml
      - JAVA_TOOL_OPTIONS=-Xmx4g -Ddeephaven.console.type=${DEEPHAVEN_CONSOLE_TYPE} -Ddeephaven.application.dir=${DEEPHAVEN_APPLICATION_DIR}
    expose:
      - '8080'
    ports:
      # - '5005:5005' # For remote debugging (change if using different port)
      - '22:22'
    # Note: using old-style volume mounts, so that the directories get created if they don't exist
    # See https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior
    volumes:
      - ./data:/data
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4500M
        reservations:
          memory: 1000M
    # Allows the querying of this process jinfo/jmap
    # docker-compose exec grpc-api jmap -heap 1
    # docker-compose exec grpc-api jinfo 1
    #
    # Add NET_ADMIN to allow throttling network speeds
    # $ docker exec -it core_grpc-api_1 apt-get install iproute2
    # $ docker exec core_grpc-api_1 tc qdisc add dev eth0 root netem delay 10ms
    cap_add:
      - SYS_PTRACE
  web:
    image: deephaven/web:local-build
    expose:
      - "80"
    volumes:
      - ./data:/data
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M
  # Should only be used for non-production deployments, see grpc-proxy/README.md for more info
  grpc-proxy:
    image: deephaven/grpc-proxy:local-build
    environment:
      - BACKEND_ADDR=grpc-api:8080
    expose:
      - '8080'
      # - '8443' #unused
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M
  envoy:
    # A reverse proxy configured for no SSL on localhost. It fronts the requests
    # for the static content and the websocket proxy.
    image: deephaven/envoy:local-build
    ports:
      - "${PORT}:10000"
      # - '9090:9090' #envoy admin
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M
It has been a pain in the neck to debug a Python script invoked inside a DH Python session. Developers have to rely on the logs and Python's print() function to troubleshoot a faulty script. If we expect users to write more sophisticated processing logic, we need to make it easier to debug Python code running inside DH.
Because the DHCC server runs inside a container, the ability to remotely debug a running Python script is critical to our productivity as we wrap more and more DH Java code in Python, making DH's powerful features easier to access not only for developers but also for data scientists and data engineers.
This feature could be equally important to DH support engineers or anyone who needs to develop new capabilities in Python.
part of #1263