deephaven / deephaven-core

Deephaven Community Core

Python debuggers do not work on code entered through the IDE #2997

Open chipkent opened 1 year ago

chipkent commented 1 year ago

In these two repositories, I have been working on getting a debugger working with DH. https://github.com/chipkent/test-dh-docker-pycharm https://github.com/chipkent/test-dh-docker-vscode

In both cases, the debugger works for code executed via a .py file, but it does not work for code entered via the DH IDE. As a specific example, code entered via the DH IDE does not trigger breakpoints.

On the positive side, the following is confirmed to work in the debugger:

  1. Breakpoints in plain python files
  2. Breakpoints in code executed via exec
  3. Breakpoints in code called in Deephaven table.update

The easiest way to reproduce the problem is to check out https://github.com/chipkent/test-dh-docker-vscode and follow the directions to use devcontainers in the README. Once the devcontainer is started, add breakpoints in the code, run run_me.py under the debugger, and then execute code in the IDE that should trigger the breakpoints. Using this VS Code path to reproduce will be far less painful than using PyCharm, because of #2994.

See #2994

jmao-denver commented 1 year ago

@chipkent A while back, Colin and I were faced with the same issue and found this solution. Could you confirm that it works for you?

import pydevd
pydevd.settrace(suspend=False, trace_only_current_thread=False)

As you can see, you need to install pydevd in your (virtual) env first. Both PyCharm and VS Code rely on this package in their debuggers. We even talked about including it as a requirement for the DH server package and enabling it by default to make the user experience better, but decided to wait for actual user requests first.

chipkent commented 1 year ago

VS Code uses debugpy which, under the covers, vendors pydevd. Using the above pydevd commands did not yield working breakpoints.

Some digging turned up this issue (https://github.com/microsoft/debugpy/issues/474), which suggests trying debugpy.debug_this_thread() to enable debugging in threads created in non-traditional ways (e.g. without using threading).

Riffing on this idea yielded some fruit. If I run:

x = mypkg.get_et()

in the console, no breakpoints are triggered. But if I run:

import debugpy; debugpy.debug_this_thread(); x = mypkg.get_et()

in the console, the relevant breakpoints are triggered.

Unfortunately, every execution from the console requires debugpy.debug_this_thread() for the console-initiated breakpoints to trigger. This potentially suggests:

  1. The DH console is creating threads in a non-traditional way.
  2. Every execution in the DH console is happening in a fresh thread.
  3. Printing out the thread ID from inside mypkg.get_et() shows the same thread id, so either (2) is incorrect, or python is reusing thread ids. This article indicates that thread ids are reused.
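Since the call has to be repeated on every console execution, one way to avoid retyping it is a small decorator. This is a minimal sketch, not something tested against Deephaven: `debug_on_entry` and `_debugpy_hook` are hypothetical names, and the only debugpy call used is `debugpy.debug_this_thread()`, which is imported lazily so the module still loads where debugpy is not installed.

```python
import functools


def _debugpy_hook():
    # Imported lazily so this module still loads where debugpy is absent.
    import debugpy

    debugpy.debug_this_thread()


def debug_on_entry(func=None, *, hook=_debugpy_hook):
    """Run `hook` on the calling thread before every call to `func`.

    `hook` is injectable (an assumption made for testing); by default it
    calls debugpy.debug_this_thread().
    """
    if func is None:
        # Support the parameterized form: @debug_on_entry(hook=...)
        return functools.partial(debug_on_entry, hook=hook)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        hook()
        return func(*args, **kwargs)

    return wrapper
```

With this in place, the console line would become something like `x = debug_on_entry(mypkg.get_et)()`. The hook runs on whichever thread executes the call, which is the thread that needs it.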
chipkent commented 1 year ago

Threads that seem to debug fine are _MainThread:

CALLING: update_func thread=281473046017056 thread=<_MainThread(MainThread, started 281473046017056)>

While threads that need debugpy.debug_this_thread() set are of type _DummyThread:

CALLING: get_et thread=281469194858784 thread=<_DummyThread(Dummy-7, started daemon 281469194858784)>

chipkent commented 1 year ago

print(type(threading.current_thread()))
<class 'threading._DummyThread'>

From the python threading docs:

"There is the possibility that “dummy thread objects” are created. These are thread objects corresponding to “alien threads”, which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be join()ed. They are never deleted, since it is impossible to detect the termination of alien threads."

niloc132 commented 1 year ago

@jmao-denver and I confirmed (at the time) that ensuring trace_only_current_thread is False worked on threads that started outside python (the executor that dh uses to run py snippets in the console, within the UGP).

We do explicitly create and delete those thread states in jpy (on the way in/out of py from java).

Were you able to try calling pydevd.settrace on the vendored impl? Given that each IDE seems to roll its own, it is unlikely that we'll be able to uniformly handle this in dh or jpy.

chipkent commented 1 year ago

I have tried adding both of these to my test script. Neither resulted in the breakpoints being hit:

import pydevd
pydevd.settrace(suspend=False, trace_only_current_thread=False)

import debugpy
pydevd = debugpy._vendored.import_module("pydevd")
pydevd.settrace(suspend=False, trace_only_current_thread=False)

chipkent commented 1 year ago

https://code.visualstudio.com/docs/python/debugging

Says: If you're working with a multi-threaded app that uses native thread APIs (such as the Win32 CreateThread function rather than the Python threading APIs), it's presently necessary to include the following source code at the top of whichever file you want to debug:

import debugpy
debugpy.debug_this_thread()

chipkent commented 1 year ago

Further testing of table.update() indicates that only the initialization stops at the breakpoint. Future real-time update calls do not stop at the breakpoint without jumping through hoops to somehow call debugpy.debug_this_thread().

chipkent commented 1 year ago

I have continued to look at this problem and test suggestions. The following test cases attempt to debug scripts running in a Docker container with pip-installed Deephaven.

https://github.com/chipkent/test-dh-docker-pycharm is using the suggested trace_only_current_thread=False with PyCharm and pydevd. This does not appear to work for (a) code executed via the IDE or (b) code executed via update during real-time updates.

https://github.com/chipkent/test-dh-docker-vscode is using VS Code and debugpy. Similarly, this does not appear to work for (a) code executed via the IDE or (b) code executed via update during real-time updates. I have been unable to come up with a slick way to automatically call debugpy.debug_this_thread() on threading._DummyThreads.
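For reference, the kind of helper meant here could look roughly like the following. It is only a sketch under stated assumptions: `ensure_debuggable`, `_state`, and `_debugpy_hook` are hypothetical names, `threading._DummyThread` is a private CPython class, and the helper still has to be called from code that runs on the affected thread, so it does not make anything automatic. Whether the thread-local guard survives between console executions depends on how thread states are created and torn down, which has not been verified.

```python
import threading

# Per-thread marker recording that the hook already ran on this thread.
_state = threading.local()


def _debugpy_hook():
    # Imported lazily so this module still loads where debugpy is absent.
    import debugpy

    debugpy.debug_this_thread()


def ensure_debuggable(hook=_debugpy_hook):
    """Run `hook` once per thread, and only on threads Python did not start.

    Returns True if the hook ran during this call, False otherwise.
    """
    current = threading.current_thread()
    # threading._DummyThread is private API; used here as an assumption.
    if not isinstance(current, threading._DummyThread):
        return False
    if getattr(_state, "enabled", False):
        return False
    hook()
    _state.enabled = True
    return True
```

A function passed to update could call `ensure_debuggable()` as its first statement, at the cost of touching every function that should hit breakpoints.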

chipkent commented 1 year ago

A video of the current debugger status: https://illumon.slack.com/archives/G01K6UHJ00K/p1666369487981179

devinrsmith commented 1 year ago

> The DH console is creating threads in a non-traditional way.

Any java thread will be a "dummy" thread from the context of python (with potentially the exception of the main thread?).

For example, start up Deephaven 'normally' (java process that starts a python process), and print(type(threading.current_thread())) will likely print a dummy thread.
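The same effect can be reproduced in plain CPython without Java. The sketch below starts a thread through the low-level `_thread` module, which bypasses `threading`, and checks what `threading.current_thread()` reports from inside it. `is_dummy_thread` and `run_in_alien_thread` are hypothetical helper names, and `threading._DummyThread` is a private class used only for inspection.

```python
import _thread
import threading


def is_dummy_thread():
    # threading._DummyThread is a private CPython class; inspection only.
    return isinstance(threading.current_thread(), threading._DummyThread)


def run_in_alien_thread(fn, timeout=5.0):
    """Run `fn` on a thread started outside the threading module."""
    done = threading.Event()
    results = []

    def target():
        try:
            results.append(fn())
        finally:
            done.set()

    # _thread.start_new_thread does not register a threading.Thread object,
    # so threading has to invent a dummy one on demand.
    _thread.start_new_thread(target, ())
    if not done.wait(timeout):
        raise TimeoutError("alien thread did not finish")
    return results[0]
```

`run_in_alien_thread(is_dummy_thread)` reports a dummy thread, while the same check inside a regular `threading.Thread` does not.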

I'm not sure if there is a generic cpython way to solve this (something like, every thread that java creates needs to call into some sort of python threading hooks); or if this needs to be solved on a debugger by debugger basis.
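As a point of comparison, the hook that the standard library does offer, `threading.settrace`, only applies to threads started through the `threading` module. The sketch below illustrates this in plain CPython; it is not a statement about how pydevd or debugpy install their tracing, and `probe`, `tracer`, and `compare` are hypothetical names.

```python
import _thread
import threading

traced_threads = []


def probe():
    return threading.get_ident()


def tracer(frame, event, arg):
    # Record only calls to probe(); returning None skips per-line tracing.
    if event == "call" and frame.f_code is probe.__code__:
        traced_threads.append(threading.get_ident())
    return None


def run_threading_thread():
    t = threading.Thread(target=probe)
    t.start()
    t.join()


def run_alien_thread():
    done = threading.Event()

    def target():
        try:
            probe()
        finally:
            done.set()

    # Started outside the threading module, like a thread owned by Java.
    _thread.start_new_thread(target, ())
    done.wait(timeout=5)


def compare():
    """Return (calls traced on a threading.Thread, calls traced on an alien thread)."""
    # threading.gettrace exists on Python 3.10+; fall back to None otherwise.
    previous = getattr(threading, "gettrace", lambda: None)()
    threading.settrace(tracer)
    try:
        traced_threads.clear()
        run_threading_thread()
        seen_by_threading = len(traced_threads)
        traced_threads.clear()
        run_alien_thread()
        seen_by_alien = len(traced_threads)
    finally:
        threading.settrace(previous)
    return seen_by_threading, seen_by_alien
```

On CPython, `compare()` is expected to return `(1, 0)`: the trace function is installed when a `threading.Thread` bootstraps, and an alien thread never goes through that step.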