datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1

clarify/document threading implications in datajoint (was: pycharm console issues) #815

Open ixcat opened 4 years ago

ixcat commented 4 years ago

It appears that (at least) printing a table heading result to the PyCharm console leaves the session in a problematic state in which all subsequent connections to MySQL are terminated almost immediately. Digging further, PyCharm appears to run the console inside 'PyDev.Debugger' via $pycharm_root/plugins/python-ce/helpers/pydev/pydevconsole.py, which in turn is a client-server system apparently using multithreading/XMLRPC.

I suspect something here is not 'playing nice' with the MySQL connections, or perhaps with the socket library / open file descriptors, and that this causes the issue; however, isolating the code responsible and digging further is a fairly complicated debugging problem. Console vs. execution fds, or something juggling these within exception handlers, might also be implicated (console output appears to be scrambled/repeated after interacting with an interpreter in this 'broken state').

Likely, a simple test could be built using PyDev.Debugger that triggers the issue, or a working test could be built and gradually expanded to match the code used by PyCharm until the issue is triggered. Unfortunately, running PyDev.Debugger standalone does not seem to be well documented. Something based on this unit test would likely be a good starting point:

https://github.com/fabioz/PyDev.Debugger/blob/master/tests/test_pydevconsole.py

minimal trigger script to follow; it expects an 'acquisition.Session' table, but should also work (i.e. fail) with a quick revamp to use an inline table definition

environment: linux amd64 / ubuntu 20.04 / miniconda / python 3.8.5 / pycharm community 2020.2.3

ixcat commented 4 years ago

import datajoint as dj

def doq(c):
    return c.query('select connection_id();').fetchall()

acq = dj.create_virtual_module('acq', 'ibl_acquisition')

def dook():
    for x in range(3):
        print(len(acq.Session()))

def dofail():
    print(acq.Session())

# 'dook()' seems to work N times;
# after 'dofail()', 'dook()' will never work;
# even dj.conn().connect() will connect and then reset connection;
# (as seen via packet dump)
# calling doq(dj.conn()) after connect demonstrates the issue as well.

ixcat commented 4 years ago

will investigate https://plugins.jetbrains.com/plugin/9409-send-to-terminal as a temporary workaround while debugging continues

ixcat commented 3 years ago

update on send-to-terminal workaround: send-to-terminal is not whitespace aware, so blocks containing blank lines fail with IndentationError when pasted (it was designed primarily for other languages). Also, the plugin is only implemented for Windows/Mac.

similar issues were found when running populate in a thread (independently of PyCharm); I suspect the muddled socket descriptor issue is common to threaded use in general and is triggered by PyCharm's client-server Python REPL.

Digging further into PyMySQL, it is apparently DBAPI thread safety level 1: 'Threads may share the module, but not connections.' per https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/__init__.py (threadsafety = 1), with the spec at http://legacy.python.org/dev/peps/pep-0249/#threadsafety. I suspect digging into our use of the shared connection object might bear some fruit here.
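For reference, a minimal sketch of what 'share the module, but not connections' implies in practice: each thread lazily opens and keeps its own connection instead of reusing one shared object. This is not how datajoint currently manages its connection; FakeConnection, PerThreadConnections and run_workers are hypothetical names used only for illustration, and the stand-in class replaces a real driver call so the sketch runs without a MySQL server.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a DB-API connection (a real one would come from the driver)."""

    _next_id = 0
    _id_lock = threading.Lock()

    def __init__(self):
        # Hand out a unique id, loosely mimicking MySQL's connection_id().
        with FakeConnection._id_lock:
            FakeConnection._next_id += 1
            self.connection_id = FakeConnection._next_id
        self.owner = threading.get_ident()

    def query(self, sql):
        # Assumption for illustration: refuse cross-thread use outright.
        # A real level-1 driver does not check this; it simply misbehaves.
        if threading.get_ident() != self.owner:
            raise RuntimeError("connection used outside the thread that created it")
        return self.connection_id


class PerThreadConnections:
    """Hypothetical helper: one lazily created connection per thread."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._factory()
            self._local.conn = conn
        return conn


def run_workers(n_workers=4):
    """Run n_workers threads; each records the connection id it saw on two calls."""
    pool = PerThreadConnections(FakeConnection)
    seen = []
    seen_lock = threading.Lock()

    def worker():
        first = pool.get().query("select connection_id();")
        second = pool.get().query("select connection_id();")
        with seen_lock:
            seen.append((first, second))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Within a thread, repeated calls to get() return the same connection; across threads the connections are distinct. Whether something like this fits datajoint's shared connection object is exactly the open question here.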

ixcat commented 3 years ago

adjusted title to document/understand threading, since a similar issue reproduces there; once that is understood/clarified we can swing back to revisit the PyCharm implications.

ixcat commented 3 years ago

unassigning since this is not being actively worked on