facebookresearch / labgraph

LabGraph is a Python framework for rapidly prototyping experimental systems for real-time streaming applications. It is particularly well-suited to real-time neuroscience, physiology and psychology experiments.
MIT License

Pytest causes bus error on test_hang. #50

Open pperanich opened 2 years ago

pperanich commented 2 years ago

🐛 Bug

When running pytest on the whole LabGraph package, via python -m pytest --pyargs -v labgraph --ignore=labgraph/devices, the test labgraph/runners/tests/test_exception.py::test_hang fails on the [ProcessPhase.STOPPING] case every time with a SIGABRT. However, running that test module alone, via python -m pytest --pyargs -v labgraph.runners.tests.test_exception, passes.
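To narrow down a failure that only appears when the whole suite runs in one process, each test module can be run in a fresh interpreter and the exit codes compared. The following is a minimal sketch, not part of LabGraph: the helper names pytest_command and run_isolated are hypothetical, and the only module name used is the one quoted in this report.

```python
import subprocess
import sys
from typing import Dict, List


def pytest_command(module: str) -> List[str]:
    """Build a pytest invocation for one importable test module."""
    return [sys.executable, "-m", "pytest", "--pyargs", "-v", module]


def run_isolated(modules: List[str]) -> Dict[str, int]:
    """Run each module in its own interpreter and collect the exit codes.

    On POSIX, a negative exit code means the child was killed by a signal,
    so a SIGABRT shows up as the negated signal number.
    """
    results: Dict[str, int] = {}
    for module in modules:
        results[module] = subprocess.run(pytest_command(module)).returncode
    return results
```

Calling run_isolated(["labgraph.runners.tests.test_exception"]) inside the container would report that module's exit code on its own. If every module passes in isolation but the combined run aborts, that points to state left behind by an earlier test rather than to test_hang itself.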

To Reproduce

Steps to reproduce the behavior:

  1. Build the Docker image using the Dockerfile, e.g., docker build . -t labgraph
  2. Start a shell inside a container, e.g., docker run -it labgraph bash
  3. Run the test suite under gdb, e.g., gdb -ex r --args python3.9 -m pytest --pyargs -v labgraph --ignore=labgraph/devices

Because the suite ran under gdb, I was able to capture a backtrace after the SIGABRT (the full log file is attached below):

labgraph/runners/tests/test_exception.py::test_hang[ProcessPhase.STARTING] [Detaching after fork from child process 2271]
[Detaching after fork from child process 2272]
PASSED                                                                                 [ 72%]
labgraph/runners/tests/test_exception.py::test_hang[ProcessPhase.READY] [Detaching after fork from child process 2339]
[Detaching after fork from child process 2340]
PASSED                                                                                    [ 73%]
labgraph/runners/tests/test_exception.py::test_hang[ProcessPhase.RUNNING] [Detaching after fork from child process 2407]
[Detaching after fork from child process 2408]
PASSED                                                                                  [ 73%]
labgraph/runners/tests/test_exception.py::test_hang[ProcessPhase.STOPPING] [Detaching after fork from child process 2490]
[Detaching after fork from child process 2491]

Thread 28 "python3.9" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff6bfff700 (LWP 1440)]
0x00007ffff711e387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007ffff711e387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff711fa78 in __GI_abort () at abort.c:90
#2  0x00007ffff4218a95 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff4216a06 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x00007ffff4216a33 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffff4216c53 in __cxxabiv1::__cxa_throw (obj=0x7fff64000940, tinfo=0x7ffff44a41f0 <typeinfo for std::runtime_error>, dest=
    0x7ffff422b1f0 <std::runtime_error::~runtime_error()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x00007ffff4b9454a in cthulhu::Framework::validate() () from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#7  0x00007ffff4bea5d0 in cthulhu::StreamConsumerIPC::update() () from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#8  0x00007ffff4bea1f5 in cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}::operator()() const ()
   from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#9  0x00007ffff4bebb2a in void std::__invoke_impl<void, cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}>(std::__invoke_other, cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}&&) () from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#10 0x00007ffff4bebadf in std::__invoke_result<cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}>::type std::__invoke<cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}>(std::__invoke_result&&, (cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}&&)...) ()
   from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#11 0x00007ffff4beba8c in void std::thread::_Invoker<std::tuple<cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) ()
   from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#12 0x00007ffff4beba62 in std::thread::_Invoker<std::tuple<cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}> >::operator()() ()
   from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#13 0x00007ffff4beba46 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<cthulhu::StreamConsumerIPC::StreamConsumerIPC(cthulhu::StreamInterfaceIPC*, std::function<bool (cthulhu::StreamConfigIPC const&)> const&, std::function<bool (cthulhu::StreamSampleIPC const&)> const&, bool)::{lambda()#1}> > >::_M_run() ()
   from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#14 0x00007ffff4f42860 in execute_native_thread_routine () from /home/builder/.local/lib/python3.9/site-packages/cthulhubindings.cpython-39-x86_64-linux-gnu.so
#15 0x00007ffff7bc6ea5 in start_thread (arg=0x7fff6bfff700) at pthread_create.c:307
#16 0x00007ffff71e6b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

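It looks like the culprit is cthulhu::Framework::validate(), which throws a std::runtime_error on the StreamConsumerIPC worker thread. Nothing on that thread catches the exception, so std::terminate calls abort(), which raises the SIGABRT. The throw can be traced back to: https://github.com/facebookresearch/labgraph/blob/6742cefe72e86c31ba835197808d4d7f397b40d9/Cthulhu/src/FrameworkIPCHybrid.cpp#L125-L130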
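Because the abort happens in native code, Python cannot turn it into an ordinary test failure: the whole interpreter is killed. The sketch below illustrates this with the standard library only. It is an assumption-labeled stand-in, not LabGraph or Cthulhu code: os.abort() plays the role of the abort() reached through std::terminate, and run_child is a hypothetical helper.

```python
import signal
import subprocess
import sys

# Child program: os.abort() stands in for the native abort() in the backtrace.
# The try/except is there to show that Python-level handlers do not intercept it.
CHILD = (
    "import os\n"
    "try:\n"
    "    os.abort()\n"
    "except BaseException:\n"
    "    pass\n"
)


def run_child() -> int:
    """Run the aborting child in a fresh interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


def killed_by_sigabrt(returncode: int) -> bool:
    """On POSIX, subprocess reports death by signal N as a return code of -N."""
    return returncode == -signal.SIGABRT
```

On Linux, killed_by_sigabrt(run_child()) should be true, which is the same way a wrapper script would see the crashed pytest process in this report.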

log.txt

Expected behavior

Test suite should not SIGABRT.

Environment

jfResearchEng commented 2 years ago

https://github.com/facebookresearch/labgraph/blob/main/test_script.sh can be used for testing instead.

pperanich commented 2 years ago

Testing with that script works, but running the whole test suite in a single pytest invocation shouldn't crash the process as described above. If this is expected behavior, can we document why?

jfResearchEng commented 2 years ago

Agreed. This is possibly because the graph is not shut down completely in the test_hang test case. I'll take a look at this.