Open ssteinbach opened 2 years ago
When OTIO is built in debug mode, we get a little bit more details:
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: finalizing (tstate=0x000055649b90a060)
Aborted (core dumped)
The fact that it crashes it metadata={} is given as an argument kind of suggests me it has to do with the round-trip in https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp#L99-L121. This part of the code is pretty painful to debug...
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007ffff7e3d6b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007ffff7ded958 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7dd753d in __GI_abort () at abort.c:79
#4 0x00007ffff7af561f in fatal_error_exit (status=-1) at Python/pylifecycle.c:2553
#5 fatal_error (fd=<optimized out>, header=header@entry=1, prefix=prefix@entry=0x7ffff7ce76d0 <__func__.6.lto_priv.2> "PyThreadState_Get",
msg=msg@entry=0x7ffff7c8e200 "the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)", status=status@entry=-1)
at Python/pylifecycle.c:2664
#6 0x00007ffff7af59fe in _Py_FatalErrorFunc (func=0x7ffff7ce76d0 <__func__.6.lto_priv.2> "PyThreadState_Get",
msg=0x7ffff7c8e200 "the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)") at Python/pylifecycle.c:2750
#7 0x00007ffff7af5a51 in _Py_FatalError_TstateNULL (func=<optimized out>) at Python/ceval.c:271
#8 0x00007ffff7a63c30 in _Py_EnsureFuncTstateNotNULL (tstate=0x0, func=0x7ffff7ce76d0 <__func__.6.lto_priv.2> "PyThreadState_Get") at ./Include/internal/pycore_pystate.h:95
#9 PyThreadState_Get () at Python/pystate.c:1015
#10 tupledealloc (op=0x7ffff5f1be60) at Objects/tupleobject.c:272
#11 0x00007ffff7b57c1c in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2300
#12 _Py_DECREF (op=<optimized out>) at ./Include/object.h:500
#13 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:567
#14 code_dealloc (co=0x7ffff5f3f9f0) at Objects/codeobject.c:658
#15 0x00007ffff7b53557 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2300
#16 _Py_DECREF (op=<optimized out>) at ./Include/object.h:500
#17 func_clear (op=0x7ffff5f1e710) at Objects/funcobject.c:626
#18 func_dealloc (op=0x7ffff5f1e710) at Objects/funcobject.c:648
#19 0x00007ffff581a6a5 in _Py_DECREF (op=0x7ffff5f1e710) at /usr/include/python3.10/object.h:500
#20 0x00007ffff581bbca in _Py_XDECREF (op=0x7ffff5f1e710) at /usr/include/python3.10/object.h:567
#21 0x00007ffff581cfcf in pybind11::handle::dec_ref() const & (this=0x7ffff5b0a130 <_value_to_any>)
at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:260
#22 0x00007ffff581d0b0 in pybind11::object::~object (this=0x7ffff5b0a130 <_value_to_any>, __in_chrg=<optimized out>)
at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:323
#23 0x00007ffff7deffa5 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff7f8d760 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
at exit.c:113
#24 0x00007ffff7df0120 in __GI_exit (status=status@entry=0) at exit.c:143
#25 0x00007ffff7aa8699 in Py_Exit (sts=0) at Python/pylifecycle.c:2862
#26 0x00007ffff7a9f910 in handle_system_exit () at Python/pythonrun.c:775
#27 0x00007ffff7c1861c in _PyErr_PrintEx (set_sys_last_vars=1, tstate=0x55555555ed50) at Python/pythonrun.c:785
#28 PyErr_PrintEx (set_sys_last_vars=1) at Python/pythonrun.c:880
#29 0x00007ffff7aa10e7 in PyErr_Print () at Python/pythonrun.c:886
#30 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=<optimized out>, flags=0x7fffffffe308) at Python/pythonrun.c:462
#31 0x00007ffff7aa1a6e in _PyRun_AnyFileObject (fp=0x55555555a470, filename=0x7ffff74424f0, closeit=1, flags=0x7fffffffe308) at Python/pythonrun.c:90
#32 0x00007ffff7c0ea6d in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff74424f0, program_name=0x7ffff74b4810) at Modules/main.c:353
#33 pymain_run_file (config=0x5555555855a0) at Modules/main.c:372
#34 pymain_run_python (exitcode=0x7fffffffe304) at Modules/main.c:587
#35 Py_RunMain () at Modules/main.c:666
#36 0x00007ffff7be3c9b in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:720
#37 0x00007ffff7dd8290 in __libc_start_call_main (main=main@entry=0x555555555120 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffe538) at ../sysdeps/nptl/libc_start_call_main.h:58
#38 0x00007ffff7dd834a in __libc_start_main_impl (main=0x555555555120 <main>, argc=3, argv=0x7fffffffe538, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffe528) at ../csu/libc-start.c:381
#39 0x0000555555555045 in _start () at ../sysdeps/x86_64/start.S:115
Alright, with GDB atatched, we get this. The interesting part of that backtrace is:
#21 0x00007ffff581cfcf in pybind11::handle::dec_ref() const & (this=0x7ffff5b0a130 <_value_to_any>)
at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:260
#22 0x00007ffff581d0b0 in pybind11::object::~object (this=0x7ffff5b0a130 <_value_to_any>, __in_chrg=<optimized out>)
at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:323
Notice the _value_to_any
, which confirms my suspicion, or it's at least the region of code I highlighted in my previous reply. _value_to_any
is the python function that is called from the C+++ side to convert the python type given as the metadata argument into a PyANy (Python) and then returned which then converts the PyAny into a C++ PyAny (automatically by pybind11).
Some more information:
https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio/core/_core_utils.py#L61 is executed properly, which means that at least opentimelineio.core._core_utils._value_to_any
was called and it returns.
Also found https://github.com/pytest-dev/pyfakefs/issues/693 which kind of means that pyfakefs can break stuff. Looking at the pyfakefs code, it seems extremely intrusive.
By doing a small change, we can verify if _value_to_any
actually returns something or it it crashes while converting the return value:
diff --git a/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp b/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp
index b3338eb..4f37429 100644
--- a/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp
+++ b/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp
@@ -188,7 +188,8 @@ static void define_bases1(py::module m) {
py::class_<SOWithMetadata, SerializableObject,
managing_ptr<SOWithMetadata>>(m, "SerializableObjectWithMetadata", py::dynamic_attr())
.def(py::init([](std::string name, py::object metadata) {
- return new SOWithMetadata(name, py_to_any_dictionary(metadata));
+ auto asd = py_to_any_dictionary(metadata);
+ return new SOWithMetadata(name, asd);
}),
py::arg_v("name"_a = std::string()),
py::arg_v("metadata"_a = py::none()))
diff --git a/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp b/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp
index c327374..55ed8ea 100644
--- a/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp
+++ b/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp
@@ -102,7 +102,8 @@ static void py_to_any(py::object const& o, any* result) {
_value_to_any = core.attr("_value_to_any");
}
- result->swap(_value_to_any(o).cast<PyAny*>()->a);
+ py::object tmp = _value_to_any(o);
+ result->swap(tmp.cast<PyAny*>()->a);
}
AnyDictionary py_to_any_dictionary(py::object const& o) {
Then in GDB:
(gdb) b otio_utils.cpp:105
(gdb) b otio_utils.cpp:106
(gdb) b otio_utils.cpp:121
(gdb) b otio_serializableObjects.cpp:192
(gdb) run .venv/bin/pytest tests/test_asd.py
Starting program: /home/jcmorin/jcmenv/aswf/OpenTimelineIO/.venv/bin/python .venv/bin/pytest tests/test_asd.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
================================================================================== test session starts ==================================================================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/jcmorin/jcmenv/aswf/OpenTimelineIO
plugins: cov-4.0.0, pyfakefs-5.0.0
collected 2 items
tests/test_asd.py .
Breakpoint 1, py_to_any (o=..., result=0x7fffffff9400) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp:105
105 py::object tmp = _value_to_any(o);
(gdb) l
100 if (_value_to_any.is_none()) {
101 py::object core = py::module::import("opentimelineio.core");
102 _value_to_any = core.attr("_value_to_any");
103 }
104
105 py::object tmp = _value_to_any(o);
106 result->swap(tmp.cast<PyAny*>()->a);
107 }
108
109 AnyDictionary py_to_any_dictionary(py::object const& o) {
(gdb) c
Continuing.
Breakpoint 2, py_to_any (o=..., result=0x7fffffff9400) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp:106
106 result->swap(tmp.cast<PyAny*>()->a);
(gdb) l
101 py::object core = py::module::import("opentimelineio.core");
102 _value_to_any = core.attr("_value_to_any");
103 }
104
105 py::object tmp = _value_to_any(o);
106 result->swap(tmp.cast<PyAny*>()->a);
107 }
108
109 AnyDictionary py_to_any_dictionary(py::object const& o) {
110 if (o.is_none()) {
(gdb) c
Continuing.
Breakpoint 3, py_to_any_dictionary (o=...) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp:121
121 return safely_cast_any_dictionary_any(a);
(gdb) l
116 if (!compare_typeids(a.type(), typeid(AnyDictionary))) {
117 throw py::type_error(string_printf("Expected an AnyDictionary (i.e. metadata); got %s instead",
118 type_name_for_error_message(a).c_str()));
119 }
120
121 return safely_cast_any_dictionary_any(a);
122 }
123
124 py::object any_to_py(any const& a, bool top_level) {
125 std::type_info const& tInfo = a.type();
(gdb) c
Continuing.
Breakpoint 4, operator() (__closure=0x5555559f63d8, name="", metadata=...) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp:192
192 return new SOWithMetadata(name, asd);
(gdb) l
187
188 py::class_<SOWithMetadata, SerializableObject,
189 managing_ptr<SOWithMetadata>>(m, "SerializableObjectWithMetadata", py::dynamic_attr())
190 .def(py::init([](std::string name, py::object metadata) {
191 auto asd = py_to_any_dictionary(metadata);
192 return new SOWithMetadata(name, asd);
193 }),
194 py::arg_v("name"_a = std::string()),
195 py::arg_v("metadata"_a = py::none()))
196 .def_property_readonly("metadata", [](SOWithMetadata* s) {
So _value_to_any
returns a py::object
as expected and the cast works, then safely_cast_any_dictionary_any
is called and seems to be working. Then the breakpoint at line 192 is hit. So it either crashes while creating SOWithMetadata
or hile returning SOWithMetadata
. Or something like that.
Stumbled over this issue today, and created a respective issue in pyfakefs. I cannot promise a fix so far (have to understand the problem first), but it seems to have something to with the caching of patched modules. Clearing the fs cache before running the problematic code can serve as an (ugly) workaround at the moment:
def test_create_clip(capsys, fs):
fs.clear_cache()
otio.core.SerializableObjectWithMetadata(
metadata={}
)
Thanks! We will test this workaround to see if it helps.
Alternatively, you could use a fixture without caching instead of fs
, e.g.:
import pytest
from pyfakefs.fake_filesystem_unittest import Patcher
@pytest.fixture
def fs_no_cache():
with Patcher(use_cache=False) as patcher:
yield patcher.fs
...
def test_empty_fs(fs_no_cache):
pass
Caching is a performance feature that is only relevant if you run a larger number of tests with fs
.
This is the first time I see a side-effect from caching, but I had added the possibility to switch it off at the time just in case...
Ok, I've made a patch release, please check if your tests work with the current version!
Nice, thanks @mrbean-bremen !
A user reported that there is a segmentation fault when running pytest + objects with metadata + the "fs" (fake filesystem) capture system.
Repro:
Output:
(The warning is related to our use of
imp
and appears to be a red herring)