RidgeRun / gst-inference

A GStreamer Deep Learning Inference Framework
GNU Lesser General Public License v2.1

TS_ConfigProto sporadic segfault #280

Open michaelgruner opened 4 years ago

michaelgruner commented 4 years ago

I see sporadic segfaults when starting the following pipeline:

inferencefilter filter-class=-1 ! inferencedebug name=before ! videoconvert ! tee name=tee tee. ! queue max-size-buffers=3 ! arch.sink_bypass tee. ! queue max-size-buffers=3 ! inferencecrop enable=false ! videoscale ! arch.sink_model tinyyolov2 name=arch model-location=graph_tinyyolov2_tensorflow.pb arch.src_bypass ! queue ! inferencedebug name=after ! inferenceoverlay

The backtrace is always:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xffffffffaf5c9ca0)
  * frame #0: 0x000000010eb5aad6 libtensorflow_framework.so`tensorflow::ConfigProto::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 118
    frame #1: 0x000000010eab9113 libtensorflow_framework.so`google::protobuf::MessageLite::ParseFromArray(void const*, int) + 147
    frame #2: 0x00000001038d8b35 libtensorflow.so`TF_SetConfig + 37
    frame #3: 0x00000001024756af libr2inference-0.0.0.dylib`r2i::tensorflow::Engine::Start(this=0x000000010f5c9c70) at engine.cc:110:3
    frame #4: 0x0000000102411c7b libgstinference-1.0.0.dylib`::gst_backend_start(self=0x0000000101076e50, model_location="graph_tinyyolov2_tensorflow.pb", err=0x00007ffeefbfed68) at gstbackend.cc:369:25
    frame #5: 0x0000000102405beb libgstinference-1.0.0.dylib`gst_video_inference_start(self=0x00000001008b40c0) at gstvideoinference.c:432:8
    frame #6: 0x000000010240520f libgstinference-1.0.0.dylib`gst_video_inference_change_state(element=0x00000001008b40c0, transition=GST_STATE_CHANGE_READY_TO_PAUSED) at gstvideoinference.c:484:20
    frame #7: 0x0000000100101ffd libgstreamer-1.0.0.dylib`gst_element_change_state + 192
    frame #8: 0x00000001001035fd libgstreamer-1.0.0.dylib`gst_element_set_state_func + 384
    frame #9: 0x00000001000ddf46 libgstreamer-1.0.0.dylib`gst_bin_change_state_func + 1161
    frame #10: 0x0000000100101ffd libgstreamer-1.0.0.dylib`gst_element_change_state + 192
    frame #11: 0x00000001001021ef libgstreamer-1.0.0.dylib`gst_element_change_state + 690
    frame #12: 0x00000001001035fd libgstreamer-1.0.0.dylib`gst_element_set_state_func + 384
    frame #13: 0x00000001000ddf46 libgstreamer-1.0.0.dylib`gst_bin_change_state_func + 1161
    frame #14: 0x00000001004eab2c libgstinferenceutils.so`gst_inference_bin_change_state(element=0x0000000100834050, transition=GST_STATE_CHANGE_READY_TO_PAUSED) at gstinferencebin.c:443:7
    frame #15: 0x0000000100101ffd libgstreamer-1.0.0.dylib`gst_element_change_state + 192
    frame #16: 0x00000001001035fd libgstreamer-1.0.0.dylib`gst_element_set_state_func + 384
    frame #17: 0x00000001000ddf46 libgstreamer-1.0.0.dylib`gst_bin_change_state_func + 1161
    frame #18: 0x0000000100101ffd libgstreamer-1.0.0.dylib`gst_element_change_state + 192
    frame #19: 0x00000001001021ef libgstreamer-1.0.0.dylib`gst_element_change_state + 690
    frame #20: 0x00000001001035fd libgstreamer-1.0.0.dylib`gst_element_set_state_func + 384
    frame #21: 0x0000000100002bc1 gst-launch-1.0`main + 1421
    frame #22: 0x00007fff7bc353d5 libdyld.dylib`start + 1
    frame #23: 0x00007fff7bc353d5 libdyld.dylib`start + 1

michaelgruner commented 4 years ago

It looks like session_memory_usage_index is never initialized in this code path, so it holds garbage and the config[] lookup passed to TF_SetConfig reads far outside the array:

(lldb) frame select 3
frame #3: 0x00000001024756af libr2inference-0.0.0.dylib`r2i::tensorflow::Engine::Start(this=0x00000001026cbc30) at engine.cc:110:3
   107    TF_Graph *graph = pgraph.get();
   108    TF_Status *status = pstatus.get ();
   109    TF_SessionOptions *opt = popt.get ();
-> 110    TF_SetConfig(opt, this->config[this->session_memory_usage_index],
   111                 RAM_ARRAY_SIZE, status);
   112
   113    std::shared_ptr<TF_Session> session (TF_NewSession(graph, opt, status),
(lldb) print this->session_memory_usage_index
(int) $0 = -536870912
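
For illustration, here is a minimal sketch of one way to guard against this kind of bug: give the index a default member initializer and validate it in the setter. This is not the actual r2inference code or the actual fix. EngineSketch, kNumConfigs, kConfigSize, SetMemoryUsageIndex and SelectedConfig are hypothetical names, and the preset bytes are placeholders.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the engine seen in frame #3. Names, sizes and
// preset contents are assumptions made for this sketch only.
class EngineSketch {
 public:
  static constexpr std::size_t kNumConfigs = 3;  // assumed number of presets
  static constexpr std::size_t kConfigSize = 4;  // assumed bytes per preset

  // Validate before storing, so an out-of-range value never reaches the
  // array lookup.
  void SetMemoryUsageIndex(int index) {
    if (index < 0 || static_cast<std::size_t>(index) >= kNumConfigs) {
      throw std::out_of_range("memory usage index out of range");
    }
    memory_usage_index_ = index;
  }

  int memory_usage_index() const { return memory_usage_index_; }

  // Returns the preset that a Start()-like method would hand to the
  // session options.
  const std::uint8_t *SelectedConfig() const {
    // Debug-build check of the invariant maintained by the setter.
    assert(memory_usage_index_ >= 0 &&
           static_cast<std::size_t>(memory_usage_index_) < kNumConfigs);
    return configs_[static_cast<std::size_t>(memory_usage_index_)].data();
  }

 private:
  // Default member initializer: the index is valid even when the setter
  // is never called.
  int memory_usage_index_ = 0;

  // Placeholder preset bytes, not real serialized ConfigProto data.
  std::array<std::array<std::uint8_t, kConfigSize>, kNumConfigs> configs_ = {{
      {{0x01, 0x02, 0x03, 0x04}},
      {{0x11, 0x12, 0x13, 0x14}},
      {{0x21, 0x22, 0x23, 0x24}},
  }};
};
```

With a default in place, an engine whose index was never set selects the first preset instead of reading through a garbage offset, and a value like the -536870912 printed above is rejected by the setter.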

michaelgruner commented 4 years ago

This is being fixed by the following PR: https://github.com/RidgeRun/r2inference/pull/65