Hivemapper / odc-api

Software and APIs used to run the Open Dashcam (ODC) devices that collect data on the Hivemapper Mapping Network

Restart ML service when "RequestInference failed" or "DECODER_CREATE output bad rc: 0" happens #153

Closed (punov closed this issue 9 months ago)

punov commented 10 months ago

Inference gets stuck with the following errors:

Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 1080, in openvino.inference_engine.ie_api.ExecutableNetwork.infer
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 1431, in openvino.inference_engine.ie_api.InferRequest.infer
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 1453, in openvino.inference_engine.ie_api.InferRequest.infer
Jan 10 22:16:58 keembay object-detection.sh[454]: RuntimeError: VpualCoreNNExecutor::push: RequestInference failed8
Jan 10 22:16:58 keembay object-detection.sh[454]: During handling of the above exception, another exception occurred:
Jan 10 22:16:58 keembay object-detection.sh[454]: Traceback (most recent call last):
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Jan 10 22:16:58 keembay object-detection.sh[454]: Exception in thread Thread-2:
Jan 10 22:16:58 keembay object-detection.sh[454]: Traceback (most recent call last):
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/opt/object-detection/detect.py", line 146, in worker
Jan 10 22:16:58 keembay object-detection.sh[454]:     self.run()
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/usr/lib/python3.8/threading.py", line 870, in run
Jan 10 22:16:58 keembay object-detection.sh[454]:     self._target(*self._args, **self._kwargs)
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/opt/object-detection/detect.py", line 152, in worker
Jan 10 22:16:58 keembay object-detection.sh[454]:     session_sm = ie.import_network(model_file=model_path, device_name=device)
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 463, in openvino.inference_engine.ie_api.IECore.import_network
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 494, in openvino.inference_engine.ie_api.IECore.import_network
Jan 10 22:16:58 keembay object-detection.sh[454]: RuntimeError: CallVpu for DECODER_CREATE output bad rc: 0
Jan 10 22:16:58 keembay object-detection.sh[454]:     detections, metrics = detect(os.path.join(image[1], image[0]), session_sm, model_shape, input_blob, con>
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/opt/object-detection/detect.py", line 43, in detect
Jan 10 22:16:58 keembay object-detection.sh[454]:     output = session.infer(inputs={input_blob: tensor})
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 1080, in openvino.inference_engine.ie_api.ExecutableNetwork.infer
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 1431, in openvino.inference_engine.ie_api.InferRequest.infer
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 1453, in openvino.inference_engine.ie_api.InferRequest.infer
Jan 10 22:16:58 keembay object-detection.sh[454]: RuntimeError: VpualCoreNNExecutor::push: RequestInference failed8
Jan 10 22:16:58 keembay object-detection.sh[454]: During handling of the above exception, another exception occurred:
Jan 10 22:16:58 keembay object-detection.sh[454]: Traceback (most recent call last):
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Jan 10 22:16:58 keembay object-detection.sh[454]:     self.run()
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/usr/lib/python3.8/threading.py", line 870, in run
Jan 10 22:16:58 keembay object-detection.sh[454]: NnXlinkPlg: Close channel failed: 3
Jan 10 22:16:58 keembay object-detection.sh[454]: NnXlinkPlg: Close channel failed: 3
Jan 10 22:16:58 keembay object-detection.sh[454]:     self._target(*self._args, **self._kwargs)
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "/opt/object-detection/detect.py", line 152, in worker
Jan 10 22:16:58 keembay object-detection.sh[454]:     session_sm = ie.import_network(model_file=model_path, device_name=device)
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 463, in openvino.inference_engine.ie_api.IECore.import_network
Jan 10 22:16:58 keembay object-detection.sh[454]:   File "ie_api.pyx", line 494, in openvino.inference_engine.ie_api.IECore.import_network
Jan 10 22:16:58 keembay object-detection.sh[454]: RuntimeError: CallVpu for DECODER_CREATE output bad rc: 0
Jan 10 22:16:58 keembay object-detection.sh[454]: NnXlinkPlg: Close channel failed: 3
Jan 10 22:16:58 keembay object-detection.sh[454]: NnXlinkPlg: Close channel failed: 3

The service should raise these errors properly and restart instead of staying stuck.
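
A minimal sketch of one way to get that behavior, not necessarily the fix that shipped. In Python, an uncaught exception in a worker thread ends only that thread, so the process can stay alive without doing any inference. The wrapper below ends the whole process when one of the two errors from the log is raised, so that a supervisor can restart it. The names `FATAL_MARKERS`, `is_fatal_vpu_error`, and `guarded` are hypothetical. It also assumes the object-detection service is configured to restart on a non-zero exit, which this issue does not confirm.

```python
import os

# Substrings copied from the log above. Matching on message text is an
# assumption; the OpenVINO bindings raise a plain RuntimeError here.
FATAL_MARKERS = (
    "RequestInference failed",
    "DECODER_CREATE output bad rc",
)


def is_fatal_vpu_error(exc):
    """Return True if exc, or any exception chained to it, has a fatal marker."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, RuntimeError) and any(m in str(exc) for m in FATAL_MARKERS):
            return True
        # Follow "During handling of the above exception..." chains.
        exc = exc.__cause__ or exc.__context__
    return False


def guarded(worker, exit_fn=os._exit, exit_code=1):
    """Wrap a thread target so a fatal VPU error terminates the whole process.

    os._exit is the default because sys.exit() inside a thread only ends
    that thread. exit_fn is injectable so the wrapper can be tested.
    """
    def wrapper(*args, **kwargs):
        try:
            return worker(*args, **kwargs)
        except Exception as exc:
            if is_fatal_vpu_error(exc):
                exit_fn(exit_code)
                return None  # only reached when exit_fn is a test double
            raise  # unrelated errors propagate unchanged
    return wrapper
```

Usage would be along the lines of `threading.Thread(target=guarded(worker)).start()`, where `worker` stands in for the existing thread target in `detect.py`.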

jayalberts commented 10 months ago

Tested yesterday. The service recovers gracefully after a crash.

Masaya-RT commented 10 months ago

Validated with 4.0.4