Akshaysharma29 closed this issue 4 years ago
There is nothing when I open this URL. I have the same issue. Could you share how you solved it?
Hi @weixiaolian21, I have not completely solved this issue. There is a threading issue with the worker thread, which can be worked around using a callback function as shown in this link: https://stackoverflow.com/questions/61223028/flask-app-is-keep-on-loading-at-the-time-of-predictiontensorrt
But then a new issue occurs. If you are able to solve this, please share your approach. Thanks.
Facing the same issue. Did you find any solution?
@Akshaysharma29 @weixiaolian21 @nik13 , were you able to solve this issue?
I think this issue could be resolved by wrapping the TensorRT inference call (i.e. execute_async or execute_async_v2) with a push/pop of the default CUDA context.
Reference: https://github.com/jkjung-avt/tensorrt_demos/issues/213#issuecomment-691868942
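The push/pop pattern from the linked comment can be sketched roughly as below. This is a minimal illustration, not code from any specific repo: `CudaContextGuard` is a hypothetical helper name, and the commented-out TensorRT calls (`execute_async_v2`, `self.bindings`, `self.stream`) stand in for whatever your inference wrapper already uses.

```python
class CudaContextGuard:
    """Push a CUDA context on entry and pop it on exit, so that a call
    into TensorRT from a worker thread sees an active context.

    ctx would typically be created once at startup, e.g.
        import pycuda.driver as cuda
        cuda.init()
        ctx = cuda.Device(0).make_context()
        ctx.pop()  # detach so any thread can push it later
    """

    def __init__(self, ctx):
        self.ctx = ctx

    def __enter__(self):
        self.ctx.push()   # make the context current in this thread
        return self.ctx

    def __exit__(self, exc_type, exc, tb):
        self.ctx.pop()    # always pop, even if inference raised
        return False

# Hypothetical usage inside an inference method:
# with CudaContextGuard(self.cfx):
#     self.context.execute_async_v2(bindings=self.bindings,
#                                   stream_handle=self.stream.handle)
#     self.stream.synchronize()
```

Using a context manager ensures the pop happens even when inference raises, which avoids leaking a pushed context onto the thread's context stack.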
Thank you. I have one doubt: won't pushing and popping the context for every inference increase inference time? Also, can this be used for Scaled-YOLOv4?
I have one doubt: won't pushing and popping the context for every inference increase inference time?
Based on my tests on a Jetson Nano, the overhead of CUDA context pushing/popping is negligible.
Also, can this be used for Scaled-YOLOv4?
My TensorRT YOLOv4 implementation does support Scaled-YOLOv4. More specifically, the code supports darknet "yolov4-csp" and "yolov4x-mish" models out of the box.
Hi,
I have seen the above solution: we should first push the CUDA context that we created at the beginning of the program, run inference with the execution context created from the engine, and then pop the CUDA context.
I have another question about this: what if I have multiple GPUs and need to run inference on them in parallel? How should I configure my program?
I mean:
## The following four inferences should run in parallel on 4 GPUs
out1 = trt_infer_on_gpu1(inp)
out2 = trt_infer_on_gpu2(inp)
out3 = trt_infer_on_gpu3(inp)
out4 = trt_infer_on_gpu4(inp)
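One way the four parallel calls above could be organized is with one CUDA context per GPU and one thread per inference, where each thread pushes its device's context before calling TensorRT and pops it afterwards. This is a hedged sketch under that assumption: `run_inference` is a hypothetical stand-in for your per-engine inference function (in reality each context would have its own engine and buffers), and the contexts would be created with something like `pycuda.driver.Device(i).make_context()`.

```python
import threading

def _infer_on_device(ctx, run_inference, inp, results, idx):
    # Make this GPU's context current in this thread before any
    # TensorRT/pycuda call, and pop it when done.
    ctx.push()
    try:
        results[idx] = run_inference(inp)
    finally:
        ctx.pop()

def parallel_infer(contexts, run_inference, inp):
    """Run the same input through one inference call per context,
    each in its own thread, and return the results in order."""
    results = [None] * len(contexts)
    threads = [
        threading.Thread(target=_infer_on_device,
                         args=(ctx, run_inference, inp, results, i))
        for i, ctx in enumerate(contexts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Since CUDA contexts are per-device and per-thread-current, this keeps each engine's work bound to its own GPU without the threads interfering with each other's contexts.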
Description
I have inference code for TensorRT (Python API). I want to run this code in Flask, but I get the error below when trying to allocate buffers:

Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/werkzeug/wsgi.py", line 506, in next
    return self._next()
  File "/usr/local/lib/python3.6/dist-packages/werkzeug/wrappers/base_response.py", line 45, in _iter_encoded
    for item in iterable:
  File "/home/jetson-alpha/Desktop/video_streamming_tensorRt/video.py", line 89, in gen
    inputs, outputs, bindings, stream = allocate_buffers(engine)  # input, output: host  # bindings
  File "/home/jetson-alpha/Desktop/video_streamming_tensorRt/config.py", line 23, in allocate_buffers
    stream = cuda.Stream()
pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
The code works well in a Jupyter notebook.
Environment
TensorRT Version: 6.0.1.10
Device Type: Jetson Nano
CUDA Version: 10
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.1.0
Relevant Files
Steps To Reproduce