NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[TensorRT] ERROR: INVALID_ARGUMENT: Cannot deserialize with an empty memory buffer #302

Closed · santhoshnumberone closed 4 years ago

santhoshnumberone commented 4 years ago

Description

I am trying to convert YOLOv3 to TensorRT.

I am using the TensorRT Docker image: docker pull nvcr.io/nvidia/tensorrt:19.12-py2

I ran:

$ sudo docker run -it --rm -v $(pwd):/workspace --runtime=nvidia -w /workspace -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY nvcr.io/nvidia/tensorrt:19.12-py2

Then ran:

/opt/tensorrt/python/python_setup.sh
/opt/tensorrt/install_opensource.sh

Went inside the container, then downloaded the data:

wget http://images.cocodataset.org/zips/test2017.zip

unzip test2017.zip

YOLOv3 to ONNX:

/workspace# cd YOLOv3-Darknet-ONNX-TensorRT
/workspace/YOLOv3-Darknet-ONNX-TensorRT# python yolov3_to_onnx.py

ONNX to TensorRT:

/workspace/YOLOv3-Darknet-ONNX-TensorRT# python onnx_to_tensorrt.py

When I ran the second step, I got this:

[TensorRT] ERROR: INVALID_ARGUMENT: Cannot deserialize with an empty memory buffer.
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Traceback (most recent call last):
  File "onnx_to_tensorrt.py", line 192, in <module>
    main()
  File "onnx_to_tensorrt.py", line 131, in main
    with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
AttributeError: __exit__

Can anyone help me out here?

Environment

GPU Type: GTX 1050 Ti
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
Python Version (if applicable): Python 2
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:19.12-py2

I tried it on the local system as well (YOLOv3 to TensorRT on Ubuntu 18.04 with a GeForce GTX 1050 Ti), but hit the same issue.

rmccorm4 commented 4 years ago

Hi,

Can you try to run this sample using our NGC container? I wasn't able to reproduce your issue; the sample worked for me, but that was on a V100 GPU.

Commands:

nvidia-docker run -it -v ${PWD}:/mnt nvcr.io/nvidia/tensorrt:19.12-py2
/opt/tensorrt/python/python_setup.sh
cd /opt/tensorrt/samples/python/yolov3_onnx/
python yolov3_to_onnx.py
python onnx_to_tensorrt.py
root@a706cc944e40:/workspace/tensorrt/samples/python/yolov3_onnx# python onnx_to_tensorrt.py
Downloading from https://github.com/pjreddie/darknet/raw/f86901f6177dfc6116360a13cc06ab680e0c86b0/data/dog.jpg, this may take a while...
100% [............................................................................] 163759 / 163759
Loading ONNX file from path yolov3.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file yolov3.onnx; this may take a while...
Completed creating Engine
Running inference on image dog.jpg...
[[135.14841098 219.59878846 184.30208646 324.0265199 ]
 [ 98.30807283 135.72612824 499.71261624 299.25580544]
 [478.00606086  81.25701542 210.57787267  86.91503773]] [0.99854713 0.99880403 0.93829264] [16  1  7]
Saved image with bounding boxes of detected objects to dog_bboxes.png.
santhoshnumberone commented 4 years ago

I have tried running it in the container on the local system, and also after installing all the TensorRT requirements directly on the local system; both end up with the same error.

Could you please try it out on a local system with GTX 1050Ti?

rmccorm4 commented 4 years ago

Hi @santhoshnumberone,

I don't have a GTX 1050 Ti to test on, but I do have a Tesla P4, which has the same compute capability (6.1):

root@bdd5c8aff281:/opt/tensorrt/samples/python/yolov3_onnx# nvidia-smi
Tue Dec 31 17:14:08 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            On   | 00000000:17:00.0 Off |                    0 |
| N/A   64C    P8     9W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Using same container + commands:

nvidia-docker run -it -v ${PWD}:/mnt nvcr.io/nvidia/tensorrt:19.12-py2
/opt/tensorrt/python/python_setup.sh
cd /opt/tensorrt/samples/python/yolov3_onnx/
python yolov3_to_onnx.py
python onnx_to_tensorrt.py

and it still works with no error:

...
Completed creating Engine
Running inference on image dog.jpg...
[[135.14838776 219.59885956 184.30211255 324.02638471]
 [ 98.30807389 135.72610674 499.71262555 299.25582216]
 [478.00608951  81.25702328 210.57782246  86.91503659]] [0.99854713 0.99880403 0.93829246] [16  1  7]
Saved image with bounding boxes of detected objects to dog_bboxes.png.
rmccorm4 commented 4 years ago

I notice you're doing some things to get graphics from the container:

-v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY 

But if these are using the GPU for graphics, I'm not sure whether that could cause any issues.

Have you tried using my exact commands above?

nvidia-docker run -it -v ${PWD}:/mnt nvcr.io/nvidia/tensorrt:19.12-py2
jkjung-avt commented 4 years ago

Try enabling the verbose debug log in 'onnx_to_tensorrt.py'. It would be much easier to identify the problem by looking at the logs.

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

Or you could reference my code here: https://github.com/jkjung-avt/tensorrt_demos/blob/master/yolov3_onnx/onnx_to_tensorrt.py#L59
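
For context, the same logger instance typically needs to be handed to every TensorRT object (builder, ONNX parser, runtime) for the verbose messages to show up in all phases. A minimal sketch, assuming the pre-8.x Python API that ships in the 19.xx containers (not the sample's exact code):

import tensorrt as trt

# One shared logger; VERBOSE surfaces tactic selection and memory messages.
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

builder = trt.Builder(TRT_LOGGER)             # used while building the engine
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)  # used while parsing the ONNX file
runtime = trt.Runtime(TRT_LOGGER)             # used while deserializing a cached engine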

rmccorm4 commented 4 years ago

Hi @santhoshnumberone,

I cloned your custom code, reproduced your error, and then fixed it; I think it was just a user error in the custom code.

The first time I ran your code, it failed to create the engine because there was no marked output:

Building an engine from file ./engine/yolov3.onnx; this may take a while...
[TensorRT] ERROR: Network must have at least one output
Completed creating Engine

Which created an empty engine file (notice size is 0 for yolov3.trt):

root@bdd5c8aff281:/workspace/tmp/YOLOv3-Darknet-ONNX-TensorRT/engine# ls -lh
total 237M
-rw-r--r-- 1 root root 237M Dec 31 18:22 yolov3.onnx
-rw-r--r-- 1 root root    0 Dec 31 18:20 yolov3.trt

You can fix the output error with something like this:

with open(onnx_file_path, 'rb') as model:
    print('Beginning ONNX file parsing')
    parser.parse(model.read())
    print('Completed parsing of ONNX file')
# Add this line
network.mark_output(network.get_layer(network.num_layers-1).get_output(0))
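
As an extra safeguard (just a sketch, not part of the original sample), you can also check the parser's return value so a failed parse is reported instead of silently continuing with a network that has no outputs:

with open(onnx_file_path, 'rb') as model:
    print('Beginning ONNX file parsing')
    if not parser.parse(model.read()):
        # Surface the parser errors instead of building a broken network.
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError('Failed to parse the ONNX file')
    print('Completed parsing of ONNX file')

# Mark the last layer's output so the builder has at least one network output.
network.mark_output(network.get_layer(network.num_layers - 1).get_output(0))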

But since the file already existed from the previous failed run, your code read the empty engine file the next time it ran:

# File exists but it's empty, so this block raises the empty memory buffer error
if os.path.exists(engine_file_path):
    # If a serialized engine exists, use it instead of building an engine.
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

Which resulted in your error:

[TensorRT] ERROR: INVALID_ARGUMENT: Cannot deserialize with an empty memory buffer.
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.

If you delete that empty file (rm engine/yolov3.trt), or remove the file-existence check in your code, and run python onnx_to_tensorrt.py again, it seems to work:

$ rm engine/yolov3.trt
$ python onnx_to_tensorrt.py
...
Loading ONNX file from path ./engine/yolov3.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Marking output...
Building an engine from file ./engine/yolov3.onnx; this may take a while...
[TensorRT] WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[TensorRT] WARNING: No implementation obeys reformatting-free rules, at least 2 reformatting nodes are needed, now picking the fastest path instead.
Completed creating Engine
0, Image 000000250613.jpg, Recognition Time 0.024 seconds
1, Image 000000389785.jpg, Recognition Time 0.024 seconds
2, Image 000000122223.jpg, Recognition Time 0.028 seconds
3, Image 000000498452.jpg, Recognition Time 0.028 seconds
4, Image 000000361133.jpg, Recognition Time 0.028 seconds
5, Image 000000023081.jpg, Recognition Time 0.028 seconds
...
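
A slightly more defensive version of that check (only a sketch, with build_engine standing in for whatever builder function the script already has) skips deserialization whenever the cached file is missing or empty, so a failed build can't poison later runs:

import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def get_engine(onnx_file_path, engine_file_path):
    # Only reuse the cached engine if the file exists AND is non-empty.
    if os.path.exists(engine_file_path) and os.path.getsize(engine_file_path) > 0:
        with open(engine_file_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    # Otherwise (re)build from the ONNX file and cache it again.
    return build_engine(onnx_file_path, engine_file_path)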
santhoshnumberone commented 4 years ago

@rmccorm4 The cause seems to be the GPU running out of memory on the GTX 1050 Ti: https://github.com/NVIDIA/TensorRT/issues/319

https://devtalk.nvidia.com/default/topic/1044046/tensorrt/-tensorrt-error-network-must-have-at-least-one-output/

Is there a workaround?

santhoshnumberone commented 4 years ago

Your suggestion of deleting the .trt file also gave the same error:

[TensorRT] VERBOSE: Formats and tactics selection completed in 262.569 seconds.
[TensorRT] VERBOSE: After reformat layers: 180 layers
[TensorRT] VERBOSE: Block size 1417674752
[TensorRT] VERBOSE: Block size 1417674752
[TensorRT] VERBOSE: Block size 708837376
[TensorRT] VERBOSE: Block size 354418688
[TensorRT] VERBOSE: Block size 268435456
[TensorRT] VERBOSE: Block size 44302336
[TensorRT] VERBOSE: Block size 44302336
[TensorRT] VERBOSE: Total Activation Memory: 4255645696
[TensorRT] INFO: Detected 1 inputs and 3 output network tensors.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (GPU memory allocation failed during allocation of workspace. Try decreasing batch size.)
Completed creating Engine
Traceback (most recent call last):
  File "onnx_to_tensorrt.py", line 112, in <module>
    main()
  File "onnx_to_tensorrt.py", line 108, in main
    _ = build_engine(onnx_file_path, engine_file_path)
  File "onnx_to_tensorrt.py", line 94, in build_engine
    f.write(engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'
anhlt18vn commented 4 years ago

Hi @santhoshnumberone: Have you solved this issue? I have the same problem.

jkjung-avt commented 4 years ago

My jkjung-avt/tensorrt_demos repo supports most of the yolov3 and yolov4 models. Feel free to give it a try. It works not only on Jetson platforms but also on x86_64 PCs.

If you still run into the "out of memory" (OOM) problem, you could consider reducing the input image size from 608x608 to, say, 416x416.
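
If you prefer to stay with the original sample code, the OOM message above ("Try decreasing batch size") points at a few other knobs. This is only a sketch against the pre-8.x builder API, and it is not guaranteed to fit in the 1050 Ti's memory, but it at least fails loudly instead of writing an empty engine:

# Shrink the builder's memory demands and guard against a failed build.
builder.max_batch_size = 1            # the OOM message suggests lowering this
builder.max_workspace_size = 1 << 28  # 256 MiB instead of several GiB
builder.fp16_mode = False             # GTX 1050 Ti has no fast native FP16

engine = builder.build_cuda_engine(network)
if engine is None:
    # Avoids the later "'NoneType' object has no attribute 'serialize'" crash.
    raise RuntimeError('Engine build failed; see the TensorRT errors above')

with open(engine_file_path, 'wb') as f:
    f.write(engine.serialize())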

FengZhiheng commented 1 year ago

I have the same issue in a Linux environment with C++:

Error Code 3: Internal Error (Cannot deserialize with an empty memory buffer.)