WisconsinAIVision / yolact_edge

The first competitive instance segmentation approach that runs on small edge devices at real-time speeds.
MIT License

[TensorRT] ERROR: ../rtSafe/cublas/cublasWrapper.cpp (71) - cuBLAS Error in CublasWrapper: 1 (Could not initialize cublas. Please check CUDA installation.) #109

Closed: victorhqx closed this issue 3 years ago

victorhqx commented 3 years ago

Running eval.py fails while converting the backbone to TensorRT:

python eval.py --trained_model=../Downloads/yolact_edge_54_800000.pth --score_threshold=0.8 --top_k=100 --image=test_Color.jpg --use_tensorrt_safe_mode
Config not specified. Parsed yolact_edge_config from the file name.

[04/30 14:43:49 yolact.eval]: Loading model...
[04/30 14:43:52 yolact.eval]: Model loaded.
[04/30 14:43:52 yolact.eval]: Converting to TensorRT...
WARNING [04/30 14:43:52 yolact.eval]: Running TensorRT in safe mode. This is an attempt to solve various TensorRT engine errors.
[04/30 14:43:53 yolact.eval]: Converting backbone to TensorRT...
[TensorRT] ERROR: ../rtSafe/cublas/cublasWrapper.cpp (71) - cuBLAS Error in CublasWrapper: 1 (Could not initialize cublas. Please check CUDA installation.)
[TensorRT] ERROR: ../rtSafe/cublas/cublasWrapper.cpp (71) - cuBLAS Error in CublasWrapper: 1 (Could not initialize cublas. Please check CUDA installation.)
Traceback (most recent call last):
  File "eval.py", line 1275, in <module>
    convert_to_tensorrt(net, cfg, args, transform=BaseTransform())
  File "/home/hqx/yolact_edge/utils/tensorrt.py", line 156, in convert_to_tensorrt
    net.to_tensorrt_backbone(cfg.torch2trt_backbone_int8, calibration_dataset=calibration_dataset, batch_size=args.trt_batch_size)
  File "/home/hqx/yolact_edge/yolact.py", line 1501, in to_tensorrt_backbone
    self.trt_load_if("backbone", trt_fn, [x], int8_mode, batch_size=batch_size)
  File "/home/hqx/yolact_edge/yolact.py", line 1484, in trt_load_if
    self.save_trt_cached_module(module, module_name, int8_mode, batch_size=batch_size)
  File "/home/hqx/yolact_edge/yolact.py", line 1475, in save_trt_cached_module
    torch.save(module.state_dict(), module_path)
  File "/home/hqx/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1078, in state_dict
    hook_result = hook(self, destination, prefix, local_metadata)
  File "/home/hqx/miniconda3/lib/python3.8/site-packages/torch2trt-0.2.0-py3.8-linux-x86_64.egg/torch2trt/torch2trt.py", line 425, in _on_state_dict
    state_dict[prefix + "engine"] = bytearray(self.engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'

I can confirm that TensorRT itself is installed correctly by running the sample_mnist sample:

sudo ./sample_mnist [-h] [--datadir=/home/hqx/TensorRT-7.2.3.4/data/mnist/] [--useDLA=1] [--int8]
[sudo] password for hqx:
&&&& RUNNING TensorRT.sample_mnist # ./sample_mnist [-h] [--datadir=/home/hqx/TensorRT-7.2.3.4/data/mnist/] [--useDLA=1] [--int8]
[04/30/2021-14:48:52] [I] Building and running a GPU inference engine for MNIST
[04/30/2021-14:48:56] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[04/30/2021-14:49:00] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[04/30/2021-14:49:01] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/30/2021-14:49:01] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[04/30/2021-14:49:01] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[04/30/2021-14:49:01] [I] Input:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@#- -#@@@@@@@@
@@@@@@@@@@@@@@#     @@@@@@@@
@@@@@@@@@@@@@#.     #@@@@@@@
@@@@@@@@@@@@#.   :*  +@@@@@@
@@@@@@@@@@@-      *: -@@@@@@
@@@@@@@@@@#   :+ .%* -@@@@@@
@@@@@@@@@#   :@*+@@@  #@@@@@
@@@@@@@@%-  .*@@@@@@  -@@@@@
@@@@@@@@:  #@%@@@@@@  :@@@@@
@@@@@@@#  #@@@@@@@@@  :@@@@@
@@@@@@@: :@@@@@@@@@@  :@@@@@
@@@@@@*  +@@@@@@@@@@  =@@@@@
@@@@@@*  %@@@@@@@@@= :@@@@@@
@@@@@@* .@@@@@@@@@= .#@@@@@@
@@@@@@* =@@@@@@@#- -@@@@@@@@
@@@@@@* .@@@@@@+  -@@@@@@@@@
@@@@@@*  =#%*:. .-#@@@@@@@@@
@@@@@@*   ..   :=@@@@@@@@@@@
@@@@@@%:      =@@@@@@@@@@@@@
@@@@@@@%=   =%@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[04/30/2021-14:49:01] [I] Output:
0: **********
1:
2:
3:
4:
5:
6:
7:
8:
9:
&&&& PASSED TensorRT.sample_mnist # ./sample_mnist [-h] [--datadir=/home/hqx/TensorRT-7.2.3.4/data/mnist/] [--useDLA=1] [--int8]

Environment:

haotian-liu commented 3 years ago

By looking here, could it be that the CUDA version PyTorch is compiled with is different from the one TensorRT was built against?
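A quick way to sanity-check this (a sketch, not part of yolact_edge: `torch.version.cuda` reports the CUDA version PyTorch was built with, while the TensorRT tarball name, e.g. TensorRT-7.2.3.4 for a specific CUDA, tells you the version TensorRT targets; the helper names and version strings below are illustrative):

```python
# Illustrative helpers: compare the CUDA major.minor that PyTorch was built
# with against the CUDA version your TensorRT build targets. In practice,
# obtain the first string from torch.version.cuda and the second from your
# TensorRT download page / tarball name.

def cuda_major_minor(version):
    """Reduce a CUDA version string like '11.1.105' to (11, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def versions_match(pytorch_cuda, tensorrt_cuda):
    """cuBLAS init failures like the one above often trace back to a
    major/minor mismatch between the two toolkits."""
    return cuda_major_minor(pytorch_cuda) == cuda_major_minor(tensorrt_cuda)

print(versions_match("11.1", "11.1"))  # matching toolkits
print(versions_match("11.1", "10.2"))  # mismatch: reinstall one side
```

If they differ, reinstalling PyTorch (or TensorRT) built against the same CUDA major.minor is the usual fix.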

haotian-liu commented 3 years ago

Another possibility is that you have a stale TensorRT cache left over from a failed conversion. Try removing all *.trt files in the same folder as your models.
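That cleanup can be sketched as follows (the helper name is mine, not part of yolact_edge; the assumption, per the suggestion above, is that the cached engines are saved as *.trt files next to the .pth weights):

```python
from pathlib import Path

def clear_trt_cache(model_dir):
    """Delete cached TensorRT engine files (*.trt) in model_dir so they
    are rebuilt from scratch on the next eval.py run.
    Returns the names of the removed files."""
    removed = []
    for cache_file in sorted(Path(model_dir).glob("*.trt")):
        cache_file.unlink()
        removed.append(cache_file.name)
    return removed

# Example: clear only the TensorRT caches, leaving the .pth weights intact.
# clear_trt_cache("../Downloads")
```

Conversion is then retried on the next run, which avoids deserializing a half-written engine (the `NoneType ... serialize` traceback above is consistent with a conversion that never produced a valid engine).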

haotian-liu commented 3 years ago

I am closing this issue for now due to inactivity. If there are any further questions I can help with, please feel free to reopen. Thanks.

AarenWu commented 2 years ago

I had the same problem when loading the TensorRT model file; click here for details.