I ran this command: CUDA_VISIBLE_DEVICES=0 python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_fp16 --use_gpu=True --run_benchmark=True
enable_tensorrt: False
-> Does this mean I am not using TensorRT? Or do I not need to do anything more because a model deployed with PaddlePaddle is already optimized?
What is your PaddlePaddle version? And make sure TensorRT is set up properly.
My PaddlePaddle version is 2.4.1 (paddlepaddle-gpu).
pip list shows: tensorrt 8.4.0.6
My ~/.bashrc:
Comment out this line to check whether the app runs using the TRT engine: https://github.com/PaddlePaddle/PaddleDetection/blob/6f384cb3043938943aa4ae51931e7d924f380f42/deploy/python/infer.py#L875
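For reference, a minimal sketch (assuming the Paddle Inference Python API; the model paths are placeholders) of how to query the config directly instead of commenting lines out:

import paddle.inference as paddle_infer

model_dir = "output_inference/fairmot_dla34_30e_1088x608"
config = paddle_infer.Config(model_dir + "/model.pdmodel",
                             model_dir + "/model.pdiparams")
config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory pool, device 0
config.enable_tensorrt_engine(
    workspace_size=1 << 25,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Half,  # corresponds to trt_fp16
    use_static=False,
    use_calib_mode=False)

# False here means the TensorRT subgraph engine was never enabled.
print("enable_tensorrt:", config.tensorrt_engine_enabled())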
Thank you very much for your continued responses. However, I am now running mot_jde_infer.py instead of infer.py to try FairMOT. Is there a way to check whether TensorRT is running in mot_jde_infer.py, similar to what was answered earlier?
Oh! I commented out that line, and the output was: none
I had only been running mot_jde_infer.py with paddlepaddle-gpu 2.4.1. However, when I tried to run infer.py, I got an error. After I installed paddlepaddle (CPU version 2.4.1), the error disappeared.
So with the CPU build of PaddlePaddle installed, the error no longer occurs, but it seems I can no longer use the GPU.
Is it necessary to have both PaddlePaddle (CPU) and PaddlePaddle-GPU installed at the same time? Does the installation order matter, i.e. PaddlePaddle (CPU) first and then PaddlePaddle-GPU?
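For reference (an aside, not from this thread): as far as I know, the CPU and GPU wheels both install the same paddle package, so keeping both installed tends to conflict. A quick check of which build is active, using only standard Paddle calls:

import paddle

print(paddle.__version__)                     # e.g. 2.4.1
print(paddle.device.is_compiled_with_cuda())  # False means a CPU-only build is active
paddle.utils.run_check()                      # end-to-end installation self-test

If is_compiled_with_cuda() returns False, uninstalling both wheels and reinstalling only paddlepaddle-gpu is usually the cleaner route.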
I have tried many things since my last question and have learned a lot about TensorRT along the way. Finally, I ran the following command line:
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_benchmark=True --run_mode=trt_fp32
I0203 18:37:48.899986 17980 tensorrt_subgraph_pass.cc:244] --- detect a sub-graph with 190 nodes
I0203 18:37:48.908310 17980 tensorrt_subgraph_pass.cc:560] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0203 18:37:49.599192 17980 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0203 18:37:49.877519 17980 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0203 18:37:49.877683 17980 helper.h:110] Detected invalid timing cache, setup a local cache instead
E0203 18:38:05.247386 17980 helper.h:114] 4: [pluginV2Builder.cpp::makeRunner::680] Error Code 4: Internal Error (Internal error: plugin node deformable_conv (Output: deformable_conv_6.tmp_0804) requires 47628288 bytes of scratch space, but only 33554432 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize(). )
E0203 18:38:05.247411 17980 helper.h:114] 2: [builder.cpp::buildSerializedNetwork::399] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
0 paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
1 std::unique_ptr<paddle::PaddlePredictor, std::default_delete
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1675417085 (unix time) try "date -d @1675417085" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x8) received by PID 17980 (TID 0x7fbb927a46c0) from PID 8 ]
Segmentation fault (core dumped)
Now what can I do?
The message indicates Paddle tried to prepare the TRT engine. The warnings of the form
TensorRT was linked against xxxx
may not be fatal. For this internal error, though:
[pluginV2Builder.cpp::makeRunner::680] Error Code 4: Internal Error (Internal error: plugin node deformable_conv ... requires 47628288 bytes of scratch space, but only 33554432 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().)
I think you should increase the workspace size by modifying the line where the workspace size is set.
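For reference, a sketch of what that change looks like with the Paddle Inference Python API (placeholder paths; 1 << 30 is just an example value). The failing run had only 33554432 bytes of workspace, i.e. 1 << 25, while the deformable_conv plugin asked for 47628288:

import paddle.inference as paddle_infer

config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholder paths
config.enable_use_gpu(200, 0)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,  # raised from 1 << 25; gives TRT 1 GiB of scratch space
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Float32,  # corresponds to trt_fp32
    use_static=False,
    use_calib_mode=False)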
"Thank you for your advice, I have modified the line as you suggested, like in the picture below."
"By doing that, the memory shortage error that existed before has been resolved. Thank you so much." "But the error I have never seen before has appeared again."
"I executed the following command line." : CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_benchmark=True --run_mode=trt_int8 --trt_calib_mode=True
Output of the command line:
Found 1 inference images in total.
W0206 08:56:38.291393 733556 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.2
W0206 08:56:38.292671 733556 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
I0206 08:56:38.322510 733556 tensorrt_engine_op.h:421] This process is generating calibration table for Paddle TRT int8...
I0206 08:56:38.324936 733598 tensorrt_engine_op.h:301] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0206 08:56:38.998950 733598 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0206 08:56:39.277165 733598 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0206 08:56:40.369802 733598 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0206 08:56:40.370350 733598 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0206 08:56:40.371969 733598 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0206 08:56:40.372790 733598 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0206 08:56:40.374797 733556 gpu_resources.cc:217] WARNING: device: . The installed Paddle is compiled with CUDNN 8.2, but CUDNN version in your machine is 8.1, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
I0206 08:56:40.411801 733608 tensorrt_engine_op.h:301] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0206 08:56:40.419792 733608 helper.h:110] Unused Input: relu_44.tmp_0_clone_0
E0206 08:56:40.419865 733608 helper.h:114] 4: Output tensor tmp_01159 of type Float produced from output of incompatible type Bool
E0206 08:56:40.419961 733608 helper.h:114] 4: [network.cpp::validate::2534] Error Code 4: Internal Error (Could not compute dimensions for tmp_01159, because the network is not valid.)
E0206 08:56:40.419971 733608 helper.h:114] 2: [builder.cpp::buildSerializedNetwork::399] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
0 paddle::operators::TensorRTEngineOp::RunCalibration(paddle::framework::Scope const&, phi::Place const&) const::{lambda()#1}::operator()() const
1 paddle::operators::TensorRTEngineOp::PrepareTRTEngine(paddle::framework::Scope const&, paddle::inference::tensorrt::TensorRTEngine) const
2 paddle::inference::tensorrt::OpConverter::ConvertBlockToTRTEngine(paddle::framework::BlockDesc, paddle::framework::Scope const&, std::vector<std::string, std::allocator
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1675641400 (unix time) try "date -d @1675641400" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x8) received by PID 733556 (TID 0x7f8f38bce700) from PID 8 ]
Segmentation fault (core dumped)
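For context, --run_mode=trt_int8 with --trt_calib_mode=True corresponds to TensorRT's post-training calibration path. In the Paddle Inference Python API that pairing looks roughly like this (a sketch with placeholder paths, not the repository's exact code):

import paddle.inference as paddle_infer

config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholder paths
config.enable_use_gpu(200, 0)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Int8,
    use_static=False,
    use_calib_mode=True)  # the first runs generate the INT8 calibration table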
The error I asked about previously was resolved by changing the cuDNN version to 8.2.0. Thank you for your assistance.
My original intention was to accelerate the exported model using TensorRT, so I entered the following command in the command line:
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_fp32 --use_gpu=True
And I got this result:
The second command was as follows:
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_fp16 --use_gpu=True
And I got this result:
The third command was:
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_int8 --use_gpu=True
And I got this result:
I'm not sure whether TensorRT is really working and optimizing the model, since I haven't noticed any significant improvement in the results.
The only change I made is --run_mode; is that the only thing being applied? Am I using TensorRT correctly?
I also want to visualize my model as it actually runs with TensorRT applied, as shown in the figure below. How can I do that with VisualDL?
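For reference: if I read the VisualDL docs correctly, its graph page can render an exported .pdmodel directly, so something along these lines might work (the --model flag, port, and file path are my assumptions, not from this thread):

visualdl --model output_inference/fairmot_dla34_30e_1088x608/model.pdmodel --port 8080

Note that, as far as I know, this shows the exported Paddle graph rather than the TensorRT-fused engine that Paddle Inference builds at runtime.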