I ran this command: CUDA_VISIBLE_DEVICES=0 python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_fp16 --use_gpu=True --run_benchmark=True
enable_tensorrt: False
-> Does this mean I am not using TensorRT? Or do I not need to do anything more because a model deployed with PaddlePaddle is already optimized?
What is your PaddlePaddle version? And make sure TensorRT is set up properly.
My PaddlePaddle version is 2.4.1 (paddlepaddle-gpu).
pip list shows: tensorrt 8.4.0.6
My ~/.bashrc:
Comment out this line to check whether the app runs using the TRT engine: https://github.com/PaddlePaddle/PaddleDetection/blob/6f384cb3043938943aa4ae51931e7d924f380f42/deploy/python/infer.py#L875
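For reference, a minimal sketch (assuming the Paddle Inference Python API; the model paths are placeholders) of how to query the config directly instead of commenting lines out:

import paddle.inference as paddle_infer

model_dir = "output_inference/fairmot_dla34_30e_1088x608"
config = paddle_infer.Config(model_dir + "/model.pdmodel",
                             model_dir + "/model.pdiparams")
config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory pool, device 0
config.enable_tensorrt_engine(
    workspace_size=1 << 25,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Half,  # corresponds to trt_fp16
    use_static=False,
    use_calib_mode=False)

# False here means the TensorRT subgraph engine was never enabled.
print("enable_tensorrt:", config.tensorrt_engine_enabled())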
Thank you very much for your continued responses. However, I am now running mot_jde_infer.py instead of infer.py to try FairMOT. Is there a way to check whether TensorRT is running in mot_jde_infer.py, similar to what was answered earlier?
Oh! I commented out that line, and the output was: none
I had only been running mot_jde_infer.py with paddlepaddle-gpu 2.4.1. However, when I tried to run infer.py, I got an error. After I installed paddlepaddle (CPU version 2.4.1), the error disappeared.
So with the CPU build of PaddlePaddle installed, the error no longer occurs, but it seems I can no longer use the GPU.
Is it necessary to have both PaddlePaddle (CPU) and PaddlePaddle-GPU installed at the same time? Does the installation order matter, i.e. PaddlePaddle (CPU) first and then PaddlePaddle-GPU?
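For reference (an aside, not from this thread): as far as I know, the CPU and GPU wheels both install the same paddle package, so keeping both installed tends to conflict. A quick check of which build is active, using only standard Paddle calls:

import paddle

print(paddle.__version__)                     # e.g. 2.4.1
print(paddle.device.is_compiled_with_cuda())  # False means a CPU-only build is active
paddle.utils.run_check()                      # end-to-end installation self-test

If is_compiled_with_cuda() returns False, uninstalling both wheels and reinstalling only paddlepaddle-gpu is usually the cleaner route.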
I have tried many things since my last question and have learned a lot about TensorRT along the way. Finally, I ran the following command line:
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_benchmark=True --run_mode=trt_fp32
I0203 18:37:48.899986 17980 tensorrt_subgraph_pass.cc:244] --- detect a sub-graph with 190 nodes
I0203 18:37:48.908310 17980 tensorrt_subgraph_pass.cc:560] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0203 18:37:49.599192 17980 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0203 18:37:49.877519 17980 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0203 18:37:49.877683 17980 helper.h:110] Detected invalid timing cache, setup a local cache instead
E0203 18:38:05.247386 17980 helper.h:114] 4: [pluginV2Builder.cpp::makeRunner::680] Error Code 4: Internal Error (Internal error: plugin node deformable_conv (Output: deformable_conv_6.tmp_0804) requires 47628288 bytes of scratch space, but only 33554432 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize(). )
E0203 18:38:05.247411 17980 helper.h:114] 2: [builder.cpp::buildSerializedNetwork::399] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
0 paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
1 std::unique_ptr<paddle::PaddlePredictor, std::default_delete
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1675417085 (unix time) try "date -d @1675417085" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x8) received by PID 17980 (TID 0x7fbb927a46c0) from PID 8 ]
Segmentation fault (core dumped)
Now what can I do?
The message indicates Paddle tried to prepare the TRT engine. The warnings of the form
TensorRT was linked against xxxx
may not be fatal. For this internal error, though:
[pluginV2Builder.cpp::makeRunner::680] Error Code 4: Internal Error (Internal error: plugin node deformable_conv ... requires 47628288 bytes of scratch space, but only 33554432 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().)
I think you should increase the workspace size by modifying the line where the workspace size is set.
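For reference, a sketch of what that change looks like with the Paddle Inference Python API (placeholder paths; 1 << 30 is just an example value). The failing run had only 33554432 bytes of workspace, i.e. 1 << 25, while the deformable_conv plugin asked for 47628288:

import paddle.inference as paddle_infer

config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholder paths
config.enable_use_gpu(200, 0)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,  # raised from 1 << 25; gives TRT 1 GiB of scratch space
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Float32,  # corresponds to trt_fp32
    use_static=False,
    use_calib_mode=False)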
"Thank you for your advice, I have modified the line as you suggested, like in the picture below."
"By doing that, the memory shortage error that existed before has been resolved. Thank you so much." "But the error I have never seen before has appeared again."
"I executed the following command line." : CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_benchmark=True --run_mode=trt_int8 --trt_calib_mode=True
Output of the command line:
Found 1 inference images in total.
W0206 08:56:38.291393 733556 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.2
W0206 08:56:38.292671 733556 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
I0206 08:56:38.322510 733556 tensorrt_engine_op.h:421] This process is generating calibration table for Paddle TRT int8...
I0206 08:56:38.324936 733598 tensorrt_engine_op.h:301] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0206 08:56:38.998950 733598 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0206 08:56:39.277165 733598 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0206 08:56:40.369802 733598 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0206 08:56:40.370350 733598 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0206 08:56:40.371969 733598 helper.h:110] TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.3.1
W0206 08:56:40.372790 733598 helper.h:110] TensorRT was linked against cuDNN 8.2.0 but loaded cuDNN 8.1.0
W0206 08:56:40.374797 733556 gpu_resources.cc:217] WARNING: device: . The installed Paddle is compiled with CUDNN 8.2, but CUDNN version in your machine is 8.1, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
I0206 08:56:40.411801 733608 tensorrt_engine_op.h:301] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0206 08:56:40.419792 733608 helper.h:110] Unused Input: relu_44.tmp_0_clone_0
E0206 08:56:40.419865 733608 helper.h:114] 4: Output tensor tmp_01159 of type Float produced from output of incompatible type Bool
E0206 08:56:40.419961 733608 helper.h:114] 4: [network.cpp::validate::2534] Error Code 4: Internal Error (Could not compute dimensions for tmp_01159, because the network is not valid.)
E0206 08:56:40.419971 733608 helper.h:114] 2: [builder.cpp::buildSerializedNetwork::399] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
0 paddle::operators::TensorRTEngineOp::RunCalibration(paddle::framework::Scope const&, phi::Place const&) const::{lambda()#1}::operator()() const
1 paddle::operators::TensorRTEngineOp::PrepareTRTEngine(paddle::framework::Scope const&, paddle::inference::tensorrt::TensorRTEngine) const
2 paddle::inference::tensorrt::OpConverter::ConvertBlockToTRTEngine(paddle::framework::BlockDesc, paddle::framework::Scope const&, std::vector<std::string, std::allocator
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1675641400 (unix time) try "date -d @1675641400" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x8) received by PID 733556 (TID 0x7f8f38bce700) from PID 8 ]
Segmentation fault (core dumped)
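For context, --run_mode=trt_int8 with --trt_calib_mode=True corresponds to TensorRT's post-training calibration path. In the Paddle Inference Python API that pairing looks roughly like this (a sketch with placeholder paths, not the repository's exact code):

import paddle.inference as paddle_infer

config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholder paths
config.enable_use_gpu(200, 0)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Int8,
    use_static=False,
    use_calib_mode=True)  # the first runs generate the INT8 calibration table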
The error I asked about previously was resolved by changing the cuDNN version to 8.2.0. Thank you for your assistance.
My original intention was to accelerate the exported model using TensorRT, so I entered the following command in the command line:
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_fp32 --use_gpu=True
And I got this result:
The second command was as follows:
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_fp16 --use_gpu=True
And I got this result:
The third command was:
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --image_dir ./test_data/image1 --device=GPU --run_mode=trt_int8 --use_gpu=True
And I got this result:
I'm not sure whether TensorRT is really working and optimizing the model, since I haven't noticed any significant improvement in the results.
The only change I made is --run_mode; is that the only thing being applied? Am I using TensorRT correctly?
I also want to visualize my model as it actually runs with TensorRT applied, as shown in the figure below. How can I do that with VisualDL?
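For reference: if I read the VisualDL docs correctly, its graph page can render an exported .pdmodel directly, so something along these lines might work (the --model flag, port, and file path are my assumptions, not from this thread):

visualdl --model output_inference/fairmot_dla34_30e_1088x608/model.pdmodel --port 8080

Note that, as far as I know, this shows the exported Paddle graph rather than the TensorRT-fused engine that Paddle Inference builds at runtime.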