PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Building the paddle_inference C++ library from source on Windows cannot enable TRT #50605

Open weida008 opened 1 year ago

weida008 commented 1 year ago

Please ask your question

After building the inference library, I followed the steps at https://www.paddlepaddle.org.cn/inference/v2.4/guides/nv_gpu_infer/gpu_native_infer.html to set up a project against the compiled paddle_inference_install_dir library, and tested it with yolov3_test.cc from Paddle-Inference-Demo\c++\gpu\yolov3, built with Visual Studio 2019 Community. The results are as follows.

Without TRT it runs normally; the output is:

```
inference.exe --model_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdiparams
--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0217 15:12:32.518298 6720 executor.cc:186] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
--- Running IR pass [identity_scale_op_clean_pass]
I0217 15:12:32.691711 6720 fuse_pass_base.cc:59] --- detected 2 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [silu_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0217 15:12:32.939219 6720 fuse_pass_base.cc:59] --- detected 72 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0217 15:12:34.549715 6720 fuse_pass_base.cc:59] --- detected 42 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [inplace_op_var_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0217 15:12:34.565307 6720 ir_params_sync_among_devices_pass.cc:94] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : batch_norm_50.tmp_2 size: 739328
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : batch_norm_9.tmp_2 size: 23658496
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : relu_5.tmp_0 size: 23658496
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : elementwise_add_1 size: 23658496
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : yolo_box_1.tmp_0 size: 69312
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : image size: 4435968
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : elementwise_add_11 size: 5914624
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : elementwise_add_15 size: 2957312
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : yolo_box_0.tmp_0 size: 17328
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : scale_factor size: 8
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : cast_0.tmp_0 size: 8
I0217 15:12:34.705691 6720 memory_optimize_pass.cc:220] Cluster name : im_shape size: 8
--- Running analysis [ir_graph_to_program_pass]
I0217 15:12:34.799851 6720 analysis_predictor.cc:1385] ======= optimize end =======
I0217 15:12:34.799851 6720 naive_executor.cc:151] --- skip [feed], feed -> scale_factor
I0217 15:12:34.799851 6720 naive_executor.cc:151] --- skip [feed], feed -> image
I0217 15:12:34.799851 6720 naive_executor.cc:151] --- skip [feed], feed -> im_shape
I0217 15:12:34.799851 6720 naive_executor.cc:151] --- skip [save_infer_model/scale_0.tmp_1], fetch -> fetch
I0217 15:12:34.799851 6720 naive_executor.cc:151] --- skip [save_infer_model/scale_1.tmp_1], fetch -> fetch
W0217 15:12:34.799851 6720 gpu_resources.cc:85] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.6
W0217 15:12:34.815106 6720 gpu_resources.cc:115] device: 0, cuDNN Version: 8.4.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0217 15:12:37.205616 6720 inference.cc:319] output num is 12
```

With TRT enabled it cannot run normally; the output is:

```
inference.exe --model_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdiparams --run_mode=trt_fp32
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0217 14:43:12.707584 9860 analysis_predictor.cc:1131] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
I0217 14:43:12.721602 9860 executor.cc:186] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [trt_support_nhwc_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_fill_constant_op_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [trt_delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [identity_scale_op_clean_pass]
I0217 14:43:12.940196 9860 fuse_pass_base.cc:59] --- detected 2 subgraphs
--- Running IR pass [add_support_int8_pass]
I0217 14:43:13.096343 9860 fuse_pass_base.cc:59] --- detected 295 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [delete_c_identity_op_pass]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
--- Running IR pass [multihead_matmul_roformer_fuse_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [trt_flash_multihead_matmul_fuse_pass]
--- Running IR pass [trt_cross_multihead_matmul_fuse_pass]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [layernorm_shift_partition_fuse_pass]
--- Running IR pass [merge_layernorm_fuse_pass]
--- Running IR pass [preln_residual_bias_fuse_pass]
--- Running IR pass [preln_layernorm_x_fuse_pass]
--- Running IR pass [reverse_roll_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0217 14:43:13.424523 9860 fuse_pass_base.cc:59] --- detected 72 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0217 14:43:13.486861 9860 fuse_pass_base.cc:59] --- detected 78 subgraphs
--- Running IR pass [remove_padding_recover_padding_pass]
--- Running IR pass [delete_remove_padding_recover_padding_pass]
--- Running IR pass [dense_fc_to_sparse_pass]
--- Running IR pass [dense_multihead_matmul_to_sparse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I0217 14:43:13.502879 9860 tensorrt_subgraph_pass.cc:232] --- detect a sub-graph with 195 nodes
I0217 14:43:13.582355 9860 tensorrt_subgraph_pass.cc:594] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0217 14:43:15.112041 9860 helper.h:123] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
W0217 14:43:15.112041 9860 place.cc:164] The paddle::PlaceType::kCPU/kGPU is deprecated since version 2.3, and will be removed in version 2.4! Please use Tensor::is_cpu()/is_gpu() method to determine the type of place.
```

The output stops there and the program exits without printing an error message.

The program is as follows:

```cpp
// NOTE: the include list follows the demo's yolov3_test.cc; the markdown of the
// original post swallowed the <...> header names (see my follow-up comment below).
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>

#include <gflags/gflags.h>
#include <glog/logging.h>

#include "paddle/include/paddle_inference_api.h"

using paddle_infer::Config;
using paddle_infer::CreatePredictor;
using paddle_infer::Predictor;
using paddle_infer::PrecisionType;

DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(warmup, 0, "warmup.");
DEFINE_int32(repeats, 1, "repeats.");
DEFINE_string(run_mode, "paddle_gpu",
              "run_mode which can be: trt_fp32, trt_fp16, trt_int8 and paddle_gpu");
DEFINE_bool(use_dynamic_shape, false, "use trt dynamic shape.");

const int img_height = 512;  // 608
const int img_width = 512;

using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); }
double time_diff(Time t1, Time t2) {
  typedef std::chrono::microseconds ms;
  auto diff = t2 - t1;
  ms counter = std::chrono::duration_cast<ms>(diff);
  return counter.count() / 1000.0;  // milliseconds
}

std::shared_ptr<Predictor> InitPredictor() {
  Config config;
  if (FLAGS_model_dir != "") {
    config.SetModel(FLAGS_model_dir);
  }
  config.SetModel(FLAGS_model_file, FLAGS_params_file);
  config.EnableUseGpu(500, 0);  // 500 MB initial GPU memory pool, device 0

  // EnableTensorRtEngine(workspace_size, max_batch_size, min_subgraph_size,
  //                      precision, use_static, use_calib_mode)
  if (FLAGS_run_mode == "trt_fp32") {
    config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5,
                                PrecisionType::kFloat32, false, false);
  } else if (FLAGS_run_mode == "trt_fp16") {
    config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5,
                                PrecisionType::kHalf, false, false);
  } else if (FLAGS_run_mode == "trt_int8") {
    config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5,
                                PrecisionType::kInt8, false, true);
  }

  if (FLAGS_use_dynamic_shape) {
    std::map<std::string, std::vector<int>> min_input_shape = {
        {"im_shape", {FLAGS_batch_size, 2}},
        {"image", {FLAGS_batch_size, 3, 112, 112}},
        {"scale_factor", {FLAGS_batch_size, 2}}};
    std::map<std::string, std::vector<int>> max_input_shape = {
        {"im_shape", {FLAGS_batch_size, 2}},
        {"image", {FLAGS_batch_size, 3, img_height, img_width}},
        {"scale_factor", {FLAGS_batch_size, 2}}};
    std::map<std::string, std::vector<int>> opt_input_shape = {
        {"im_shape", {FLAGS_batch_size, 2}},
        {"image", {FLAGS_batch_size, 3, img_height, img_width}},
        {"scale_factor", {FLAGS_batch_size, 2}}};
    config.SetTRTDynamicShapeInfo(min_input_shape, max_input_shape,
                                  opt_input_shape);
  }

  // Open the memory optim.
  config.EnableMemoryOptim();
  return CreatePredictor(config);
}

void run(Predictor *predictor, const std::vector<float> &input,
         const std::vector<int> &input_shape,
         const std::vector<float> &input_im,
         const std::vector<int> &input_im_shape,
         std::vector<float> *out_data) {
  // The exported YOLOv3 model takes three inputs: im_shape, image, scale_factor.
  auto input_names = predictor->GetInputNames();
  auto im_shape_handle = predictor->GetInputHandle(input_names[0]);
  im_shape_handle->Reshape(input_im_shape);
  im_shape_handle->CopyFromCpu(input_im.data());

  auto image_handle = predictor->GetInputHandle(input_names[1]);
  image_handle->Reshape(input_shape);
  image_handle->CopyFromCpu(input.data());

  auto scale_factor_handle = predictor->GetInputHandle(input_names[2]);
  scale_factor_handle->Reshape(input_im_shape);
  scale_factor_handle->CopyFromCpu(input_im.data());

  CHECK(predictor->Run());

  auto output_names = predictor->GetOutputNames();
  auto output_t = predictor->GetOutputHandle(output_names[0]);
  std::vector<int> output_shape = output_t->shape();
  int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1,
                                std::multiplies<int>());

  out_data->resize(out_num);
  output_t->CopyToCpu(out_data->data());
}

int main(int argc, char *argv[]) {
  google::ParseCommandLineFlags(&argc, &argv, true);
  auto predictor = InitPredictor();

  const int height = img_height;
  const int width = img_width;
  const int channels = 3;
  std::vector<int> input_shape = {FLAGS_batch_size, channels, height, width};
  // Fill the image tensor with a synthetic pattern instead of a real image.
  std::vector<float> input_data(FLAGS_batch_size * channels * height * width);
  for (size_t i = 0; i < input_data.size(); ++i) {
    input_data[i] = i % 255 * 0.13f;
  }
  std::vector<int> input_im_shape = {FLAGS_batch_size, 2};
  std::vector<float> input_im_data(FLAGS_batch_size * 2, img_height);

  std::vector<float> out_data;
  run(predictor.get(), input_data, input_shape, input_im_data, input_im_shape,
      &out_data);
  LOG(INFO) << "output num is " << out_data.size();
  return 0;
}
```
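For reference, the `run_mode` and `use_dynamic_shape` flags above are driven from the command line. A usage sketch (the executable name and model paths are the ones from my logs; `trt_fp16` and `trt_int8` work the same way):

```
:: plain GPU path (default run_mode=paddle_gpu); this run succeeds
inference.exe --model_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdiparams

:: TensorRT FP32 subgraph engine; this is the run that exits during engine build
inference.exe --model_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdiparams --run_mode=trt_fp32

:: add --use_dynamic_shape to exercise the SetTRTDynamicShapeInfo branch
inference.exe --model_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdiparams --run_mode=trt_fp32 --use_dynamic_shape
```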

Build environment info:

```
GIT COMMIT ID: 16986d6b26bbbe5fbd9ad8244ec88d003db80485
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
WITH_ROCM: OFF
WITH_ASCEND_CL: OFF
WITH_ASCEND_CXX11: OFF
WITH_IPU: OFF
CUDA version: 11.6
CUDNN version: v8.4
CXX compiler version: 19.29.30148.0
WITH_TENSORRT: ON
TensorRT version: v8.4.1.5
```

Has anyone run into a similar problem? How can it be solved?

paddle-bot[bot] commented 1 year ago

Hi! We've received your issue; please be patient while we respond. We will arrange technicians to answer your questions as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also look at the API docs, FAQ, historical GitHub issues, and the AI community to find an answer. Have a nice day!

weida008 commented 1 year ago

The program's header files are not fully displayed above; see the screenshot for details: QQ截图20230217155913

ccrrong commented 1 year ago

Hi, this log doesn't show what the problem is. You can raise the log level with `export GLOG_v=4` to print more logs.

weida008 commented 1 year ago

> Hi, this log doesn't show what the problem is. You can raise the log level with `export GLOG_v=4` to print more logs.

Where should this be added? Does it work on Windows too? I built the inference library on Windows.

ccrrong commented 1 year ago

> Hi, this log doesn't show what the problem is. You can raise the log level with `export GLOG_v=4` to print more logs.
>
> Where should this be added? Does it work on Windows too? I built the inference library on Windows.

It's an environment variable; on Windows you should use `set`.
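For example, in the same cmd window that runs the demo (a minimal sketch; the model paths are the ones from your logs above):

```
set GLOG_v=4
inference.exe --model_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file ./model/yolov3_r50vd_dcn_270e_coco/model.pdiparams --run_mode=trt_fp32
```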

weida008 commented 1 year ago

> Hi, this log doesn't show what the problem is. You can raise the log level with `export GLOG_v=4` to print more logs.
>
> Where should this be added? Does it work on Windows too? I built the inference library on Windows.
>
> It's an environment variable; on Windows you should use `set`.

I changed the log level with `set GLOG_v=3`, but I still don't see any error message; the program simply exits. Details below.

For easier comparison I used the Python program this time: after building the inference library on Windows, I installed the paddlepaddle_gpu-0.0.0-cp39-cp39-win_amd64.whl from \Paddle\build\python\dist and then ran Paddle-Inference-Demo\python\gpu\yolov3\infer_yolov3.py.

With my self-built wheel, the command executed was:

```
D:\ldw\python\yolov3>python infer_yolov3.py --model_file=yolov3_r50vd_dcn_270e_coco/model.pdmodel --params_file=yolov3_r50vd_dcn_270e_coco/model.pdiparams --run_mode=trt_fp32
```

There is too much output; the relevant middle portion is:

```
I0222 09:04:33.543303 4308 tensorrt_subgraph_pass.cc:215] TRT engine key: 8661990638853382273
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::GridAnchor_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::NMS_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::Reorg_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::Region_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::Clip_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::LReLU_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::PriorBox_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::Normalize_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::ScatterND version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::RPROI_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::BatchedNMS_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::FlattenConcat_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::CropAndResize version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::DetectionLayer_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::EfficientNMS_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::Proposal version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::ProposalLayer_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::ResizeNearest_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::Split version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::SpecialSlice_TRT version 1
I0222 09:04:33.590130 4308 helper.h:104] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0222 09:04:33.590130 4308 tensorrt_subgraph_pass.cc:560] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0222 09:04:33.950091 4308 helper.h:107] [MemUsageChange] Init CUDA: CPU +360, GPU +0, now: CPU 5693, GPU 1655 (MiB)
I0222 09:04:35.122340 4308 helper.h:107] [MemUsageChange] Init builder kernel library: CPU +330, GPU +104, now: CPU 6215, GPU 1759 (MiB)
W0222 09:04:35.122340 4308 helper.h:110] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
I0222 09:04:35.138048 4308 op_converter.h:347] Set trt input [im_shape] type is 5
I0222 09:04:35.138048 4308 op_converter.h:347] Set trt input [image] type is 5
I0222 09:04:35.138048 4308 op_converter.h:347] Set trt input [scale_factor] type is 5
I0222 09:04:35.140046 4308 elementwise_op.cc:28] Convert a fluid elementwise op to TensorRT IElementWiseLayer

D:\ldw\python\yolov3>
```

After installing paddlepaddle-gpu with the official command `python -m pip install paddlepaddle-gpu==2.4.1.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html`, Paddle-Inference-Demo\python\gpu\yolov3\infer_yolov3.py runs normally. There is too much output; the relevant middle portion is:

```
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::GridAnchor_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::NMS_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::Reorg_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::Region_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::Clip_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::LReLU_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::PriorBox_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::Normalize_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::ScatterND version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::RPROI_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::BatchedNMS_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::FlattenConcat_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::CropAndResize version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::DetectionLayer_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::EfficientNMS_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::Proposal version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::ProposalLayer_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::ResizeNearest_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::Split version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::SpecialSlice_TRT version 1
I0222 08:58:49.465214 6756 helper.h:104] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0222 08:58:49.465214 6756 tensorrt_subgraph_pass.cc:560] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0222 08:58:49.824573 6756 helper.h:107] [MemUsageChange] Init CUDA: CPU +360, GPU +0, now: CPU 5321, GPU 1655 (MiB)
I0222 08:58:50.976517 6756 helper.h:107] [MemUsageChange] Init builder kernel library: CPU +333, GPU +106, now: CPU 5843, GPU 1761 (MiB)
W0222 08:58:50.982681 6756 helper.h:110] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
I0222 08:58:50.982681 6756 op_converter.h:347] Set trt input [im_shape] type is 5
I0222 08:58:50.982681 6756 op_converter.h:347] Set trt input [image] type is 5
I0222 08:58:50.982681 6756 op_converter.h:347] Set trt input [scale_factor] type is 5
I0222 08:58:50.982681 6756 elementwise_op.cc:28] Convert a fluid elementwise op to TensorRT IElementWiseLayer
I0222 08:58:50.982681 6756 conv2d_op.cc:41] convert a fluid conv2d op to tensorrt layer without bias
I0222 08:58:50.982681 6756 tensor_util.cc:464] TensorCopySync 32, 3, 3, 3 from Place(cpu) to Place(cpu)
I0222 08:58:50.982681 6756 tensor_util.cc:464] TensorCopySync 32 from Place(cpu) to Place(cpu)
I0222 08:58:50.982681 6756 activation_op.cc:49] convert a fluid Activation op to tensorrt activation layer whose type is relu
I0222 08:58:50.982681 6756 cast_op.cc:36] convert a cast op to tensorrt
```
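For anyone reproducing the comparison: the two runs differ only in which wheel is installed (paths as given above):

```
:: self-built wheel
python -m pip install \Paddle\build\python\dist\paddlepaddle_gpu-0.0.0-cp39-cp39-win_amd64.whl

:: official wheel
python -m pip install paddlepaddle-gpu==2.4.1.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
```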

ccrrong commented 1 year ago

`GLOG_v=0` reduces the logs. The locally built package probably has a problem. This is the Windows build guide for Paddle Inference: https://paddle-inference.readthedocs.io/en/latest/guides/install/compile/source_compile_under_Windows.html. Take a look, and build the stable Paddle release 2.4.1.
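In case it helps, a minimal sketch of the configure step for such a build. The ON/OFF options mirror the build summary you posted; the checkout tag, generator, and TensorRT path are illustrative assumptions, so adapt them to your setup:

```
:: sketch only; adjust the paths to your checkout and TensorRT unpack location
git checkout v2.4.1
mkdir build && cd build
cmake .. -G "Visual Studio 16 2019" -A x64 ^
    -DWITH_GPU=ON -DWITH_MKL=ON -DON_INFER=ON ^
    -DWITH_TENSORRT=ON -DTENSORRT_ROOT=D:\TensorRT-8.4.1.5 ^
    -DCMAKE_BUILD_TYPE=Release
```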

weida008 commented 1 year ago

> `GLOG_v=0` reduces the logs. The locally built package probably has a problem. This is the Windows build guide for Paddle Inference: https://paddle-inference.readthedocs.io/en/latest/guides/install/compile/source_compile_under_Windows.html. Take a look, and build the stable Paddle release 2.4.1.

I did build from the 2.4 release. The build also produced \build\python\dist\paddlepaddle_gpu-0.0.0-cp39-cp39-win_amd64.whl, build\paddle_inference_install_dir, and so on. I don't know how to track down the problem now. Could you give some more suggestions? Thanks!

Build environment info:

```
GIT COMMIT ID: f0422a28d75f9345fa3b801c01cd0284b3b44be3
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
WITH_ROCM: OFF
WITH_ASCEND_CL: OFF
WITH_ASCEND_CXX11: OFF
WITH_IPU: OFF
CUDA version: 11.6
CUDNN version: v8.4
CXX compiler version: 19.29.30148.0
WITH_TENSORRT: ON
TensorRT version: v8.4.1.5
```