PaddlePaddle / PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.
https://paddleslim.readthedocs.io/zh_CN/latest/
Apache License 2.0

The auto-compressed RT-DETR model provided in the docs has very low accuracy #1862

Closed bittergourd1224 closed 6 months ago

bittergourd1224 commented 8 months ago

Environment: paddledet 2.6.0, paddlepaddle-gpu 2.4.2, paddleslim 2.6.0

Steps to reproduce: download and extract the pre-compressed RT-DETR-R50 model linked in the docs: https://github.com/PaddlePaddle/PaddleSlim/blob/cbc4d1d8a809f79ae3b6aae776ec4b2cba66ce07/example/auto_compression/detection/README.md?plain=1#L52 Then run inference on a single image in GPU fp32 mode:

python3 paddle_inference_eval.py --model_path=output/rtdetr_r50vd_6x_coco_quant --reader_config=configs/rtdetr_reader.yml --image_file=000000144941.jpg --device=GPU --precision=fp32

The inference output is:

W0322 15:17:07.880046 23290 analysis_predictor.cc:1395] The one-time configuration of analysis predictor failed, which may be due to native predictor called first and its configurations taken effect.
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
W0322 15:17:12.930387 23290 gpu_cpu_map_matmul_to_mul_pass.cc:425] matmul op not support broadcast, please check inputs'shape. 
... (the same warning is repeated 102 times in total)
I0322 15:17:12.961534 23290 fuse_pass_base.cc:59] ---  detected 14 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0322 15:17:36.090184 23290 ir_params_sync_among_devices_pass.cc:89] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0322 15:17:36.764639 23290 memory_optimize_pass.cc:219] Cluster name : fill_constant_213.tmp_0  size: 4
I0322 15:17:36.764690 23290 memory_optimize_pass.cc:219] Cluster name : tmp_143  size: 4
I0322 15:17:36.764705 23290 memory_optimize_pass.cc:219] Cluster name : conv2d_118.tmp_0.quantized  size: 26214400
I0322 15:17:36.764717 23290 memory_optimize_pass.cc:219] Cluster name : batch_norm_6.tmp_2  size: 26214400
I0322 15:17:36.764729 23290 memory_optimize_pass.cc:219] Cluster name : fill_constant_325.tmp_0  size: 4
I0322 15:17:36.764741 23290 memory_optimize_pass.cc:219] Cluster name : fill_constant_49.tmp_0  size: 4
I0322 15:17:36.764753 23290 memory_optimize_pass.cc:219] Cluster name : elementwise_add_0  size: 26214400
I0322 15:17:36.764765 23290 memory_optimize_pass.cc:219] Cluster name : flatten_33.tmp_0.quantized.dequantized  size: 9600
I0322 15:17:36.764777 23290 memory_optimize_pass.cc:219] Cluster name : conv2d_115.tmp_0.quantized.dequantized  size: 26214400
I0322 15:17:36.764789 23290 memory_optimize_pass.cc:219] Cluster name : relu_5.tmp_0.quantized.dequantized  size: 26214400
I0322 15:17:36.764801 23290 memory_optimize_pass.cc:219] Cluster name : concat_4.tmp_0  size: 8601600
I0322 15:17:36.764812 23290 memory_optimize_pass.cc:219] Cluster name : image  size: 4915200
I0322 15:17:36.764824 23290 memory_optimize_pass.cc:219] Cluster name : transpose_10.tmp_0  size: 307200
I0322 15:17:36.764837 23290 memory_optimize_pass.cc:219] Cluster name : scale_factor  size: 8
I0322 15:17:36.764847 23290 memory_optimize_pass.cc:219] Cluster name : elementwise_add_18  size: 8601600
I0322 15:17:36.764858 23290 memory_optimize_pass.cc:219] Cluster name : cast_2.tmp_0  size: 2150400
I0322 15:17:36.764876 23290 memory_optimize_pass.cc:219] Cluster name : sigmoid_28.tmp_0.quantized.dequantized  size: 4800
I0322 15:17:36.764886 23290 memory_optimize_pass.cc:219] Cluster name : conv2d_118.tmp_0.quantized.dequantized  size: 26214400
I0322 15:17:36.764902 23290 memory_optimize_pass.cc:219] Cluster name : im_shape  size: 8
--- Running analysis [ir_graph_to_program_pass]
I0322 15:17:37.901614 23290 analysis_predictor.cc:1318] ======= optimize end =======
I0322 15:17:37.951170 23290 naive_executor.cc:110] ---  skip [feed], feed -> scale_factor
I0322 15:17:37.951238 23290 naive_executor.cc:110] ---  skip [feed], feed -> image
I0322 15:17:37.951278 23290 naive_executor.cc:110] ---  skip [feed], feed -> im_shape
I0322 15:17:37.998129 23290 naive_executor.cc:110] ---  skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0322 15:17:37.998183 23290 naive_executor.cc:110] ---  skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
W0322 15:17:38.045212 23290 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 10.2
W0322 15:17:38.052327 23290 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[Benchmark]Inference time(ms): min=90.2, max=90.2, avg=90.2
bicycle: 0.732
bicycle: 0.709

The image used is 000000144941; the detections clearly do not match the image.
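For context on reading output like the two "bicycle" lines above: PaddleDetection exported models return detections as an (N, 6) float array, where each row is [class_id, score, x1, y1, x2, y2]. A quick way to sanity-check such output is to filter rows by confidence before comparing against the image. The sketch below is illustrative only; the 0.5 threshold and the sample rows are assumptions, not values from the repo:

```python
import numpy as np

def filter_detections(bboxes, score_threshold=0.5):
    """Keep detections whose confidence (column 1) meets the threshold.

    bboxes: (N, 6) array of [class_id, score, x1, y1, x2, y2],
    the layout produced by PaddleDetection exported models.
    """
    bboxes = np.asarray(bboxes, dtype=np.float32)
    keep = bboxes[:, 1] >= score_threshold
    return bboxes[keep]

# Hypothetical output resembling the log above: two high-score detections
# of class id 1 ("bicycle" in COCO) plus one low-score row to be dropped.
raw = np.array([
    [1, 0.732, 10, 20, 200, 300],
    [1, 0.709, 15, 25, 210, 310],
    [0, 0.100,  0,  0,  50,  50],
])
kept = filter_detections(raw)
print(len(kept))  # only the two rows above the threshold remain
```

If every surviving detection has the wrong class for the image, the problem is in the model itself (e.g. broken quantization weights), not in the post-processing.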

I also ran a batch evaluation on a COCO subset (tiny_coco_dataset) with:

python3 paddle_inference_eval.py --model_path=output/rtdetr_r50vd_6x_coco_quant --reader_config=configs/rtdetr_reader.yml --device=GPU --precision=fp32 --benchmark=True

The result was:

W0322 15:43:39.533730 27379 analysis_predictor.cc:1395] The one-time configuration of analysis predictor failed, which may be due to native predictor called first and its configurations taken effect.
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
W0322 15:43:48.186563 27379 gpu_cpu_map_matmul_to_mul_pass.cc:425] matmul op not support broadcast, please check inputs'shape. 
... (the same warning is repeated 102 times in total)
I0322 15:43:48.214284 27379 fuse_pass_base.cc:59] ---  detected 14 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0322 15:44:08.669596 27379 ir_params_sync_among_devices_pass.cc:89] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0322 15:44:09.206269 27379 memory_optimize_pass.cc:219] Cluster name : fill_constant_213.tmp_0  size: 4
I0322 15:44:09.206315 27379 memory_optimize_pass.cc:219] Cluster name : tmp_143  size: 4
I0322 15:44:09.206326 27379 memory_optimize_pass.cc:219] Cluster name : conv2d_118.tmp_0.quantized  size: 26214400
I0322 15:44:09.206336 27379 memory_optimize_pass.cc:219] Cluster name : batch_norm_6.tmp_2  size: 26214400
I0322 15:44:09.206346 27379 memory_optimize_pass.cc:219] Cluster name : fill_constant_325.tmp_0  size: 4
I0322 15:44:09.206353 27379 memory_optimize_pass.cc:219] Cluster name : fill_constant_49.tmp_0  size: 4
I0322 15:44:09.206362 27379 memory_optimize_pass.cc:219] Cluster name : elementwise_add_0  size: 26214400
I0322 15:44:09.206372 27379 memory_optimize_pass.cc:219] Cluster name : flatten_33.tmp_0.quantized.dequantized  size: 9600
I0322 15:44:09.206380 27379 memory_optimize_pass.cc:219] Cluster name : conv2d_115.tmp_0.quantized.dequantized  size: 26214400
I0322 15:44:09.206389 27379 memory_optimize_pass.cc:219] Cluster name : relu_5.tmp_0.quantized.dequantized  size: 26214400
I0322 15:44:09.206398 27379 memory_optimize_pass.cc:219] Cluster name : concat_4.tmp_0  size: 8601600
I0322 15:44:09.206406 27379 memory_optimize_pass.cc:219] Cluster name : image  size: 4915200
I0322 15:44:09.206414 27379 memory_optimize_pass.cc:219] Cluster name : transpose_10.tmp_0  size: 307200
I0322 15:44:09.206423 27379 memory_optimize_pass.cc:219] Cluster name : scale_factor  size: 8
I0322 15:44:09.206431 27379 memory_optimize_pass.cc:219] Cluster name : elementwise_add_18  size: 8601600
I0322 15:44:09.206440 27379 memory_optimize_pass.cc:219] Cluster name : cast_2.tmp_0  size: 2150400
I0322 15:44:09.206449 27379 memory_optimize_pass.cc:219] Cluster name : sigmoid_28.tmp_0.quantized.dequantized  size: 4800
I0322 15:44:09.206457 27379 memory_optimize_pass.cc:219] Cluster name : conv2d_118.tmp_0.quantized.dequantized  size: 26214400
I0322 15:44:09.206466 27379 memory_optimize_pass.cc:219] Cluster name : im_shape  size: 8
--- Running analysis [ir_graph_to_program_pass]
I0322 15:44:09.936969 27379 analysis_predictor.cc:1318] ======= optimize end =======
I0322 15:44:09.971310 27379 naive_executor.cc:110] ---  skip [feed], feed -> scale_factor
I0322 15:44:09.971345 27379 naive_executor.cc:110] ---  skip [feed], feed -> image
I0322 15:44:09.971355 27379 naive_executor.cc:110] ---  skip [feed], feed -> im_shape
I0322 15:44:10.002552 27379 naive_executor.cc:110] ---  skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0322 15:44:10.002584 27379 naive_executor.cc:110] ---  skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[03/22 15:44:10] ppdet.data.source.coco INFO: Load [48 samples valid, 2 samples invalid] in file /home/foia_xlc/dataset/tiny_coco_dataset/tiny_coco/annotations/instances_val2017.json.
Evaluating:   0%|                                                                                                                                                                    | 0/48 [00:00<?, ?it/s]W0322 15:44:10.055075 27379 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 10.2
W0322 15:44:10.060184 27379 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
Evaluating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48/48 [00:06<00:00,  7.55it/s]
[03/22 15:44:16] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[03/22 15:44:16] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.21s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.27s).
Accumulating evaluation results...
DONE (t=0.37s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[Benchmark]Inference time(ms): min=71.54, max=2036.7, avg=121.4
[Benchmark] COCO mAP: 0.0

The mAP is 0.

What could be causing this? Thanks!

wanghaoshuang commented 7 months ago

Thanks for reporting this; we'll ask the relevant colleagues to take a look.

xiaoluomi commented 7 months ago

Hi, this model has already been quantized, so running it in fp32 mode gives incorrect accuracy: quantization ops have been inserted into the graph. The command should instead be: python3 paddle_inference_eval.py --model_path=output/rtdetr_r50vd_6x_coco_quant --reader_config=configs/rtdetr_reader.yml --device=GPU --use_trt=True --precision=int8 --benchmark=True If you want to run the floating-point model, you can download it from https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.7/configs/rtdetr, which provides the RT-DETR floating-point models along with a tutorial for exporting them to static-graph models. The floating-point model can be run with --precision=fp16 or --precision=fp32.
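To make the distinction concrete, the two invocation modes can be sketched as below (the quantized-model command is the one given above; the floating-point `--model_path` is a hypothetical placeholder for wherever you place the exported static-graph model):

```shell
# Quantized model: quant/dequant ops are baked into the graph, so it must be
# run through Paddle-TRT in int8 mode for correct accuracy.
python3 paddle_inference_eval.py \
    --model_path=output/rtdetr_r50vd_6x_coco_quant \
    --reader_config=configs/rtdetr_reader.yml \
    --device=GPU --use_trt=True --precision=int8 --benchmark=True

# Floating-point model (downloaded/exported from PaddleDetection):
# fp32 or fp16 both work, with or without TensorRT.
python3 paddle_inference_eval.py \
    --model_path=output/rtdetr_r50vd_6x_coco_float \
    --reader_config=configs/rtdetr_reader.yml \
    --device=GPU --precision=fp32
```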

xiaoluomi commented 7 months ago

Also, the RT-DETR model may require a newer version of paddlepaddle-gpu.

bittergourd1224 commented 7 months ago

@xiaoluomi I set up a fresh environment on AI Studio with paddlepaddle-gpu 2.6.0. Running the command you gave, but with --use_trt=True removed, raises an error:

Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
--- Running analysis [ir_graph_build_pass]
I0403 16:07:40.582815 52182 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
I0403 16:07:40.819634 52182 fuse_pass_base.cc:59] ---  detected 47 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
I0403 16:08:01.944480 52182 fuse_pass_base.cc:59] ---  detected 1106 subgraphs
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [constant_folding_pass]
I0403 16:08:04.266005 52182 fuse_pass_base.cc:59] ---  detected 229 subgraphs
--- Running IR pass [silu_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0403 16:08:04.597553 52182 fuse_pass_base.cc:59] ---  detected 73 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0403 16:08:08.044272 52182 fuse_pass_base.cc:59] ---  detected 92 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0403 16:08:08.068631 52182 fuse_pass_base.cc:59] ---  detected 14 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
I0403 16:08:08.107571 52182 fuse_pass_base.cc:59] ---  detected 7 subgraphs
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0403 16:08:08.607133 52182 fuse_pass_base.cc:59] ---  detected 92 subgraphs
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
I0403 16:08:08.677088 52182 fuse_pass_base.cc:59] ---  detected 20 subgraphs
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I0403 16:08:08.764535 52182 fuse_pass_base.cc:59] ---  detected 35 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
I0403 16:08:08.801440 52182 fuse_pass_base.cc:59] ---  detected 4 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0403 16:08:08.854338 52182 fuse_pass_base.cc:59] ---  detected 46 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [fused_conv2d_add_act_layout_transfer_pass]
--- Running IR pass [transfer_layout_elim_pass]
I0403 16:08:08.963197 52182 transfer_layout_elim_pass.cc:346] move down 0 transfer_layout
I0403 16:08:08.963239 52182 transfer_layout_elim_pass.cc:347] eliminate 0 pair of transfer_layout
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [identity_op_clean_pass]
I0403 16:08:09.042109 52182 fuse_pass_base.cc:59] ---  detected 2 subgraphs
--- Running IR pass [inplace_op_var_pass]
I0403 16:08:09.078701 52182 fuse_pass_base.cc:59] ---  detected 146 subgraphs
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0403 16:08:09.089999 52182 ir_params_sync_among_devices_pass.cc:53] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0403 16:08:11.687259 52182 memory_optimize_pass.cc:118] The persistable params in main graph are : 160.793MB
I0403 16:08:11.737028 52182 memory_optimize_pass.cc:246] Cluster name : relu_11.tmp_0  size: 26214400
I0403 16:08:11.737099 52182 memory_optimize_pass.cc:246] Cluster name : tmp_4  size: 409600
I0403 16:08:11.737114 52182 memory_optimize_pass.cc:246] Cluster name : scale_factor  size: 8
I0403 16:08:11.737118 52182 memory_optimize_pass.cc:246] Cluster name : relu_5.tmp_0  size: 26214400
I0403 16:08:11.737123 52182 memory_optimize_pass.cc:246] Cluster name : elementwise_add_18  size: 8601600
I0403 16:08:11.737144 52182 memory_optimize_pass.cc:246] Cluster name : elementwise_add_17  size: 8601600
I0403 16:08:11.737152 52182 memory_optimize_pass.cc:246] Cluster name : image  size: 4915200
I0403 16:08:11.737159 52182 memory_optimize_pass.cc:246] Cluster name : shape_21.tmp_0_slice_0  size: 4
I0403 16:08:11.737167 52182 memory_optimize_pass.cc:246] Cluster name : tmp_11  size: 1638400
I0403 16:08:11.737174 52182 memory_optimize_pass.cc:246] Cluster name : im_shape  size: 8
I0403 16:08:11.737181 52182 memory_optimize_pass.cc:246] Cluster name : layer_norm_23.tmp_2  size: 307200
I0403 16:08:11.737188 52182 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0  size: 76800
I0403 16:08:11.737195 52182 memory_optimize_pass.cc:246] Cluster name : softmax_10.tmp_0  size: 115200
I0403 16:08:11.737202 52182 memory_optimize_pass.cc:246] Cluster name : elementwise_add_2  size: 26214400
I0403 16:08:11.737210 52182 memory_optimize_pass.cc:246] Cluster name : sigmoid_28.tmp_0  size: 4800
--- Running analysis [ir_graph_to_program_pass]
I0403 16:08:12.254211 52182 analysis_predictor.cc:1838] ======= optimize end =======
I0403 16:08:12.310688 52182 naive_executor.cc:200] ---  skip [feed], feed -> scale_factor
I0403 16:08:12.310745 52182 naive_executor.cc:200] ---  skip [feed], feed -> image
I0403 16:08:12.310756 52182 naive_executor.cc:200] ---  skip [feed], feed -> im_shape
I0403 16:08:12.319670 52182 naive_executor.cc:200] ---  skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0403 16:08:12.319722 52182 naive_executor.cc:200] ---  skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[04/03 16:08:12] ppdet.data.source.coco INFO: Load [48 samples valid, 2 samples invalid] in file /home/aistudio/tiny_coco_dataset/tiny_coco/annotations/instances_val2017.json.
W0403 16:08:12.394639 52182 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0403 16:08:12.395746 52182 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Traceback (most recent call last):
  File "/home/aistudio/PaddleSlim/example/auto_compression/detection/paddle_inference_eval.py", line 452, in <module>
    main()
  File "/home/aistudio/PaddleSlim/example/auto_compression/detection/paddle_inference_eval.py", line 436, in main
    eval(predictor, val_loader, metric, rerun_flag=rerun_flag)
  File "/home/aistudio/PaddleSlim/example/auto_compression/detection/paddle_inference_eval.py", line 367, in eval
    predictor.run()
ValueError: (InvalidArgument) The type of data we are trying to retrieve (float32) does not match the type of data (int8) currently contained in the container.
  [Hint: Expected dtype() == phi::CppTypeToDataType<T>::Type(), but received dtype():3 != phi::CppTypeToDataType<T>::Type():10.] (at /paddle/paddle/phi/core/dense_tensor.cc:171)
  [operator < fused_fc_elementwise_layernorm > error]

Does this mean the GPU cannot run in int8 mode? Is there any way to run the quantized model successfully without TensorRT?

xiaoluomi commented 7 months ago

Running a quantized model with native GPU inference, without enabling Paddle-TRT, is not supported by current versions of Paddle; the error here is an operator data-type mismatch. Since Paddle's native GPU inference does not yet support quantized models, you need to pass --use_trt=True to enable Paddle-TRT and run int8 inference on the quantized model.
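In Python API terms, the requirement corresponds roughly to the following Paddle Inference configuration (a minimal sketch assuming a Paddle build compiled with TensorRT support; the model/params file names are placeholders, not the actual paths from this thread):

```python
# Minimal sketch: a quantized Paddle model must go through the Paddle-TRT
# int8 path; native GPU execution of a quant/dequant graph is not supported
# and fails with the dtype-mismatch error shown above.
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder file names -- point these at the exported quantized model.
config = Config("model.pdmodel", "model.pdiparams")
config.enable_use_gpu(256, 0)  # 256 MB initial GPU memory pool, device id 0

# Enable the TensorRT subgraph engine in int8 precision.
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False,  # scales are already embedded by PaddleSlim
)

predictor = create_predictor(config)
```

Without the `enable_tensorrt_engine` call, the predictor falls back to native GPU kernels, which is where the `fused_fc_elementwise_layernorm` dtype error comes from.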