Closed bittergourd1224 closed 6 months ago
Thanks for the report — we'll ask the relevant colleagues to take a look.
Hi, since this model has been quantized, running it in fp32 mode will give incorrect accuracy, because quantization ops have already been inserted into the graph. The command should be changed to:

```shell
python3 paddle_inference_eval.py --model_path=output/rtdetr_r50vd_6x_coco_quant --reader_config=configs/rtdetr_reader.yml --device=GPU --use_trt=True --precision=int8 --benchmark=True
```

If you want to run the floating-point model instead, you can download it here: https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.7/configs/rtdetr — that page provides the RT-DETR floating-point model downloads and a tutorial on exporting it to a static-graph model. The floating-point model can be run with `--precision=fp16` or `--precision=fp32`.
Also note that the RT-DETR model may require a newer version of paddlepaddle-gpu.
@xiaoluomi I set up a fresh environment on AI Studio with paddlepaddle-gpu 2.6.0. Using the command you gave, but with `--use_trt=True` removed, it fails with this error:
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
--- Running analysis [ir_graph_build_pass]
I0403 16:07:40.582815 52182 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
I0403 16:07:40.819634 52182 fuse_pass_base.cc:59] --- detected 47 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
I0403 16:08:01.944480 52182 fuse_pass_base.cc:59] --- detected 1106 subgraphs
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [constant_folding_pass]
I0403 16:08:04.266005 52182 fuse_pass_base.cc:59] --- detected 229 subgraphs
--- Running IR pass [silu_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0403 16:08:04.597553 52182 fuse_pass_base.cc:59] --- detected 73 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0403 16:08:08.044272 52182 fuse_pass_base.cc:59] --- detected 92 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0403 16:08:08.068631 52182 fuse_pass_base.cc:59] --- detected 14 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
I0403 16:08:08.107571 52182 fuse_pass_base.cc:59] --- detected 7 subgraphs
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0403 16:08:08.607133 52182 fuse_pass_base.cc:59] --- detected 92 subgraphs
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
I0403 16:08:08.677088 52182 fuse_pass_base.cc:59] --- detected 20 subgraphs
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I0403 16:08:08.764535 52182 fuse_pass_base.cc:59] --- detected 35 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
I0403 16:08:08.801440 52182 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0403 16:08:08.854338 52182 fuse_pass_base.cc:59] --- detected 46 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [fused_conv2d_add_act_layout_transfer_pass]
--- Running IR pass [transfer_layout_elim_pass]
I0403 16:08:08.963197 52182 transfer_layout_elim_pass.cc:346] move down 0 transfer_layout
I0403 16:08:08.963239 52182 transfer_layout_elim_pass.cc:347] eliminate 0 pair of transfer_layout
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [identity_op_clean_pass]
I0403 16:08:09.042109 52182 fuse_pass_base.cc:59] --- detected 2 subgraphs
--- Running IR pass [inplace_op_var_pass]
I0403 16:08:09.078701 52182 fuse_pass_base.cc:59] --- detected 146 subgraphs
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0403 16:08:09.089999 52182 ir_params_sync_among_devices_pass.cc:53] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0403 16:08:11.687259 52182 memory_optimize_pass.cc:118] The persistable params in main graph are : 160.793MB
I0403 16:08:11.737028 52182 memory_optimize_pass.cc:246] Cluster name : relu_11.tmp_0 size: 26214400
I0403 16:08:11.737099 52182 memory_optimize_pass.cc:246] Cluster name : tmp_4 size: 409600
I0403 16:08:11.737114 52182 memory_optimize_pass.cc:246] Cluster name : scale_factor size: 8
I0403 16:08:11.737118 52182 memory_optimize_pass.cc:246] Cluster name : relu_5.tmp_0 size: 26214400
I0403 16:08:11.737123 52182 memory_optimize_pass.cc:246] Cluster name : elementwise_add_18 size: 8601600
I0403 16:08:11.737144 52182 memory_optimize_pass.cc:246] Cluster name : elementwise_add_17 size: 8601600
I0403 16:08:11.737152 52182 memory_optimize_pass.cc:246] Cluster name : image size: 4915200
I0403 16:08:11.737159 52182 memory_optimize_pass.cc:246] Cluster name : shape_21.tmp_0_slice_0 size: 4
I0403 16:08:11.737167 52182 memory_optimize_pass.cc:246] Cluster name : tmp_11 size: 1638400
I0403 16:08:11.737174 52182 memory_optimize_pass.cc:246] Cluster name : im_shape size: 8
I0403 16:08:11.737181 52182 memory_optimize_pass.cc:246] Cluster name : layer_norm_23.tmp_2 size: 307200
I0403 16:08:11.737188 52182 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0 size: 76800
I0403 16:08:11.737195 52182 memory_optimize_pass.cc:246] Cluster name : softmax_10.tmp_0 size: 115200
I0403 16:08:11.737202 52182 memory_optimize_pass.cc:246] Cluster name : elementwise_add_2 size: 26214400
I0403 16:08:11.737210 52182 memory_optimize_pass.cc:246] Cluster name : sigmoid_28.tmp_0 size: 4800
--- Running analysis [ir_graph_to_program_pass]
I0403 16:08:12.254211 52182 analysis_predictor.cc:1838] ======= optimize end =======
I0403 16:08:12.310688 52182 naive_executor.cc:200] --- skip [feed], feed -> scale_factor
I0403 16:08:12.310745 52182 naive_executor.cc:200] --- skip [feed], feed -> image
I0403 16:08:12.310756 52182 naive_executor.cc:200] --- skip [feed], feed -> im_shape
I0403 16:08:12.319670 52182 naive_executor.cc:200] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0403 16:08:12.319722 52182 naive_executor.cc:200] --- skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[04/03 16:08:12] ppdet.data.source.coco INFO: Load [48 samples valid, 2 samples invalid] in file /home/aistudio/tiny_coco_dataset/tiny_coco/annotations/instances_val2017.json.
W0403 16:08:12.394639 52182 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0403 16:08:12.395746 52182 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Traceback (most recent call last):
File "/home/aistudio/PaddleSlim/example/auto_compression/detection/paddle_inference_eval.py", line 452, in <module>
main()
File "/home/aistudio/PaddleSlim/example/auto_compression/detection/paddle_inference_eval.py", line 436, in main
eval(predictor, val_loader, metric, rerun_flag=rerun_flag)
File "/home/aistudio/PaddleSlim/example/auto_compression/detection/paddle_inference_eval.py", line 367, in eval
predictor.run()
ValueError: (InvalidArgument) The type of data we are trying to retrieve (float32) does not match the type of data (int8) currently contained in the container.
[Hint: Expected dtype() == phi::CppTypeToDataType<T>::Type(), but received dtype():3 != phi::CppTypeToDataType<T>::Type():10.] (at /paddle/paddle/phi/core/dense_tensor.cc:171)
[operator < fused_fc_elementwise_layernorm > error]
Does this mean the GPU cannot run in int8 mode? Is there any way to run the quantized model successfully without TensorRT?
Running a quantized model with native GPU inference (without Paddle-TRT) is not supported by current versions of Paddle. The error you see is an operator data-type mismatch: Paddle's native GPU inference does not yet support quantized models, so you need to pass `--use_trt=True` to enable Paddle-TRT and run int8 inference on the quantized model.
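What `--use_trt=True` enables inside the script can be sketched directly against the Paddle Inference Python API. This is a minimal sketch, not the script's exact code; the `model.pdmodel`/`model.pdiparams` file names under the quantized model directory are assumptions.

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Assumed file names for the exported quantized model.
config = Config("output/rtdetr_r50vd_6x_coco_quant/model.pdmodel",
                "output/rtdetr_r50vd_6x_coco_quant/model.pdiparams")
config.enable_use_gpu(256, 0)           # 256 MB initial GPU memory, device 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,             # 1 GB TensorRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,  # execute the quantized subgraphs as int8
    use_static=False,
    use_calib_mode=False)               # scales come from the inserted quant ops,
                                        # so no TensorRT calibration pass is needed
predictor = create_predictor(config)
```

Without `enable_tensorrt_engine(...)`, the quant/dequant ops fall through to native GPU kernels that expect float32 tensors, which is why the run above fails inside `fused_fc_elementwise_layernorm` with an int8/float32 mismatch.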
Environment: paddledet 2.6.0, paddlepaddle-gpu 2.4.2, paddleslim 2.6.0
Steps to reproduce: I used the pre-compressed RT-DETR-R50 model provided in the docs, downloaded and extracted from https://github.com/PaddlePaddle/PaddleSlim/blob/cbc4d1d8a809f79ae3b6aae776ec4b2cba66ce07/example/auto_compression/detection/README.md?plain=1#L52 and ran inference on an image with the GPU in fp32 mode, using this command:
python3 paddle_inference_eval.py --model_path=output/rtdetr_r50vd_6x_coco_quant --reader_config=configs/rtdetr_reader.yml --image_file=000000144941.jpg --device=GPU --precision=fp32
The inference result (image attached above) clearly does not match the input image.
I also tried batch evaluation on a small COCO subset (tiny_coco_dataset), with this command:
python3 paddle_inference_eval.py --model_path=output/rtdetr_r50vd_6x_coco_quant --reader_config=configs/rtdetr_reader.yml --device=GPU --precision=fp32 --benchmark=True
Result: mAP is 0.
What could be the cause? Thanks!