Mandylove1993 / CUDA-FastBEV

TensorRT deployment and PTQ/QAT tool development for FastBEV: total inference time only 6.9 ms!
MIT License

Converting the fast-bev PyTorch model to TensorRT fails: Assertion failed: scaleAllPositive && "Scale coefficients must all be positive" #35

Open Anmidy opened 2 weeks ago

Anmidy commented 2 weeks ago

Hello, I retrained a model and want to convert it to TensorRT. My workflow was:

1. Train with the fast-bev project using this repo's config file fastbev_m0_r18_s256x704_v200x200x4_c192_d2_f1.py, producing epoch_20.pth.
2. Run python ptq_bev.py and then python export_onnx.py, producing two ONNX files: fastbev_pre_trt_ptq.onnx and fastbev_post_trt_ptq.onnx.
3. Run bash tool/build_trt_engine.sh to build the TensorRT engine, which fails with the error below.

Which step went wrong? Was I supposed to run python qat_bev.py in step 2? I cannot find a qat_bev.py file. The TensorRT conversion log is shown below (truncated in the middle due to the length limit; the error is at the end):

&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=model/resnet18int8head_1f/fastbev_pre_trt_ptq.onnx --fp16 --int8 --inputIOFormats=fp16:chw, --outputIOFormats=fp16:chw, --saveEngine=model/resnet18int8head_1f/build/fastbev_pre_trt_ptq.plan --memPoolSize=workspace:2048 --verbose --dumpLayerInfo --dumpProfile --separateProfileRun --profilingVerbosity=detailed --exportLayerInfo=model/resnet18int8head_1f/build/fastbev_pre_trt_ptq.json
[11/06/2024-14:33:52] [I] === Model Options ===
[11/06/2024-14:33:52] [I] Format: ONNX
[11/06/2024-14:33:52] [I] Model: model/resnet18int8head_1f/fastbev_pre_trt_ptq.onnx
[11/06/2024-14:33:52] [I] Output:
[11/06/2024-14:33:52] [I] === Build Options ===
[11/06/2024-14:33:52] [I] Max batch: explicit batch
[11/06/2024-14:33:52] [I] Memory Pools: workspace: 2048 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/06/2024-14:33:52] [I] minTiming: 1
[11/06/2024-14:33:52] [I] avgTiming: 8
[11/06/2024-14:33:52] [I] Precision: FP32+FP16+INT8
[11/06/2024-14:33:52] [I] LayerPrecisions: 
[11/06/2024-14:33:52] [I] Layer Device Types: 
[11/06/2024-14:33:52] [I] Calibration: Dynamic
[11/06/2024-14:33:52] [I] Refit: Disabled
[11/06/2024-14:33:52] [I] Version Compatible: Disabled
[11/06/2024-14:33:52] [I] TensorRT runtime: full
[11/06/2024-14:33:52] [I] Lean DLL Path: 
[11/06/2024-14:33:52] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[11/06/2024-14:33:52] [I] Exclude Lean Runtime: Disabled
[11/06/2024-14:33:52] [I] Sparsity: Disabled
[11/06/2024-14:33:52] [I] Safe mode: Disabled
[11/06/2024-14:33:52] [I] Build DLA standalone loadable: Disabled
[11/06/2024-14:33:52] [I] Allow GPU fallback for DLA: Disabled
[11/06/2024-14:33:52] [I] DirectIO mode: Disabled
[11/06/2024-14:33:52] [I] Restricted mode: Disabled
[11/06/2024-14:33:52] [I] Skip inference: Disabled
[11/06/2024-14:33:52] [I] Save engine: model/resnet18int8head_1f/build/fastbev_pre_trt_ptq.plan
[11/06/2024-14:33:52] [I] Load engine: 
[11/06/2024-14:33:52] [I] Profiling verbosity: 2
[11/06/2024-14:33:52] [I] Tactic sources: Using default tactic sources
[11/06/2024-14:33:52] [I] timingCacheMode: local
[11/06/2024-14:33:52] [I] timingCacheFile: 
[11/06/2024-14:33:52] [I] Heuristic: Disabled
[11/06/2024-14:33:52] [I] Preview Features: Use default preview flags.
[11/06/2024-14:33:52] [I] MaxAuxStreams: -1
[11/06/2024-14:33:52] [I] BuilderOptimizationLevel: -1
[11/06/2024-14:33:52] [I] Input(s): fp16:chw
[11/06/2024-14:33:52] [I] Output(s): fp16:chw
[11/06/2024-14:33:52] [I] Input build shapes: model
[11/06/2024-14:33:52] [I] Input calibration shapes: model
[11/06/2024-14:33:52] [I] === System Options ===
[11/06/2024-14:33:52] [I] Device: 0
[11/06/2024-14:33:52] [I] DLACore: 
[11/06/2024-14:33:52] [I] Plugins:
[11/06/2024-14:33:52] [I] setPluginsToSerialize:
[11/06/2024-14:33:52] [I] dynamicPlugins:
[11/06/2024-14:33:52] [I] ignoreParsedPluginLibs: 0
[11/06/2024-14:33:52] [I] 
[11/06/2024-14:33:52] [I] === Inference Options ===
[11/06/2024-14:33:52] [I] Batch: Explicit
[11/06/2024-14:33:52] [I] Input inference shapes: model
[11/06/2024-14:33:52] [I] Iterations: 10
[11/06/2024-14:33:52] [I] Duration: 3s (+ 200ms warm up)
[11/06/2024-14:33:52] [I] Sleep time: 0ms
[11/06/2024-14:33:52] [I] Idle time: 0ms
[11/06/2024-14:33:52] [I] Inference Streams: 1
[11/06/2024-14:33:52] [I] ExposeDMA: Disabled
[11/06/2024-14:33:52] [I] Data transfers: Enabled
[11/06/2024-14:33:52] [I] Spin-wait: Disabled
[11/06/2024-14:33:52] [I] Multithreading: Disabled
[11/06/2024-14:33:52] [I] CUDA Graph: Disabled
[11/06/2024-14:33:52] [I] Separate profiling: Enabled
[11/06/2024-14:33:52] [I] Time Deserialize: Disabled
[11/06/2024-14:33:52] [I] Time Refit: Disabled
[11/06/2024-14:33:52] [I] NVTX verbosity: 2
[11/06/2024-14:33:52] [I] Persistent Cache Ratio: 0
[11/06/2024-14:33:52] [I] Inputs:
[11/06/2024-14:33:52] [I] === Reporting Options ===
[11/06/2024-14:33:52] [I] Verbose: Enabled
[11/06/2024-14:33:52] [I] Averages: 10 inferences
[11/06/2024-14:33:52] [I] Percentiles: 90,95,99
[11/06/2024-14:33:52] [I] Dump refittable layers:Disabled
[11/06/2024-14:33:52] [I] Dump output: Disabled
[11/06/2024-14:33:52] [I] Profile: Enabled
[11/06/2024-14:33:52] [I] Export timing to JSON file: 
[11/06/2024-14:33:52] [I] Export output to JSON file: 
[11/06/2024-14:33:52] [I] Export profile to JSON file: 
[11/06/2024-14:33:52] [I] 
[11/06/2024-14:33:52] [I] === Device Information ===
[11/06/2024-14:33:52] [I] Selected Device: Quadro RTX 4000
[11/06/2024-14:33:52] [I] Compute Capability: 7.5
[11/06/2024-14:33:52] [I] SMs: 36
[11/06/2024-14:33:52] [I] Device Global Memory: 7966 MiB
[11/06/2024-14:33:52] [I] Shared Memory per SM: 64 KiB
[11/06/2024-14:33:52] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/06/2024-14:33:52] [I] Application Compute Clock Rate: 1.545 GHz
[11/06/2024-14:33:52] [I] Application Memory Clock Rate: 6.501 GHz
[11/06/2024-14:33:52] [I] 
[11/06/2024-14:33:52] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[11/06/2024-14:33:52] [I] 
[11/06/2024-14:33:52] [I] TensorRT version: 8.6.1
[11/06/2024-14:33:52] [I] Loading standard plugins
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::CropAndResizeDynamic version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::DecodeBbox3DPlugin version 1
[11/06/2024-14:33:52] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
......
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_161 [QuantizeLinear] inputs: [368 -> (6, 256, 16, 44)[FLOAT]], [369 -> ()[FLOAT]], [370 -> ()[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 369 for ONNX node: 369
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 370 for ONNX node: 370
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 371 for ONNX tensor: 371
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_161 [QuantizeLinear] outputs: [371 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: DequantizeLinear_164 [DequantizeLinear]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 371
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 369
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 373
[11/06/2024-14:33:57] [V] [TRT] DequantizeLinear_164 [DequantizeLinear] inputs: [371 -> (6, 256, 16, 44)[FLOAT]], [369 -> ()[FLOAT]], [373 -> ()[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 373 for ONNX node: 373
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 374 for ONNX tensor: 374
[11/06/2024-14:33:57] [V] [TRT] DequantizeLinear_164 [DequantizeLinear] outputs: [374 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: QuantizeLinear_166 [QuantizeLinear]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: model.backbone.layer3.1.conv2.weight
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 375
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 719
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_166 [QuantizeLinear] inputs: [model.backbone.layer3.1.conv2.weight -> (256, 256, 3, 3)[FLOAT]], [375 -> (256)[FLOAT]], [719 -> (256)[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: model.backbone.layer3.1.conv2.weight for ONNX node: model.backbone.layer3.1.conv2.weight
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 375 for ONNX node: 375
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 719 for ONNX node: 719
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 378 for ONNX tensor: 378
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_166 [QuantizeLinear] outputs: [378 -> (256, 256, 3, 3)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: DequantizeLinear_167 [DequantizeLinear]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 378
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 375
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 719
[11/06/2024-14:33:57] [V] [TRT] DequantizeLinear_167 [DequantizeLinear] inputs: [378 -> (256, 256, 3, 3)[FLOAT]], [375 -> (256)[FLOAT]], [719 -> (256)[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 379 for ONNX tensor: 379
[11/06/2024-14:33:57] [V] [TRT] DequantizeLinear_167 [DequantizeLinear] outputs: [379 -> (256, 256, 3, 3)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: Conv_168 [Conv]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 374
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 379
[11/06/2024-14:33:57] [V] [TRT] Searching for input: model.backbone.layer3.1.conv2.bias
[11/06/2024-14:33:57] [V] [TRT] Conv_168 [Conv] inputs: [374 -> (6, 256, 16, 44)[FLOAT]], [379 -> (256, 256, 3, 3)[FLOAT]], [model.backbone.layer3.1.conv2.bias -> (256)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Kernel weights are not set yet. Kernel weights must be set using setInput(1, kernel_tensor) API call.
[11/06/2024-14:33:57] [V] [TRT] Registering layer: Conv_168 for ONNX node: Conv_168
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 380 for ONNX tensor: 380
[11/06/2024-14:33:57] [V] [TRT] Conv_168 [Conv] outputs: [380 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: Add_169 [Add]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 380
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 355
[11/06/2024-14:33:57] [V] [TRT] Add_169 [Add] inputs: [380 -> (6, 256, 16, 44)[FLOAT]], [355 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: Add_169 for ONNX node: Add_169
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 381 for ONNX tensor: 381
[11/06/2024-14:33:57] [V] [TRT] Add_169 [Add] outputs: [381 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: Relu_170 [Relu]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 381
[11/06/2024-14:33:57] [V] [TRT] Relu_170 [Relu] inputs: [381 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: Relu_170 for ONNX node: Relu_170
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 382 for ONNX tensor: 382
[11/06/2024-14:33:57] [V] [TRT] Relu_170 [Relu] outputs: [382 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: QuantizeLinear_173 [QuantizeLinear]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 382
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 383
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 384
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_173 [QuantizeLinear] inputs: [382 -> (6, 256, 16, 44)[FLOAT]], [383 -> ()[FLOAT]], [384 -> ()[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 383 for ONNX node: 383
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 384 for ONNX node: 384
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 385 for ONNX tensor: 385
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_173 [QuantizeLinear] outputs: [385 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: DequantizeLinear_176 [DequantizeLinear]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 385
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 383
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 387
[11/06/2024-14:33:57] [V] [TRT] DequantizeLinear_176 [DequantizeLinear] inputs: [385 -> (6, 256, 16, 44)[FLOAT]], [383 -> ()[FLOAT]], [387 -> ()[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: 387 for ONNX node: 387
[11/06/2024-14:33:57] [V] [TRT] Registering tensor: 388 for ONNX tensor: 388
[11/06/2024-14:33:57] [V] [TRT] DequantizeLinear_176 [DequantizeLinear] outputs: [388 -> (6, 256, 16, 44)[FLOAT]], 
[11/06/2024-14:33:57] [V] [TRT] Parsing node: QuantizeLinear_178 [QuantizeLinear]
[11/06/2024-14:33:57] [V] [TRT] Searching for input: model.backbone.layer4.0.conv1.weight
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 389
[11/06/2024-14:33:57] [V] [TRT] Searching for input: 720
[11/06/2024-14:33:57] [V] [TRT] QuantizeLinear_178 [QuantizeLinear] inputs: [model.backbone.layer4.0.conv1.weight -> (512, 256, 3, 3)[FLOAT]], [389 -> (512)[FLOAT]], [720 -> (512)[INT8]], 
[11/06/2024-14:33:57] [V] [TRT] Registering layer: model.backbone.layer4.0.conv1.weight for ONNX node: model.backbone.layer4.0.conv1.weight
[11/06/2024-14:33:57] [E] [TRT] ModelImporter.cpp:771: While parsing node number 98 [QuantizeLinear -> "392"]:
[11/06/2024-14:33:57] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[11/06/2024-14:33:57] [E] [TRT] ModelImporter.cpp:773: input: "model.backbone.layer4.0.conv1.weight"
input: "389"
input: "720"
output: "392"
name: "QuantizeLinear_178"
op_type: "QuantizeLinear"
attribute {
  name: "axis"
  i: 0
  type: INT
}

[11/06/2024-14:33:57] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[11/06/2024-14:33:57] [E] [TRT] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:1197 In function QuantDequantLinearHelper:
[6] Assertion failed: scaleAllPositive && "Scale coefficients must all be positive"
[11/06/2024-14:33:57] [E] Failed to parse onnx file
[11/06/2024-14:33:57] [I] Finished parsing network model. Parse time: 0.0595906
[11/06/2024-14:33:57] [E] Parsing model failed
[11/06/2024-14:33:57] [E] Failed to create engine from model or file.
[11/06/2024-14:33:57] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=model/resnet18int8head_1f/fastbev_pre_trt_ptq.onnx --fp16 --int8 --inputIOFormats=fp16:chw, --outputIOFormats=fp16:chw, --saveEngine=model/resnet18int8head_1f/build/fastbev_pre_trt_ptq.plan --memPoolSize=workspace:2048 --verbose --dumpLayerInfo --dumpProfile --separateProfileRun --profilingVerbosity=detailed --exportLayerInfo=model/resnet18int8head_1f/build/fastbev_pre_trt_ptq.json
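For anyone hitting the same assertion: TensorRT rejects a QuantizeLinear/DequantizeLinear node whose scale tensor contains any zero or negative entry. With per-channel weight quantization (as in the failing QuantizeLinear_178 on model.backbone.layer4.0.conv1.weight, whose scale input 389 has shape (512)), a scale of exactly 0 typically comes from an output channel whose weights are all zero, because PTQ calibration commonly derives the scale as amax / 127. The sketch below (plain NumPy, illustrative names only, not code from this repo) reproduces the condition and shows a clamping workaround:

```python
import numpy as np

def find_nonpositive_scales(scales):
    """Return indices of scale entries TensorRT's scaleAllPositive check rejects."""
    return np.flatnonzero(~(scales > 0))

# Per-channel weight quantization: one scale per output channel,
# commonly computed during calibration as amax / 127.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8)).astype(np.float32)
weights[2] = 0.0                      # a "dead" channel: all weights zero
amax = np.abs(weights).max(axis=1)    # per-channel absolute maximum
scales = amax / 127.0                 # channel 2 gets scale == 0

bad = find_nonpositive_scales(scales)
print("non-positive scale channels:", bad)   # -> [2]

# Workaround: clamp scales to a tiny positive epsilon so the
# QuantizeLinear node passes the assertion at engine-build time.
fixed = np.maximum(scales, 1e-7)
assert find_nonpositive_scales(fixed).size == 0
```

A more robust fix than clamping would be to find the all-zero output channels in the checkpoint (or the corresponding scale initializers inside fastbev_pre_trt_ptq.onnx, e.g. with the onnx Python package) before export; clamping merely lets the engine build while leaving the dead channels quantized to zero.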