Closed sunsunnyshine closed 1 year ago
When I use the provided yolov6s.onnx, it works normally. Is there a problem in the process of converting to yolov7.onnx? I only used the officially provided code. I'm confused and would really appreciate any advice.
When using yolov6:
File does not exist : ../../data/yolov6s.onnx-kFLOAT-batch1.engine
[03/25/2023-11:05:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +84, GPU +0, now: CPU 7965, GPU 991 (MiB)
[03/25/2023-11:05:08] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +121, GPU +22, now: CPU 8835, GPU 1013 (MiB)
[03/25/2023-11:05:08] [I] Parsing ONNX file: ../../data/yolov6s.onnx
[03/25/2023-11:05:08] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-11:05:08] [I] [TRT] Input filename: ../../data/yolov6s.onnx
[03/25/2023-11:05:08] [I] [TRT] ONNX IR version: 0.0.6
[03/25/2023-11:05:08] [I] [TRT] Opset version: 12
[03/25/2023-11:05:08] [I] [TRT] Producer name: pytorch
[03/25/2023-11:05:08] [I] [TRT] Producer version: 1.8
[03/25/2023-11:05:08] [I] [TRT] Domain:
[03/25/2023-11:05:08] [I] [TRT] Model version: 0
[03/25/2023-11:05:08] [I] [TRT] Doc string:
[03/25/2023-11:05:08] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-11:05:08] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/25/2023-11:05:08] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/25/2023-11:05:08] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/25/2023-11:05:08] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
workspaceSize = 8589672448, dlaManagedSRAMSize = 0, dlaLocalDRAMSize = 1073741824, dlaGlobalDRAMSize = 536870912
[03/25/2023-11:05:08] [I] Building TensorRT engine: ../../data/yolov6s.onnx-kFLOAT-batch1.engine
[03/25/2023-11:05:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +15, GPU +12, now: CPU 8574, GPU 1025 (MiB)
[03/25/2023-11:05:09] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +4, GPU +8, now: CPU 8578, GPU 1033 (MiB)
[03/25/2023-11:05:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/25/2023-11:05:28] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[03/25/2023-11:06:42] [I] [TRT] Total Activation Memory: 4413468672
[03/25/2023-11:06:42] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[03/25/2023-11:06:43] [I] [TRT] Total Host Persistent Memory: 86416
[03/25/2023-11:06:43] [I] [TRT] Total Device Persistent Memory: 983552
[03/25/2023-11:06:43] [I] [TRT] Total Scratch Memory: 2048000
[03/25/2023-11:06:43] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 30 MiB, GPU 2209 MiB
[03/25/2023-11:06:43] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 103 steps to complete.
[03/25/2023-11:06:43] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 6.618ms to assign 9 blocks to 103 nodes requiring 24377856 bytes.
[03/25/2023-11:06:43] [I] [TRT] Total Activation Memory: 24377856
[03/25/2023-11:06:43] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +6, GPU +102, now: CPU 6, GPU 102 (MiB)
[03/25/2023-11:06:43] [I] [TRT] Loaded engine size: 102 MiB
[03/25/2023-11:06:43] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +102, now: CPU 0, GPU 102 (MiB)
[03/25/2023-11:06:43] [I] TRT Engine file saved to: ../../data/yolov6s.onnx-kFLOAT-batch1.engine
4 Bindings: 2
0: name: image_arrays, size: 1x3x640x640
1: name: outputs, size: 1x8400x85
hasImplicitBatchDimension: 0, mBatchSize = 0
I've figured it out. The reason is that the dimension becomes unknown when I include NMS before converting to ONNX. I chose to remove the NMS from the model, convert to ONNX, and then add NMS with onnx_graphsurgeon.
The specific method is as follows:
python --weights --grid --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640 --include-nms
But I'm still confused about why NMS didn't work in the model.
I'm running the export with the same command:
python --weights --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --include-nms
And a part of output:
Note: Producer node(s) of first tensor:
[EfficientNMS_TRT_358 (EfficientNMS_TRT)
    Inputs: [
        Variable (TRT::EfficientNMS_TRT_602): (shape=[1, 25200, 4], dtype=float32)
        Variable (TRT::EfficientNMS_TRT_613): (shape=[1, 25200, 13], dtype=float32)
    ]
    Outputs: [
        Variable (num_dets): (shape=None, dtype=int32)
        Variable (det_boxes): (shape=None, dtype=float32)
        Variable (det_scores): (shape=None, dtype=float32)
        Variable (det_classes): (shape=None, dtype=int32)
    ]
Attributes: OrderedDict([('background_class', [-1]), ('box_coding', [1]), ('iou_threshold', 0.6499999761581421), ('max_output_boxes', 100), ('plugin_version', '1'), ('score_activation', 0), ('score_threshold', 0.3499999940395355)])
Domain: TRT]
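In the dump above, all four EfficientNMS_TRT outputs are printed with shape=None, meaning no shape is recorded for them in the graph at that point. A minimal sketch, assuming the dump is available as plain text, that lists such variables. `find_unknown_shapes` is a hypothetical helper written for this illustration, not part of onnx_graphsurgeon:

```python
import re

# Matches printed variables such as:
#   Variable (num_dets): (shape=None, dtype=int32)
#   Variable (TRT::EfficientNMS_TRT_602): (shape=[1, 25200, 4], dtype=float32)
VARIABLE_RE = re.compile(
    r"Variable \((?P<name>.+?)\): "
    r"\(shape=(?P<shape>None|\[[^\]]*\]), dtype=(?P<dtype>\w+)\)"
)


def find_unknown_shapes(dump):
    """Return the names of variables whose shape is printed as None."""
    return [
        m.group("name")
        for m in VARIABLE_RE.finditer(dump)
        if m.group("shape") == "None"
    ]


# Text copied from the export output quoted in this thread.
dump = """
Variable (TRT::EfficientNMS_TRT_602): (shape=[1, 25200, 4], dtype=float32)
Variable (TRT::EfficientNMS_TRT_613): (shape=[1, 25200, 13], dtype=float32)
Variable (num_dets): (shape=None, dtype=int32)
Variable (det_boxes): (shape=None, dtype=float32)
Variable (det_scores): (shape=None, dtype=float32)
Variable (det_classes): (shape=None, dtype=int32)
"""

unknown = find_unknown_shapes(dump)
print(unknown)
```

An unknown shape in this printout is not necessarily an error by itself; it only tells you which tensors to look at when the engine build complains about shapes.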
And the resulting model has these output tensors:
** Bindings: 5 **
0: name: images, size: 1x3x640x640
1: name: num_dets, size: 1x1
2: name: det_boxes, size: 1x100x4
3: name: det_scores, size: 1x100
4: name: det_classes, size: 1x100
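To compare a binding list like the one above against what an end-to-end export should produce, something along these lines may help. It is a sketch that assumes the bindings are printed one per line in the `index: name: NAME, size: AxBxC` form seen in this thread; `parse_bindings` and `has_end2end_outputs` are hypothetical helpers:

```python
import re

BINDING_RE = re.compile(r"^\s*(\d+): name: (\S+), size: (\S+)\s*$")

# Output names taken from the binding list quoted in this thread.
EXPECTED_OUTPUTS = ("num_dets", "det_boxes", "det_scores", "det_classes")


def parse_bindings(text):
    """Map binding name -> shape tuple, e.g. 'det_boxes' -> (1, 100, 4)."""
    bindings = {}
    for line in text.splitlines():
        match = BINDING_RE.match(line)
        if match:
            shape = tuple(int(dim) for dim in match.group(3).split("x"))
            bindings[match.group(2)] = shape
    return bindings


def has_end2end_outputs(bindings):
    """True if every expected NMS output is present among the bindings."""
    return all(name in bindings for name in EXPECTED_OUTPUTS)


end2end_text = """
0: name: images, size: 1x3x640x640
1: name: num_dets, size: 1x1
2: name: det_boxes, size: 1x100x4
3: name: det_scores, size: 1x100
4: name: det_classes, size: 1x100
"""
raw_text = """
0: name: image_arrays, size: 1x3x640x640
1: name: outputs, size: 1x8400x85
"""

end2end = parse_bindings(end2end_text)
raw = parse_bindings(raw_text)
print(has_end2end_outputs(end2end), has_end2end_outputs(raw))
```

The second sample is the yolov6s binding list from earlier in the thread, which has a single raw `outputs` tensor and no NMS outputs.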
Thanks! I think it is probably related to a third-party library version or the GPU version. Another schoolmate tried and got the same error. I just used the official code provided by yolov7.
Oh! I remember that there were some errors when exporting.
CoreML export failure: Core ML only supports tensors with rank <= 5. Layer "model.105.anchor_grid", with type "const", outputs a rank 6 tensor.
I'm also using the official yolov7 repository. The CoreML part isn't used in the ONNX export. Do you have the onnx_graphsurgeon package installed? It is used in the export:
Hi! I checked the pip list in my conda env. onnx_graphsurgeon is already installed.
Package Version
absl-py 1.4.0
asttokens 2.2.1
backcall 0.2.0
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 3.1.0
cmake 3.26.0
coloredlogs 15.0.1
contourpy 1.0.7
coremltools 6.2
cycler 0.11.0
decorator 5.1.1
executing 1.2.0
filelock 3.10.0
flatbuffers 23.3.3
fonttools 4.39.2
google-auth 2.16.2
google-auth-oauthlib 0.4.6
grpcio 1.51.3
humanfriendly 10.0
idna 3.4
ipython 8.11.0
jedi 0.18.2
Jinja2 3.1.2
kiwisolver 1.4.4
lit 15.0.7
Markdown 3.4.1
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
networkx 3.0
numpy 1.23.5
nvidia-cublas-cu11
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11
nvidia-cufft-cu11
nvidia-curand-cu11
nvidia-cusolver-cu11
nvidia-cusparse-cu11
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
nvidia-pyindex 1.0.9
oauthlib 3.2.2
onnx 1.13.1
onnx-graphsurgeon 0.3.26
onnx-simplifier 0.4.17
onnxruntime 1.14.1
opencv-python
packaging 23.0
pandas 1.5.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.0.1
prompt-toolkit 3.0.38
protobuf 3.20.3
psutil 5.9.4
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
Pygments 2.14.0
pyparsing 3.0.9
python-dateutil 2.8.2
pytz 2022.7.1
PyYAML 6.0
requests 2.28.2
requests-oauthlib 1.3.1
rich 13.3.2
rsa 4.9
scipy 1.10.1
seaborn 0.12.2
setuptools 65.6.3
six 1.16.0
stack-data 0.6.2
sympy 1.11.1
tensorboard 2.12.0
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
thop 0.1.1.post2209072238
torch 2.0.0
torchvision 0.15.1
tqdm 4.65.0
traitlets 5.9.0
triton 2.0.0
typing_extensions 4.5.0
urllib3 1.26.15
wcwidth 0.2.6
Werkzeug 2.2.3
wheel 0.38.4
Thanks for the help! Since I have already solved this problem another way, you don't have to spend too much time thinking about this mysterious problem. Hahaha
I've figured it out. In the yolov7 code, when I pass --max-wh, the model uses ONNX_ORT (the "onnx module with ONNX-Runtime NMS operation"). I should remove --max-wh from the command line!
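The behaviour described in that last comment can be summarised in a small sketch. This is a simplified illustration of what the comment reports, not the repository's actual code: `pick_nms_module` and `drop_flag` are hypothetical helpers, and the name `ONNX_TRT` for the TensorRT-plugin variant is an assumption (only `ONNX_ORT` is named in this thread).

```python
def pick_nms_module(max_wh):
    """Return which NMS wrapper would be chosen, per the comment above.

    Reported behaviour: passing --max-wh selects the ONNX Runtime NMS
    wrapper (ONNX_ORT). ONNX_TRT is an assumed name for the alternative.
    """
    return "ONNX_TRT" if max_wh is None else "ONNX_ORT"


def drop_flag(argv, flag, takes_value=True):
    """Return a copy of argv without `flag` (and its value, if it has one)."""
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False
            continue
        if arg == flag:
            skip_next = takes_value
            continue
        cleaned.append(arg)
    return cleaned


# Flags taken from the first export command quoted in this thread.
args = ["--grid", "--simplify", "--topk-all", "100",
        "--img-size", "640", "640", "--max-wh", "640", "--include-nms"]

fixed_args = drop_flag(args, "--max-wh")
print(pick_nms_module(640), pick_nms_module(None))
print(fixed_args)
```

With `--max-wh` removed, the remaining flags match the second command quoted above apart from `--end2end`, which that command also passes.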
File does not exist : ../../data/yolov7.onnx-kFLOAT-batch1.engine
[03/25/2023-10:48:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +98, GPU +0, now: CPU 8650, GPU 991 (MiB)
[03/25/2023-10:49:02] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +139, GPU +22, now: CPU 9264, GPU 1013 (MiB)
[03/25/2023-10:49:02] [I] Parsing ONNX file: ../../data/yolov7.onnx
[03/25/2023-10:49:02] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-10:49:02] [I] [TRT] Input filename: ../../data/yolov7.onnx
[03/25/2023-10:49:02] [I] [TRT] ONNX IR version: 0.0.7
[03/25/2023-10:49:02] [I] [TRT] Opset version: 12
[03/25/2023-10:49:02] [I] [TRT] Producer name: pytorch
[03/25/2023-10:49:02] [I] [TRT] Producer version: 2.0.0
[03/25/2023-10:49:02] [I] [TRT] Domain:
[03/25/2023-10:49:02] [I] [TRT] Model version: 0
[03/25/2023-10:49:02] [I] [TRT] Doc string:
[03/25/2023-10:49:02] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-10:49:03] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/25/2023-10:49:03] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/25/2023-10:49:03] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
workspaceSize = 8589672448, dlaManagedSRAMSize = 0, dlaLocalDRAMSize = 1073741824, dlaGlobalDRAMSize = 536870912
[03/25/2023-10:49:03] [I] Building TensorRT engine: ../../data/yolov7.onnx-kFLOAT-batch1.engine
[03/25/2023-10:49:03] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +10, GPU +12, now: CPU 9090, GPU 1025 (MiB)
[03/25/2023-10:49:03] [I] [TRT] [MemUsageChange] Init cuDNN: CPU -1, GPU +8, now: CPU 9089, GPU 1033 (MiB)
[03/25/2023-10:49:03] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/25/2023-10:50:10] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[03/25/2023-10:51:49] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[03/25/2023-10:53:10] [I] [TRT] Total Activation Memory: 5057725440
[03/25/2023-10:53:10] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[03/25/2023-10:53:10] [W] [TRT] Profile kMAX values are not self-consistent. Assertion profile != nullptr failed. need profile
[03/25/2023-10:53:10] [E] [TRT] 4: [memoryComputation.cpp::nvinfer1::builder::computeEngineAuxMemorySizes::203] Error Code 4: Internal Error (Profile kOPT values are not self-consistent. Assertion profile != nullptr failed. need profile)
[03/25/2023-10:53:11] [E] [TRT] 2: [builder.cpp::nvinfer1::builder::Builder::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
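In a long build log like this one, the lines that matter are the few tagged [E]. A minimal sketch, assuming entries start with a `[MM/DD/YYYY-HH:MM:SS] [X]` prefix as they do in this thread, that splits a run-together log into entries and pulls out the errors. `split_entries` and `error_messages` are hypothetical helpers:

```python
import re

# Entry prefix as it appears in this thread: timestamp, then a severity
# letter in brackets ([I] info, [W] warning, [E] error).
ENTRY_RE = re.compile(
    r"\[(\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2})\] \[([A-Z])\] "
)


def split_entries(log):
    """Split a log into (timestamp, severity, message) tuples."""
    matches = list(ENTRY_RE.finditer(log))
    entries = []
    for i, match in enumerate(matches):
        # Each message runs until the next entry prefix (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        message = log[match.end():end].strip()
        entries.append((match.group(1), match.group(2), message))
    return entries


def error_messages(log):
    """Return only the messages logged with severity E."""
    return [msg for _, severity, msg in split_entries(log) if severity == "E"]


# Excerpt copied from the failing yolov7 build log above.
sample = (
    "[03/25/2023-10:53:10] [I] [TRT] Detected 1 inputs and 1 output network tensors. "
    "[03/25/2023-10:53:10] [W] [TRT] Profile kMAX values are not self-consistent. "
    "Assertion profile != nullptr failed. need profile "
    "[03/25/2023-10:53:11] [E] [TRT] 2: [builder.cpp::nvinfer1::builder::Builder::"
    "buildSerializedNetwork::751] Error Code 2: Internal Error "
    "(Assertion engine != nullptr failed. )"
)

for message in error_messages(sample):
    print(message)
```

Because the splitter keys on the timestamp prefix rather than on newlines, it also works on logs that were pasted as a single line.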