Monday-Leo / YOLOv7_Tensorrt

A simple implementation of Tensorrt YOLOv7

Cannot generate engine #2

Open GuHuiJian opened 2 years ago

GuHuiJian commented 2 years ago

I generated the ONNX file with export_onnx.py, but building the engine with trtexec fails. Environment: CUDA 10.2, cuDNN 8.4.1, torch 1.7.1. Do I need to set the device to GPU mode when exporting the ONNX file?

Monday-Leo commented 2 years ago

No need — exporting the ONNX on CPU is fine. Please post the full error output.

GuHuiJian commented 2 years ago

Below is the output from the run. Thank you for the reply.

@.*** MINGW64 /e/yolov7/TensorRT-8.4.1.5.Windows10.x86_64.cuda-10.2.cudnn8.4/TensorRT-8.4.1.5/bin
$ ./trtexec.exe --onnx=./yolov7.onnx --saveEngine=./yolov7_fp16.engine --fp16 --workspace=200
&&&& RUNNING TensorRT.trtexec [TensorRT v8401] # E:\yolov7\TensorRT-8.4.1.5.Windows10.x86_64.cuda-10.2.cudnn8.4\TensorRT-8.4.1.5\bin\trtexec.exe --onnx=./yolov7.onnx --saveEngine=./yolov7_fp16.engine --fp16 --workspace=200
[07/12/2022-16:36:22] [W] --workspace flag has been deprecated by --memPoolSize flag.
[07/12/2022-16:36:22] [I] === Model Options ===
[07/12/2022-16:36:22] [I] Format: ONNX
[07/12/2022-16:36:22] [I] Model: ./yolov7.onnx
[07/12/2022-16:36:22] [I] Output:
[07/12/2022-16:36:22] [I] === Build Options ===
[07/12/2022-16:36:22] [I] Max batch: explicit batch
[07/12/2022-16:36:22] [I] Memory Pools: workspace: 200 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[07/12/2022-16:36:22] [I] minTiming: 1
[07/12/2022-16:36:22] [I] avgTiming: 8
[07/12/2022-16:36:22] [I] Precision: FP32+FP16
[07/12/2022-16:36:22] [I] LayerPrecisions:
[07/12/2022-16:36:22] [I] Calibration:
[07/12/2022-16:36:22] [I] Refit: Disabled
[07/12/2022-16:36:22] [I] Sparsity: Disabled
[07/12/2022-16:36:22] [I] Safe mode: Disabled
[07/12/2022-16:36:22] [I] DirectIO mode: Disabled
[07/12/2022-16:36:22] [I] Restricted mode: Disabled
[07/12/2022-16:36:22] [I] Build only: Disabled
[07/12/2022-16:36:22] [I] Save engine: ./yolov7_fp16.engine
[07/12/2022-16:36:22] [I] Load engine:
[07/12/2022-16:36:22] [I] Profiling verbosity: 0
[07/12/2022-16:36:22] [I] Tactic sources: Using default tactic sources
[07/12/2022-16:36:22] [I] timingCacheMode: local
[07/12/2022-16:36:22] [I] timingCacheFile:
[07/12/2022-16:36:22] [I] Input(s)s format: fp32:CHW
[07/12/2022-16:36:22] [I] Output(s)s format: fp32:CHW
[07/12/2022-16:36:22] [I] Input build shapes: model
[07/12/2022-16:36:22] [I] Input calibration shapes: model
[07/12/2022-16:36:22] [I] === System Options ===
[07/12/2022-16:36:22] [I] Device: 0
[07/12/2022-16:36:22] [I] DLACore:
[07/12/2022-16:36:22] [I] Plugins:
[07/12/2022-16:36:22] [I] === Inference Options ===
[07/12/2022-16:36:22] [I] Batch: Explicit
[07/12/2022-16:36:22] [I] Input inference shapes: model
[07/12/2022-16:36:22] [I] Iterations: 10
[07/12/2022-16:36:22] [I] Duration: 3s (+ 200ms warm up)
[07/12/2022-16:36:22] [I] Sleep time: 0ms
[07/12/2022-16:36:22] [I] Idle time: 0ms
[07/12/2022-16:36:22] [I] Streams: 1
[07/12/2022-16:36:22] [I] ExposeDMA: Disabled
[07/12/2022-16:36:22] [I] Data transfers: Enabled
[07/12/2022-16:36:22] [I] Spin-wait: Disabled
[07/12/2022-16:36:22] [I] Multithreading: Disabled
[07/12/2022-16:36:22] [I] CUDA Graph: Disabled
[07/12/2022-16:36:22] [I] Separate profiling: Disabled
[07/12/2022-16:36:22] [I] Time Deserialize: Disabled
[07/12/2022-16:36:22] [I] Time Refit: Disabled
[07/12/2022-16:36:22] [I] Inputs:
[07/12/2022-16:36:22] [I] === Reporting Options ===
[07/12/2022-16:36:22] [I] Verbose: Disabled
[07/12/2022-16:36:22] [I] Averages: 10 inferences
[07/12/2022-16:36:22] [I] Percentile: 99
[07/12/2022-16:36:22] [I] Dump refittable layers: Disabled
[07/12/2022-16:36:22] [I] Dump output: Disabled
[07/12/2022-16:36:22] [I] Profile: Disabled
[07/12/2022-16:36:22] [I] Export timing to JSON file:
[07/12/2022-16:36:22] [I] Export output to JSON file:
[07/12/2022-16:36:22] [I] Export profile to JSON file:
[07/12/2022-16:36:22] [I]
[07/12/2022-16:36:22] [I] === Device Information ===
[07/12/2022-16:36:22] [I] Selected Device: NVIDIA GeForce GTX 1660 Ti
[07/12/2022-16:36:22] [I] Compute Capability: 7.5
[07/12/2022-16:36:22] [I] SMs: 24
[07/12/2022-16:36:22] [I] Compute Clock Rate: 1.77 GHz
[07/12/2022-16:36:22] [I] Device Global Memory: 6143 MiB
[07/12/2022-16:36:22] [I] Shared Memory per SM: 64 KiB
[07/12/2022-16:36:22] [I] Memory Bus Width: 192 bits (ECC disabled)
[07/12/2022-16:36:22] [I] Memory Clock Rate: 6.001 GHz
[07/12/2022-16:36:22] [I]
[07/12/2022-16:36:22] [I] TensorRT version: 8.4.1
Segmentation fault
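As an aside, the first warning in the log notes that --workspace is deprecated in TensorRT 8.4 in favor of --memPoolSize. A sketch of the same build using the replacement flag (size in MiB; this only silences the warning, it is not a fix for the segmentation fault):

```shell
# Same build as above with the non-deprecated memory-pool flag.
# 200 MiB is a fairly small workspace for YOLOv7; a larger pool
# (if VRAM allows) gives the builder more room to choose tactics.
./trtexec.exe --onnx=./yolov7.onnx \
              --saveEngine=./yolov7_fp16.engine \
              --fp16 \
              --memPoolSize=workspace:200
```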


Monday-Leo commented 2 years ago

I haven't run into this error before. Is this the official model, or one you trained yourself? Can you visualize the inputs and outputs with Netron?

GuHuiJian commented 2 years ago

I downloaded yolov7.pt from https://github.com/WongKinYiu/yolov7.


Oversized attachment sent via QQ Mail:

yolov7.pt (72.09M, download expires 2022-08-11 18:08): http://mail.qq.com/cgi-bin/ftnExs_download?t=exs_ftn_download&k=5837363895944dc926b247081035011d5f07045d040502054b0e535d531807025f561b005207051f530f060a060c070b00530509363c334b095b594e011b4346660a&code=f7686532

Monday-Leo commented 2 years ago

Send me the converted ONNX and I'll try the conversion myself to locate the problem.

Monday-Leo commented 2 years ago

QQ discussion group: 768071513 — join and send me the model there.

GuHuiJian commented 2 years ago

The converted yolov7.onnx is attached.

Oversized attachment sent via QQ Mail:

yolov7.onnx (140.92M, download expires 2022-08-11 21:55): http://mail.qq.com/cgi-bin/ftnExs_download?k=793530639459019479b041534266024c5f03035a565f540514560501574b0454580d1d015d03544e5b0d555a0704070500035257646d301a56595f1553485f0d574d305e&t=exs_ftn_download&code=950cdf0c

GuHuiJian commented 2 years ago

Is the dimension of the num_dets output wrong? It shows 1*100 — is that right?
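For reference (a hedged sketch, not taken from this repo's export script): when a YOLOv7 ONNX model carries end-to-end NMS outputs in the TensorRT EfficientNMS style, num_dets is typically (batch, 1) — a single count of valid detections — while the per-detection outputs carry the max-detection dimension, commonly 100. All names and the max-detection count here are assumptions:

```python
# Assumed EfficientNMS-style output shapes; adjust MAX_DET to the
# value used at export time (100 is the common default).
BATCH, MAX_DET = 1, 100

expected = {
    "num_dets":    (BATCH, 1),           # count of valid detections per image
    "det_boxes":   (BATCH, MAX_DET, 4),  # xyxy boxes
    "det_scores":  (BATCH, MAX_DET),
    "det_classes": (BATCH, MAX_DET),
}

def check_shapes(actual: dict) -> list:
    """Return a list of (name, got, want) mismatches."""
    return [(name, actual.get(name), want)
            for name, want in expected.items()
            if actual.get(name) != want]

# A num_dets of (1, 100) would be flagged as a mismatch:
bad = {"num_dets": (1, 100), "det_boxes": (1, 100, 4),
       "det_scores": (1, 100), "det_classes": (1, 100)}
print(check_shapes(bad))  # -> [('num_dets', (1, 100), (1, 1))]
```

So if Netron shows num_dets as 1x100, that would indeed look off under these assumptions; 1x100 is the expected shape of det_scores/det_classes, not of num_dets.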

Monday-Leo commented 2 years ago

I downloaded your model and converted it successfully; it also predicts accurately, so the ONNX model is not the problem — it is likely your TensorRT environment. Please don't run the command inside environments like Jupyter Notebook; follow my video and run it directly in CMD. If it still fails, check your TensorRT installation.

GuHuiJian commented 2 years ago

Does this output look correct to you?