leayz-888 opened this issue 2 years ago
I'm hitting this problem too: the torch .pt model's scores are high, but the ONNX model's scores are very low.
Same issue here. I trained yolov7 on a custom dataset; the torch model returns scores above 0.9, but the TRT-converted model returns scores around 0.4 - 0.5 for the same image. I tested TRT conversion via trtexec, Linaom1214/tensorrt-python, and triple-Mu/YOLO-TensorRT8, both in fp32 and fp16.
Torch model:
TRT model:
I ran into the same problem.
Train (cfg/training/) -> Reparameterization (cfg/deploy) -> Convert.
Maybe you missed the reparameterization step.
Thanks for your reply. Can you tell me exactly how to do the reparameterization?
Thank you for your reply! I just tested reparameterization (the only difference between training/yolov7.yaml and deploy/yolov7.yaml is that the last line uses IDetect instead of Detect): same results after reconverting the model to ONNX and TRT. In a few hours I'm going to try running the model in ONNX Runtime to check whether the model is failing in the ONNX conversion or the TRT conversion.
Thank you, I followed your instructions to re-parameterize the trained yolov7.pt and convert it to TRT format. I tested it and got these results:
After re-parameterization, there is still a big difference between the inference results of the trt model and the pt model.
Important for reparameterization: change nc (the number of classes inside the reparameterization file) to match your custom dataset. Getting that wrong can cause weird issues at inference time. It's also important to use opset 11 when exporting the model; at least for me, that was the only opset version that worked.
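For anyone who wants the concrete steps, below is a condensed and slightly simplified sketch of what the repo's tools/reparameterization.ipynb does for the base yolov7 model. The head index 105 and the three detection scales are assumptions that hold for the stock yolov7 config, and nc must be set to your own class count, which is exactly the point above. Treat it as a guide and prefer the actual notebook:

```python
# Condensed sketch of tools/reparameterization.ipynb (base yolov7).
# Assumptions: stock yolov7 config (IDetect head at model index 105,
# 3 detection scales); nc must match YOUR dataset.
from copy import deepcopy
import torch
from models.yolo import Model

device = torch.device("cpu")
nc = 1  # <-- your number of classes

ckpt = torch.load("runs/train/yolov7/weights/best.pt", map_location=device)
model = Model("cfg/deploy/yolov7.yaml", ch=3, nc=nc).to(device)

state_dict = ckpt["model"].float().state_dict()
# copy every trained weight whose name and shape match the deploy model
intersect = {k: v for k, v in state_dict.items()
             if k in model.state_dict() and model.state_dict()[k].shape == v.shape}
model.load_state_dict(intersect, strict=False)
model.names = ckpt["model"].names
model.nc = nc

# fold IDetect's implicit mul/add layers into the plain detection convs
idx = 105
for j in range(3):  # three detection scales
    for i in range((nc + 5) * 3):  # output channels per scale -- this is where nc enters
        model.state_dict()[f"model.{idx}.m.{j}.weight"].data[i, :, :, :] *= \
            state_dict[f"model.{idx}.im.{j}.implicit"].data[:, i, :, :].squeeze()
    model.state_dict()[f"model.{idx}.m.{j}.bias"].data += \
        state_dict[f"model.{idx}.m.{j}.weight"].mul(
            state_dict[f"model.{idx}.ia.{j}.implicit"]).sum(1).squeeze()
    model.state_dict()[f"model.{idx}.m.{j}.bias"].data *= \
        state_dict[f"model.{idx}.im.{j}.implicit"].data.squeeze()

torch.save({"model": deepcopy(model).half(), "optimizer": None,
            "training_results": None, "epoch": -1}, "yolov7-deploy.pt")
```

Then export the reparameterized checkpoint to ONNX as usual.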
Thanks, I have successfully reparameterized, and the inference results before and after reparameterization are identical. But on my own dataset, after converting the reparameterized model to TRT, the TRT inference results are still very different from the pt model's: not only have the target confidences dropped, it even detects a lot of wrong targets.
I also had a lot of weird problems converting my trained model. My not-so-efficient but maybe effective solution would be to try every conversion out, i.e. a lot of different export commands: for example, adding your batch size, simplifying the model, adding your img size, etc., to the export process, and trying different opset versions. After a lot of trial and error I finally found what I needed for my OpenCV DNN inference, so maybe that experimental approach could work for you too. Otherwise, I can only imagine that the TensorRT conversion from .onnx to .trt messes up somewhere; to check that, I recommend verifying which .onnx models are supported by the .trt engine conversion.
OK, thanks for the suggestion, I'll try it. By the way, what conversion command are you using?
I mean, since it's not for TensorRT but for OpenCV DNN I guess it won't help, but simply: python export.py --weights [myweights].pt --img [myimagesize] --include onnx --opset 11. The https://github.com/WongKinYiu/yolov7/tree/u5 branch was used for the export; I read in a different thread that this branch is required.
Update: I managed to run the model with ONNX Runtime and got the same issue, so the problem isn't the TRT conversion; it must be in how the model is exported to ONNX.
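(For reference, this kind of check only takes a few lines with onnxruntime; the path and input shape below are placeholders:)

```python
# Minimal ONNX Runtime smoke test (path and input shape are placeholders).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {sess.get_inputs()[0].name:
                      np.zeros((1, 3, 640, 640), dtype=np.float32)})
print([o.shape for o in out])  # then feed a real preprocessed image and inspect scores
```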
If you tried out my command and it's not working, I'm clueless, to be honest. Maybe somebody more experienced can help here.
I may have found the reason:
- When inferring with the pt model, the detect.py script is used with the --rect parameter set to True, so the image is padded to a rectangle to speed up inference, instead of the fixed (640, 640) used by the ONNX and TRT models;
- When inferring with the TRT model, the preprocessing step converts the image from BGR to RGB, but no such conversion happens during pt model inference, which causes the inference results to differ. Now the inference results of my pt model and TRT model are basically the same; I hope this helps you. (See the preprocessing sketch below.)
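A minimal sketch of TRT-side preprocessing that mirrors what detect.py feeds the pt model, i.e. letterbox padding plus the BGR-to-RGB flip; the fixed 640x640 shape is an assumption matching a typical static-shape engine:

```python
import cv2
import numpy as np

def preprocess(img_bgr, new_shape=(640, 640), color=(114, 114, 114)):
    """Letterbox resize (keep aspect ratio, pad with gray) + BGR->RGB + CHW float."""
    h, w = img_bgr.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    nh, nw = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(img_bgr, (nw, nh), interpolation=cv2.INTER_LINEAR)
    top = (new_shape[0] - nh) // 2
    bottom = new_shape[0] - nh - top
    left = (new_shape[1] - nw) // 2
    right = new_shape[1] - nw - left
    padded = cv2.copyMakeBorder(resized, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=color)
    img = padded[:, :, ::-1]          # BGR -> RGB, as detect.py does
    img = img.transpose(2, 0, 1)      # HWC -> CHW
    return np.ascontiguousarray(img, dtype=np.float32)[None] / 255.0
```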
I have the same problem: the TRT inference scores are much lower, but the boxes are the same as the pt model's. I found your point 2 is wrong; pt model inference also does BGR2RGB in def letterbox. Can you tell me how to bring the TRT score in line with the pt score? Thank you very much!!!
Can you tell me how to bring the ONNX score in line with the pt score? Thank you very much!!!
I am unable to load the ONNX model exported with this command in OpenCV DNN. What did you use?
Could you please provide step-by-step instructions on how you did it? From what I understood of your explanation, you preprocessed the image by switching from BGR to RGB channels? I tried that; same issue.
I'm encountering this problem too. I think the problem may not come from reparameterization, since the current export.py already includes that process. May I ask: are you all working on single-class detection?
It might be related to this pull request:
https://github.com/WongKinYiu/yolov7/pull/305
which refers to https://github.com/WongKinYiu/yolov7/blob/44d8ab41780e24eba563b6794371f29db0902271/utils/general.py#L649
and to the exported onnxruntime NMS in https://github.com/WongKinYiu/yolov7/blob/44d8ab41780e24eba563b6794371f29db0902271/models/experimental.py#L176
as well as the TRT NMS in https://github.com/WongKinYiu/yolov7/blob/44d8ab41780e24eba563b6794371f29db0902271/models/experimental.py#L208
Update: Confirmed that two lines in the NMS are responsible for the incorrect probabilities in the exported ONNX and TensorRT models. After editing those two lines using the patch from DMLON, the outputs are now consistent!
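For readers skimming: the core of the fix is to stop multiplying obj_conf by cls_conf for single-class models, mirroring the nc == 1 branch that already exists in utils/general.py's non_max_suppression but that the exported NMS paths were missing. A hedged sketch; the names below are illustrative, not the repo's:

```python
import torch

def scores_for_nms(pred: torch.Tensor, nc: int) -> torch.Tensor:
    """pred: (batch, boxes, 5 + nc), laid out as (x, y, w, h, obj_conf, cls_conf...)."""
    obj_conf, cls_conf = pred[..., 4:5], pred[..., 5:]
    if nc == 1:
        # per the comment in utils/general.py: with one class, cls_loss is 0 and
        # cls_conf sits at ~0.5, so multiplying halves every score
        # (the 0.9 -> ~0.45 symptom reported in this thread)
        return obj_conf
    return obj_conf * cls_conf
```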
Interesting find. I tested one of my models that has multiple classes and got better inferences (some still differ: 0.92 vs 0.733). It seems my single-class model doesn't work because of that pull request? I'll test later, changing those lines to see if we get better results.
EDIT: Holy crap, that was it! The ONNX model works like a charm for single-class models. I had to change how the end2end function works by passing the number of classes as a parameter, then integrated the same function as specified in the links above.
Thanks a lot!! Making a pull request.
The --simplify option in the export process messed up all my metrics. Exporting without --simplify fixed it for me; it took me forever to find the reason.
Thanks for your great work. I used the pre-trained yolov7.pt you provided and your conversion code to successfully convert the pt model into a TRT model, and the inference results of the pt and TRT models are consistent. However, when I trained yolov7 on my own dataset and converted the saved pt model to TRT format the same way, I found the inference results of the pt model and the TRT model were very different. May I ask why this is?
The training command I use is:
python -m torch.distributed.launch --nproc_per_node 2 --master_port 9527 train.py --workers 8 --device 0,1 --sync-bn --batch-size 48 --data data/dataset.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
My commands for converting the pt model to an ONNX model and then a TRT model are:
python export.py --weights runs/train/yolov7/best.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
python export.py -o home/yolov7/runs/train/yolov7/best.onnx -e yolov7-ours-nms.trt -p fp16
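One way to narrow this down: run the same input through the .pt model and the exported .onnx model and diff the raw outputs. A hedged sketch; the paths are placeholders, and it assumes an export with --grid but without --end2end, so the ONNX returns raw predictions rather than NMS'd boxes:

```python
# Compare raw .pt vs .onnx outputs on the same input to isolate export problems.
import numpy as np
import onnxruntime as ort
import torch
from models.experimental import attempt_load

x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # any fixed input works for a consistency check

pt_model = attempt_load("runs/train/yolov7/best.pt", map_location="cpu")
pt_model.eval()
with torch.no_grad():
    pt_out = pt_model(torch.from_numpy(x))[0].numpy()   # (1, N, 5 + nc) raw predictions

sess = ort.InferenceSession("runs/train/yolov7/best.onnx",
                            providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x})[0]

print("max abs diff:", np.abs(pt_out - onnx_out).max())  # should be tiny (~1e-4)
```

If this diff is tiny but the TRT engine still disagrees, the problem is in the ONNX-to-TRT step or in the TRT-side pre/post-processing discussed above.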