WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0
1.99k stars, 521 forks

Low FPS on jetson type devices #24

Open MsWik opened 3 years ago

MsWik commented 3 years ago

Hello, and thanks for your work. When testing yolor-ssss-dwt at 640 on Jetson-type devices such as the Jetson Xavier NX, performance was unsatisfactory (about 30 frames per second). For comparison, yolo4-tiny + tkDNN in FP16 at 640 * 640 reaches ~100 FPS. Is there a way to speed up inference on edge devices? On a 2070S it reaches ~100 FPS.

WongKinYiu commented 3 years ago

I don't think yolor-ssss-dwt + tkDNN would reach only 30 fps, since in my experiments yolov4-s is far faster than 30 fps on Xavier NX. Do you have the fps results for yolor-ssss-s2d?

MsWik commented 3 years ago

Thanks for the answer. No, the result is the same: about 27 FPS when processing a file. Which version of torch are you using? I have torch 1.8.0, CUDA: 0 (Xavier, 7765MB), and torchvision 0.9.0. FP16 is enabled. I haven't used tkDNN with yolor-ssss-dwt; do you have an example?

WongKinYiu commented 3 years ago

I used pure TensorRT, without tkDNN. https://github.com/linghu8812/tensorrt_inference/tree/master/ScaledYOLOv4

MsWik commented 3 years ago

Thanks for the answer. I managed to convert the model to ONNX, but I have not yet managed to build it and get an output through TensorRT. Can you tell me how you produced the output in TensorRT?

MsWik commented 3 years ago

I ran the model through onnx_tensorrt, but the speed remains the same.
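
When the frame rate stays the same across backends, one thing worth checking is whether the model or the surrounding pipeline (file decoding, pre- and post-processing) limits throughput. The sketch below times each per-frame stage separately. It is only an illustration: `measure_stage_fps`, the stage names, and the dummy stages are hypothetical and are not part of yolor, tkDNN, or TensorRT. On a GPU, the inference callable would also have to wait for the device to finish before returning, otherwise the timings are not meaningful.

```python
import time


def measure_stage_fps(frames, stages, warmup=5, clock=time.perf_counter):
    """Time each stage of a per-frame pipeline separately.

    `stages` maps a stage name to a callable that takes the previous
    stage's output and returns its own output. Stages run in dict order.
    Returns per-stage FPS plus the end-to-end FPS under the key "total"
    (so no stage should itself be named "total").
    """
    totals = {name: 0.0 for name in stages}
    counted = 0
    for i, frame in enumerate(frames):
        data = frame
        timings = {}
        for name, fn in stages.items():
            start = clock()
            data = fn(data)
            timings[name] = clock() - start
        if i < warmup:
            continue  # skip warm-up frames (lazy initialization, caches)
        for name, dt in timings.items():
            totals[name] += dt
        counted += 1
    if counted == 0:
        raise ValueError("need more frames than warm-up iterations")
    fps = {
        name: (counted / t if t > 0 else float("inf"))
        for name, t in totals.items()
    }
    total_time = sum(totals.values())
    fps["total"] = counted / total_time if total_time > 0 else float("inf")
    return fps


if __name__ == "__main__":
    # Dummy stages standing in for real work; the sleep durations are
    # arbitrary assumptions, not measurements from any device.
    def preprocess(x):
        time.sleep(0.001)
        return x

    def infer(x):
        time.sleep(0.003)
        return x

    result = measure_stage_fps(
        range(20), {"preprocess": preprocess, "infer": infer}
    )
    for name, value in result.items():
        print(f"{name}: {value:.1f} FPS")
```

If the "infer" stage alone runs much faster than the end-to-end figure, the time is going elsewhere, and changing the inference backend will not move the overall FPS much.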