WongKinYiu / ScaledYOLOv4

Scaled-YOLOv4: Scaling Cross Stage Partial Network
GNU General Public License v3.0

Questions on pre-trained and performance comparison #99

Open. rafale77 opened this issue 3 years ago

rafale77 commented 3 years ago

First of all, congratulations on publishing such fascinating and fantastic work.

I am currently using your YoloV4 pytorch u5 large pretrained model in my home automation setup.

  1. Reading through the paper and your repo, I see that the "CSP" branch is based on darknet, while the "large" branch is based on u5 but also has a CSP base yaml file. However, the CSP model in the large branch does not appear to have pretrained weights associated with it, unlike P5/P6/P7. Are they missing?
  2. It also looks like the CSP base in the large branch is identical to yolov4l-mish in the YoloV4 repo's u5 branch?
  3. Looking at the data, the published comparisons also use different input sizes, which is a bit confusing: it is hard to tell how much of the improvement in accuracy and speed comes from the larger input size versus the scaling of the model. What would the evaluation and FPS figures be at an input size of 672?
WongKinYiu commented 3 years ago
  1. no, i did not train yolov4-csp with the settings of yolov4-large.
  2. yes, the models are the same.
  3. we propose compound scaling of the input resolution and the stages of the network.
rafale77 commented 3 years ago

Thank you. I have replaced YoloV4l-mish in my implementation with YoloV4-P5, and the improvement in accuracy is very noticeable in my application: streaming live from a camera, the confidence scores on cars are much higher, even under far more challenging lighting conditions. I can almost answer my own question: since I run at a fixed FPS instead of loading my CPU/GPU to the maximum, I can compare CPU/GPU (RTX 3070) load directly. Versus CSP, GPU load increased by ~10% and CPU load by ~15%, with no significant increase in memory usage. I am puzzled by how heavily loaded the CPU is in general. I have since optimized the implementation slightly, for example by swapping color channels after passing the image tensor to the GPU instead of before, and by reducing the input image size. Is there any reason why enlarging a model that runs on the GPU increases the CPU load this much?
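The channel-swap optimization described above can be sketched as follows. This is a minimal sketch assuming a YOLOv5-style preprocessing pipeline; `preprocess` is a hypothetical helper, not code from this repo:

```python
import numpy as np
import torch

def preprocess(img_bgr: np.ndarray, device: str = "cpu") -> torch.Tensor:
    """Hypothetical helper: upload the raw uint8 HWC BGR frame first,
    then do the BGR->RGB flip and normalization on the device."""
    t = torch.from_numpy(img_bgr).to(device)   # cheap uint8 host->device copy
    t = t.permute(2, 0, 1).flip(0)             # HWC -> CHW, then BGR -> RGB on device
    return t.float().div(255.0)                # normalize on device, not on the CPU
```

Doing the flip and normalization after `.to(device)` keeps the host-side work down to a single uint8 copy, which is one plausible way to shave CPU load in a pipeline like this.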

I also had to disable the "auto" option in the letterbox function as it was occasionally yielding tensor size errors.
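For reference, the `auto` behavior in question can be sketched with plain arithmetic (a minimal sketch of YOLOv5-style letterbox padding logic, not the repo's exact code): with `auto=True` the image is padded only up to the nearest stride multiple, so the output shape varies with the source aspect ratio, while `auto=False` pads all the way to the square `new_size`, giving a fixed tensor shape.

```python
def letterbox_shape(h, w, new_size=640, stride=32, auto=True):
    """Compute the padded output shape of a YOLOv5-style letterbox (sketch)."""
    r = min(new_size / h, new_size / w)    # scale ratio to fit inside new_size
    nh, nw = round(h * r), round(w * r)    # resized (unpadded) dimensions
    dh, dw = new_size - nh, new_size - nw  # total padding needed for a square
    if auto:                               # minimal rectangle: pad only to a stride multiple
        dh, dw = dh % stride, dw % stride
    return nh + dh, nw + dw
```

For a 720x1280 frame this gives 384x640 with `auto=True` but a fixed 640x640 with `auto=False`, which is why disabling `auto` avoids shape mismatches in a pipeline that assumes one tensor size.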

xuezu29 commented 3 years ago

I trained with my own data set, but it was very, very slow. I used the YoloV4-P5 model with input size 1376, 10x RTX 2080 Ti GPUs, and batch size 20. Is this normal?
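As a rough sanity check (back-of-the-envelope arithmetic, assuming conv FLOPs grow roughly quadratically with input resolution): going from a typical 640 input to 1376 multiplies per-image compute by about 4.6x, and if that batch size of 20 is global, it is only 2 images per GPU, so slow epochs are to be expected.

```python
def relative_cost(new_size, base_size=640):
    """Approximate per-image compute relative to a base input size
    (conv FLOPs scale roughly quadratically with resolution)."""
    return (new_size / base_size) ** 2

print(round(relative_cost(1376), 2))  # -> 4.62
```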