WongKinYiu / ScaledYOLOv4

Scaled-YOLOv4: Scaling Cross Stage Partial Network
GNU General Public License v3.0

Why is scaled yolov4 slower (training) than darknet-yolov4? #133

Open saikrishnadas opened 3 years ago

saikrishnadas commented 3 years ago

I trained yolo-CSP with a 512 input size and a batch size of 16 and found that it is slower than darknet-yolov4 with the same 512 input size and batch size of 16. As far as I know, scaled yolov4 is claimed to be 6 times faster than darknet-yolov4. Why is it slower? @WongKinYiu

WongKinYiu commented 3 years ago

Could you provide your cfg, command, and GPU for training on PyTorch and darknet? And why do you think scaled yolov4 is claimed to be 6 times faster than darknet-yolov4?

saikrishnadas commented 3 years ago

yolo-csp cfg : yolov4-csp.zip

command : python train.py --img 512 --batch 16 --epochs 5000 --data '../data.yaml' --cfg ./models/yolov4-csp.yaml --weights '' --name yolov4-csp-results --cache

GPU for training: Nvidia Tesla T4

Regarding the claimed speedup: https://blog.roboflow.com/scaled-yolov4-tops-efficientdet/ describes "speeding up training time 10x relative to Darknet".

@WongKinYiu

saikrishnadas commented 3 years ago

@WongKinYiu Any solution?

WongKinYiu commented 3 years ago

Check the memory usage, and try to increase the batch size if possible.
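One way to check memory usage is to query `nvidia-smi` while training runs. The following is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the PATH and reports plain integers for every GPU. The helper names `parse_gpu_memory` and `query_gpu_memory` are illustrative and are not part of this repository.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (MiB), one line per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory figures (hypothetical helper)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


# Usage on a machine with an NVIDIA GPU:
#   for index, gpu in enumerate(query_gpu_memory()):
#       print(index, gpu["used_mib"], "/", gpu["total_mib"], "MiB")
```

If a large share of memory is still free during training, there may be room for a larger batch size.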

The reason they said 10x comes from the following differences at the time:

  1. fp16 vs fp32: GPU with tensor cores, ~2 times
  2. fp16 vs fp32: batch size x2, ~2 times
  3. jitter vs multi-scale: x/1.5 resolution, ~2.25 times

2 x 2 x 2.25 = 9

9 + new cuDNN speedup, ~10
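The arithmetic above can be sketched as follows. The factors are the rough estimates listed in the comment, not measurements. Reading 2.25 as 1.5 squared (cost scaling with pixel count when each side differs by 1.5x) is an interpretation, and `combined_speedup` is an illustrative name.

```python
import math


def combined_speedup(factors):
    """Multiply independent speedup factors into one overall estimate."""
    return math.prod(factors)


factors = [
    2.0,       # fp16 on tensor cores vs fp32
    2.0,       # fp16 allows roughly double the batch size
    1.5 ** 2,  # 1.5x per side in resolution, assumed to scale cost by 1.5^2 = 2.25
]

print(combined_speedup(factors))  # 9.0
```

The remaining step from about 9 to about 10 is attributed in the comment to a newer cuDNN, which is not modeled here.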

saikrishnadas commented 3 years ago

But when I try increasing the batch size or switching to multi-GPU, my GPUs run out of memory.

saikrishnadas commented 3 years ago

@WongKinYiu ??

saikrishnadas commented 3 years ago

@WongKinYiu

WongKinYiu commented 3 years ago

If your GPU does not have tensor cores, darknet is faster.
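A rough way to check for tensor cores from PyTorch is to look at the CUDA compute capability. This is a heuristic sketch, not a definitive test: tensor cores first appeared with compute capability 7.0 (Volta), but some 7.5 cards such as the GTX 16 series do not have them. The function names are illustrative.

```python
def likely_has_tensor_cores(capability):
    """Heuristic: compute capability 7.0 or higher usually means tensor cores."""
    major, _minor = capability
    return major >= 7


def current_gpu_capability(device_index=0):
    """Return (major, minor) for a CUDA device; requires PyTorch with CUDA."""
    import torch  # imported lazily so the heuristic above works without PyTorch

    return torch.cuda.get_device_capability(device_index)


# Usage on a machine with PyTorch and a CUDA GPU:
#   print(likely_has_tensor_cores(current_gpu_capability()))
```

For example, a Tesla T4 reports capability (7, 5), while Pascal cards such as the GTX 1080 report (6, 1).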