AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

RTX 2080 ti #1822

Open ryumansang opened 5 years ago

ryumansang commented 5 years ago

Nice to meet you.

I bought an RTX 2080 Ti today and installed all the required components in the same environment as my V100.

This is the setup I have tested: NVIDIA driver 410.48, CUDA 10.0, and cuDNN 7.3.0.

Is this the recommended setting?

AlexeyAB commented 5 years ago

@ryumansang Hi,

NVIDIA driver 410.48, CUDA 10.0, and cuDNN 7.3.0

Yes, it should be enough.

So you can compile with GPU=1 CUDNN=1 CUDNN_HALF=0 OPENCV=1 or with GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1 in the Makefile.
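
For reference, these are the relevant Makefile flags (CUDNN_HALF=1 is what enables mixed-precision Tensor Core inference); the comments are my own summary of how this repo uses them:

```Makefile
GPU=1         # build with CUDA support
CUDNN=1       # use cuDNN for convolutions
CUDNN_HALF=1  # FP16 on Tensor Cores; set to 0 to compare against plain FP32
OPENCV=1      # mainly needed for training (data augmentation) and the video demo
```

After changing these flags, rebuild with `make clean && make`.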

ryumansang commented 5 years ago

Thank you.

I have one other question.

I currently have four configurations: GTX 1060, GTX 1080 Ti, Tesla V100, and RTX 2080 Ti. When tested using data/dog.jpg and yolov3:

- GTX 1060: 0.036 sec (CUDA 8.0)
- GTX 1080 Ti: 0.033 sec (CUDA 8.0)
- Tesla V100: 0.034 sec (0.014 sec with Tensor Cores) (CUDA 9.1, NVIDIA 396.51 driver)
- RTX 2080 Ti: 0.019 sec (0.014 sec with Tensor Cores) (CUDA 10, latest NVIDIA driver)

The detection speeds are shown above.

I suspect the Tesla V100 is slower than it should be.

Are the specifications normal?

AlexeyAB commented 5 years ago

Are the specifications normal?

Yes. I got 0.031 sec with CUDNN_HALF=0, and 0.011 sec with CUDNN_HALF=1 on Tesla V100: https://github.com/AlexeyAB/darknet/issues/407

Also, you can try commenting out this line: https://github.com/AlexeyAB/darknet/blob/31df5e27356b6b11ffd43baace9afdd3800a8aa2/src/convolutional_layer.c#L164 and test with CUDNN_HALF=0 on the RTX 2080 Ti - that will forcibly disable Tensor Cores entirely.

Because with this line present, the Tensor Cores can still be used internally by cuDNN even with CUDNN_HALF=0 for 32-bit floats, via automatic FP32->FP16->FP32 conversion. So the 0.019 sec result may be running on Tensor Cores too.
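
For context, the referenced line is the call that opts the convolution descriptor into Tensor Core math; this is a sketch of what it looks like, and the exact code in src/convolutional_layer.c may differ slightly:

```c
// Sketch of the referenced line in src/convolutional_layer.c (not verbatim).
// With TENSOR_OP math, cuDNN is allowed to run FP32 convolutions on Tensor
// Cores via internal FP32->FP16->FP32 conversion, even when darknet is
// built with CUDNN_HALF=0.
cudnnSetConvolutionMathType(l->convDesc, CUDNN_TENSOR_OP_MATH);

// Commenting the line out (or forcing the default math type) keeps
// convolutions on regular FP32 CUDA cores:
// cudnnSetConvolutionMathType(l->convDesc, CUDNN_DEFAULT_MATH);
```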


Also, you can try INT8 on Tensor Cores, with only a 1-2% mAP decrease for detection, by using this repo: https://github.com/AlexeyAB/yolo2_light

ryumansang commented 5 years ago

Thank you! Let me test it.

alexanderfrey commented 5 years ago

What frame rate do you get on a video with the 2080 Ti and CUDNN_HALF=1?

Thanks!

nirbenz commented 5 years ago

I tried compiling with CUDNN_HALF=1 for a 2080 Ti setup but am getting a very slim speedup (I'd say 15%). That's far from the claimed 2.5x, so I'm obviously doing something wrong. Is compiling with OpenCV required for this? (I don't see why it would be, but asking anyway.)

Testing with the Python implementation. Perhaps I need to explicitly cast to fp16 before calling Python's detect method?

AlexeyAB commented 5 years ago

@nirbenz

Testing with the Python implementation. Perhaps I need to explicitly cast to fp16 before calling Python's detect method?

No.

nirbenz commented 5 years ago

I simply call darknet.py's detect method on an image read via Darknet's load_image method. I'm only timing the actual call to get_network_boxes, so as to minimize possible Python overhead. Still, the speedup is much smaller than expected.

EDIT: Unless I'm missing something, no changes are required to the cfg file, weights, etc. - only compilation with the HALF flag, right? I carefully read through the relevant issues and found no such requirement, but I want to make sure.

AlexeyAB commented 5 years ago

@nirbenz

I simply call darknet.py's detect method on an image read via Darknet's load_image method. I'm only timing the actual call to get_network_boxes, so as to minimize possible Python overhead. Still, the speedup is much smaller than expected.

  1. get_network_boxes() isn't the neural network inference function. The inference function is predict_image(net, im): https://github.com/AlexeyAB/darknet/blob/21a4ec9390b61c0baa7ef72e72e59fa143daba4c/darknet.py#L238 Internally it calls this C function, which can have some overhead: https://github.com/AlexeyAB/darknet/blob/21a4ec9390b61c0baa7ef72e72e59fa143daba4c/src/network.c#L652-L660 (see the timing sketch below).

  2. darknet.py has overheads, and the current Python code may be a bottleneck on a high-performance GPU.
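
A minimal timing sketch, assuming the ctypes bindings in this repo's darknet.py (load_net, load_image, predict_image); it times only the inference call, after a warm-up run, so that Python and box-extraction overhead stay out of the measurement:

```python
import time
import darknet  # this repo's darknet.py ctypes bindings

# Paths are examples; adjust to your files.
net = darknet.load_net(b"cfg/yolov3.cfg", b"yolov3.weights", 0)
im = darknet.load_image(b"data/dog.jpg", 0, 0)

darknet.predict_image(net, im)  # warm-up: exclude CUDA/cuDNN init cost

n = 100
start = time.time()
for _ in range(n):
    darknet.predict_image(net, im)
print("avg inference: %.4f sec" % ((time.time() - start) / n))
```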

You shouldn't change anything in the source code or in cfg-file to use Tensor Cores on Geforce RTX 2080Ti, just set GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 in the Makefile.

nirbenz commented 5 years ago
Why is OPENCV=1 required?

get_network_boxes() isn't the neural network inference function. The inference function is predict_image(net, im):

This is what I meant, actually. I wrap the timing measurement around this method (as well as the subsequent ones). The image is already resized to the correct dimensions, to avoid possible slowdown from image resizing and letterboxing. The goal is to measure only the actual inference call (and that's how I do it).

AlexeyAB commented 5 years ago

@nirbenz

  1. OPENCV=1 is required only for training, so that data augmentation does not become a bottleneck for a GPU with Tensor Cores. In your case OPENCV=1 isn't required, because for you the OpenCV library is used only if you un-comment these lines: https://github.com/AlexeyAB/darknet/blob/21a4ec9390b61c0baa7ef72e72e59fa143daba4c/darknet.py#L227-L229

  2. What FPS do you get by using ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4 with CUDNN_HALF=1 and with CUDNN_HALF=0? (See the comparison sketch below.)
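
A quick way to run that comparison, assuming GNU make (where command-line variables override the Makefile defaults) and that test.mp4 is your own video:

```sh
# Build without Tensor Cores and note the FPS reported by the demo
make clean && make GPU=1 CUDNN=1 CUDNN_HALF=0 OPENCV=1
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4

# Rebuild with Tensor Cores (FP16) enabled and compare the reported FPS
make clean && make GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4
```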

nirbenz commented 5 years ago

@AlexeyAB I will check and get back to you. Thanks a lot.

sleeplessai commented 4 years ago

@AlexeyAB How long does it take to train YOLOv4 on the COCO dataset with an RTX 2080 Ti and with a V100, respectively?

AimplainLeo commented 3 years ago

@AlexeyAB Thank you very much for your repositories. Could you answer a couple of short questions for me:

  1. Can I train on a GPU with Tensor Cores and then run the trained weights on a GPU without Tensor Core support? (I only want to reduce training time.)

  2. I use my old workstation for training on images with a Quadro K5000 GPU, which is a very weak GPU for training YOLO. I am going to buy a GTX 1080 Ti or an RTX 2080 (not Ti). Should I buy the GTX 1080 Ti (older) or the RTX 2080?

Thanks again, and sorry if my questions are unclear or duplicates!