AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

burn_in issue during multi GPU training #6378

Closed samux87 closed 4 years ago

samux87 commented 4 years ago

Hi guys,

Training stops at the burn_in iteration when training on 4 V100 GPUs. Is this normal?

Here is part of my configuration:

learning_rate=0.001
burn_in=800
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1
cutmix=1
mosaic=1

Thank you, S.
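For reference, the learning rate these settings produce can be sketched as follows. This is a minimal Python restatement of a burn_in warm-up followed by policy=steps, not Darknet's code. It assumes Darknet's default `power=4` for the warm-up curve. It also assumes that multi-GPU training multiplies learning_rate by the number of GPUs, which matches the 0.004000 rate logged with 4 GPUs and learning_rate=0.001. `current_rate` is a hypothetical helper name.

```python
# Sketch of a burn_in + policy=steps learning-rate schedule (not Darknet source).
# Assumptions: warm-up exponent power=4 (Darknet's default) and the base rate
# multiplied by the GPU count in multi-GPU training.

def current_rate(iteration, learning_rate, burn_in, steps, scales,
                 ngpus=1, power=4.0):
    """Return the learning rate used at a given iteration (hypothetical helper)."""
    base = learning_rate * ngpus
    # Warm-up: ramp from 0 up to `base` over the first burn_in iterations.
    if iteration < burn_in:
        return base * (iteration / burn_in) ** power
    # policy=steps: apply each scale once its step has been reached.
    rate = base
    for step, scale in zip(steps, scales):
        if step > iteration:
            break
        rate *= scale
    return rate


cfg = dict(learning_rate=0.001, burn_in=800,
           steps=[4800, 5400], scales=[0.1, 0.1], ngpus=4)
for it in (100, 400, 800, 4800, 5400):
    print(it, f"{current_rate(it, **cfg):.6f}")
```

Under these assumptions the rate climbs from almost zero to 0.004 at iteration 800. It then drops to 0.0004 at step 4800 and to 0.00004 at step 5400.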

samux87 commented 4 years ago

I changed burn_in to 1000, and the training chart stopped at iteration 992 with this log:

Tensor Cores are disabled until the first 3000 iterations are reached.
(next mAP calculation at 1000 iterations)
1000: 4.417944, 4.601261 avg loss, 0.004000 rate, 9.786821 seconds, 256000 images, 2.730127 hours left
Resizing to initial size: 640 x 768
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!

calculation mAP (mean average precision)...
Detection layer: 139 - type = 27
Detection layer: 150 - type = 27
Detection layer: 161 - type = 27
4
cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 471 : build time: Jul 30 2020 - 09:31:34

cuDNN Error: CUDNN_STATUS_BAD_PARAM
cuDNN Error: CUDNN_STATUS_BAD_PARAM: File exists
darknet: ./src/utils.c:326: error: Assertion `0' failed.
Aborted (core dumped)
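The iteration-1000 progress line can be checked with some simple arithmetic. 256000 images after 1000 iterations is 256 images per iteration, for example batch=64 on each of the 4 GPUs. The batch value is an assumption because it is not shown above. A rate of 0.004000 is 4 times the configured learning_rate=0.001. The sketch below extracts those fields. `parse_progress` is a hypothetical helper, and its regex is written against the line format shown in this thread, not against any documented Darknet format.

```python
import re

# Regex for the per-iteration progress line as it appears in this thread, e.g.
# "1000: 4.417944, 4.601261 avg loss, 0.004000 rate, 9.786821 seconds, 256000 images, ..."
PROGRESS = re.compile(
    r"(?P<iter>\d+):\s*(?P<loss>\S+),\s*(?P<avg>\S+) avg loss,\s*"
    r"(?P<rate>\S+) rate,\s*(?P<secs>\S+) seconds,\s*(?P<images>\d+) images"
)


def parse_progress(line):
    """Return the numeric fields of a progress line, or None if it does not match."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return {
        "iteration": int(m["iter"]),
        "loss": float(m["loss"]),        # float() also accepts Darknet's "-nan"
        "avg_loss": float(m["avg"]),
        "rate": float(m["rate"]),
        "seconds": float(m["secs"]),
        "images": int(m["images"]),
    }


line = ("1000: 4.417944, 4.601261 avg loss, 0.004000 rate, 9.786821 seconds, "
        "256000 images, 2.730127 hours left")
p = parse_progress(line)
images_per_iteration = p["images"] // p["iteration"]
rate_multiplier = p["rate"] / 0.001  # learning_rate from the cfg above
print(images_per_iteration, rate_multiplier)
```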

xiaobumiDM commented 4 years ago

Tensor Cores are disabled until the first 3000 iterations are reached.

10: -nan, -nan avg loss, 0.000000 rate, 16.171022 seconds, 640 images, 458.516231 hours left
Resizing, random_coef = 1.40
704 x 704
try to allocate additional workspace_size = 80.06 MB
CUDA allocate done!
Loaded: 0.869695 seconds

I have the same problem as you, but in my case the loss is NaN.
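In a long log, the following scan finds where the average loss first becomes NaN. It is a sketch with a hypothetical `first_nan_iteration` helper. It assumes the `<iter>: <loss>, <avg> avg loss, ...` progress format shown in the logs above and relies on Python's `float()` accepting the `-nan` text printed there.

```python
import math


def first_nan_iteration(log_lines):
    """Return the first iteration whose avg loss is NaN, or None (hypothetical helper).

    Assumes progress lines look like "<iter>: <loss>, <avg> avg loss, ..."."""
    for line in log_lines:
        head, sep, _ = line.partition(" avg loss")
        if not sep:
            continue  # not a progress line
        iteration, _, losses = head.partition(":")
        if not iteration.strip().isdigit():
            continue
        try:
            avg = float(losses.split(",")[-1])  # last value before "avg loss"
        except ValueError:
            continue
        if math.isnan(avg):
            return int(iteration)
    return None


log = [
    "Tensor Cores are disabled until the first 3000 iterations are reached.",
    "10: -nan, -nan avg loss, 0.000000 rate, 16.171022 seconds, 640 images, 458.516231 hours left",
    "Loaded: 0.869695 seconds",
]
print(first_nan_iteration(log))
```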