Closed: samux87 closed this issue 4 years ago
I modified burn_in to 1000, and the training chart stopped at iteration 992 with this log:
Tensor Cores are disabled until the first 3000 iterations are reached.
(next mAP calculation at 1000 iterations)
1000: 4.417944, 4.601261 avg loss, 0.004000 rate, 9.786821 seconds, 256000 images, 2.730127 hours left
Resizing to initial size: 640 x 768
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!
try to allocate additional workspace_size = 267.51 MB
CUDA allocate done!

calculation mAP (mean average precision)...
Detection layer: 139 - type = 27
Detection layer: 150 - type = 27
Detection layer: 161 - type = 27
4 cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 471 : build time: Jul 30 2020 - 09:31:34

cuDNN Error: CUDNN_STATUS_BAD_PARAM
cuDNN Error: CUDNN_STATUS_BAD_PARAM: File exists
darknet: ./src/utils.c:326: error: Assertion `0' failed.
Aborted (core dumped)
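Note that the crash appears exactly when the first mAP evaluation starts (scheduled at iteration 1000 here), not during an ordinary training step. As a way to narrow it down, and only as a sketch with placeholder paths (data/obj.data, cfg/yolov4-custom.cfg, the yolov4.conv.137 pretrained weights), one could train without the in-training -map evaluation and compute mAP separately on a saved checkpoint:

# Hypothetical paths; substitute your own .data, .cfg and weights files.
# Train on 4 GPUs without the periodic mAP evaluation:
./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137 -gpus 0,1,2,3

# Compute mAP separately on a saved checkpoint:
./darknet detector map data/obj.data cfg/yolov4-custom.cfg backup/yolov4-custom_1000.weights

If plain training runs past iteration 1000 this way, the CUDNN_STATUS_BAD_PARAM error is tied to the mAP pass rather than to training itself.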
Tensor Cores are disabled until the first 3000 iterations are reached.
10: -nan, -nan avg loss, 0.000000 rate, 16.171022 seconds, 640 images, 458.516231 hours left
Resizing, random_coef = 1.40
704 x 704
try to allocate additional workspace_size = 80.06 MB
CUDA allocate done!
Loaded: 0.869695 seconds
I am running into the same issue as you, but I get -nan instead.
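One aside on the log above: the 0.000000 rate at iteration 10 is expected during burn_in, because darknet ramps the learning rate as learning_rate * (iteration / burn_in)^power, with power defaulting to 4. Assuming, purely for illustration, the values from the original post (learning_rate=0.001, burn_in=800):

current_rate = learning_rate * (iteration / burn_in)^power
             = 0.001 * (10 / 800)^4
             ≈ 2.4e-11   (printed as 0.000000)

So a near-zero rate readout is normal that early; the -nan loss is a separate problem and is not caused by the rate display itself.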
Hi guys,
the training stops at the burn_in iteration when training on 4 V100 GPUs; is this normal?
Here is part of the configuration:
learning_rate=0.001
burn_in=800
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1
cutmix=1
mosaic=1
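For reference, the multi-GPU notes in the AlexeyAB darknet README suggest lowering learning_rate and increasing burn_in roughly in proportion to the GPU count, and also training the first ~1000 iterations on a single GPU before switching to -gpus 0,1,2,3. Below is a sketch of how the [net] values above might be adjusted for 4 GPUs; the exact numbers are an assumption to verify against your darknet version, not a confirmed fix for this stop at burn_in:

[net]
# Hypothetical 4-GPU adjustment (assumption to verify, not a confirmed fix):
# learning_rate = 0.001 / 4 GPUs, burn_in = 800 * 4 GPUs
learning_rate=0.00025
burn_in=3200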
Thank you, S.