Hi,

I am training on multiple GPUs (2 GPUs), starting from pretrained weights obtained on a single GPU with the same architecture (I stopped the single-GPU training and restarted multi-GPU training with the best weights), and I am getting zero mAP values. The same architecture works on a single GPU. I am using two NVIDIA Tesla 100 GPUs.

Command:
/darknet detector train /home/Bnglr_ml/yolo_17june/darknet-master/data/obj_yolo3_cp.data /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/yolo_4_activation.cfg /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/model/yolov4_batch_arch__SingleGPU_107000.weights -map -dont_show -gpus 0,1

Inference using the command below:
./darknet detector map /home/Bnglr_ml/yolo_17june/darknet-master/data/obj_yolo3_cp.data /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/yolo_4_activation.cfg /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/yolo_4_activation.cfg /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/model/multigpu_nan/yolo_4_activation_best.weights

CUDA-version: 10010 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0

Result:
for conf_thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for conf_thresh = 0.25, TP = 0, FP = 0, FN = 185526, average IoU = 0.00 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.000000, or 0.00 %

Config file (the same one is used for single and multiple GPUs): chart_yolo_4_activation

batch=80
subdivisions=32
width=1024
height=1024
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.00013
learning_rate =0.00005
burn_in =1000
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1
mosaic=1
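For reference, here is a minimal sketch that sanity-checks the numbers in this cfg for a 2-GPU run. It rests on two assumptions that have not been verified against this build: that darknet derives the per-step mini-batch by integer division of batch by subdivisions, and that the multi-GPU suggestion in the darknet README applies (divide learning_rate by the number of GPUs and multiply burn_in by it when NaN values appear). The function name and dictionary keys are hypothetical and only serve the illustration.

```python
def multi_gpu_settings(batch, subdivisions, learning_rate, burn_in, gpus):
    """Hypothetical helper: derive the values discussed above from cfg numbers."""
    # Assumption: darknet uses integer division here, so 80 / 32 becomes 2.
    mini_batch = batch // subdivisions
    return {
        "mini_batch": mini_batch,
        # Images actually consumed per iteration under that assumption.
        "effective_batch": mini_batch * subdivisions,
        # Assumption: README suggestion for multi-GPU runs that produce NaN.
        "learning_rate": learning_rate / gpus,
        "burn_in": burn_in * gpus,
    }


# Values taken from the cfg above, with gpus=2 for "-gpus 0,1".
settings = multi_gpu_settings(
    batch=80, subdivisions=32, learning_rate=0.00013, burn_in=1000, gpus=2
)

for key, value in settings.items():
    print(f"{key} = {value}")
```

If the integer-division assumption holds, batch=80 with subdivisions=32 does not divide evenly, so fewer than 80 images would be used per iteration.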

Please let me know if there is a solution or if any changes are required.

AlexeyAB / darknet

darknet multigpu training gives map zero and precision = -nan, recall = 0.00, F1-score = -nan #6609
