Zero mAP when training on 2 GPUs from single-GPU weights (the same architecture works on a single GPU)
Hi,
I am training on multiple GPUs (2 GPUs), starting from a pretrained weights file produced by single-GPU training of the same architecture. I stopped the single-GPU training and restarted on multiple GPUs from the best weights. The mAP values are all zero, even though the same architecture trains fine on a single GPU. I am using two NVIDIA Tesla 100 GPUs.
Training command:
./darknet detector train /home/Bnglr_ml/yolo_17june/darknet-master/data/obj_yolo3_cp.data /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/yolo_4_activation.cfg /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/model/yolov4_batch_arch__SingleGPU_107000.weights -map -dont_show -gpus 0,1
Inference command:
./darknet detector map /home/Bnglr_ml/yolo_17june/darknet-master/data/obj_yolo3_cp.data /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/yolo_4_activation.cfg /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/yolo_4_activation.cfg /home/Bnglr_ml/yolo4_batchsize_repro/darknet-master/model/multigpu_nan/yolo_4_activation_best.weights

Output:
CUDA-version: 10010 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0
Result:
for conf_thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for conf_thresh = 0.25, TP = 0, FP = 0, FN = 185526, average IoU = 0.00 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.000000, or 0.00 %
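The -nan values follow directly from the counts above: with TP = 0 and FP = 0, precision is 0/0, and F1 inherits that NaN. A minimal Python sketch, assuming darknet uses the standard precision, recall and F1 definitions (the function name `detection_metrics` is hypothetical), reproduces the same pattern:

```python
import math


def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from detection counts (hypothetical helper).

    A zero denominator yields NaN, mirroring the 0/0 float division that
    shows up as -nan in the map output when no detection passes conf_thresh.
    """
    def safe_div(num: float, den: float) -> float:
        # NaN is truthy, so a NaN denominator also falls through to num / den.
        return num / den if den else math.nan

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Counts taken from the result above.
    print(detection_metrics(tp=0, fp=0, fn=185526))
```

In other words, the model produced no detections at all above conf_thresh = 0.25, so every ground-truth box counts as a false negative.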
Config file (the same one is used for single and multiple GPUs):
batch=80
subdivisions=32
width=1024
height=1024
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.00013
learning_rate =0.00005
burn_in =1000
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1
mosaic=1
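A minimal sketch of what these settings imply numerically. It assumes darknet divides batch by subdivisions with integer division when parsing the cfg, and that the learning rate follows a burn_in warm-up of the form learning_rate * (iteration / burn_in)^power (power=4 assumed as the default) followed by the steps policy. The function names are hypothetical, and learning_rate=0.00013 is used only as an example because the config lists learning_rate twice.

```python
def mini_batch(batch: int, subdivisions: int) -> int:
    """Images per forward/backward pass, assuming C-style integer division."""
    return batch // subdivisions


def current_lr(base_lr: float, iteration: int, burn_in: int,
               steps: list[int], scales: list[float], power: float = 4.0) -> float:
    """Learning rate for policy=steps with a burn_in warm-up (hypothetical sketch)."""
    if iteration < burn_in:
        # Warm-up: ramp from 0 towards base_lr over the first burn_in iterations.
        return base_lr * (iteration / burn_in) ** power
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration < step:
            break  # later steps have not been reached yet
        lr *= scale
    return lr


if __name__ == "__main__":
    steps, scales = [400000, 450000], [0.1, 0.1]
    mb = mini_batch(80, 32)
    print("mini-batch:", mb, "images per iteration:", mb * 32)
    for it in (500, 107000, 420000, 460000):
        print(it, current_lr(0.00013, it, 1000, steps, scales))
```

Under these assumptions, batch=80 is not a multiple of subdivisions=32, so each mini-batch holds 2 images and an iteration covers 64 images rather than 80. At iteration 107000 (the number in the weights file name) the first step at 400000 has not been reached, so the rate is still the base learning_rate.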
Could you please let me know if there is a solution, or what changes are required?