Bug? CUDA illegal memory access... from recent changes to convolutional_layer.c ?

[Attached image: convolutional_layer_changes]

A recent change to convolutional_layer.c appears to make my build abort with a CUDA illegal memory access at the point where the validation files are assessed for mAP. Within the last month or so, the logic on lines 322, 360 and 397 of convolutional_layer.c was modified; the attached file comparison shows 2 of those 3 changes. The version on the right was pulled today; the version on the left is from mid-August, I think.
After some investigation, I changed those 3 lines in the most recent convolutional_layer.c back to the way they were, rebuilt the code and re-ran the same command. With only these 3 changes reverted, my build now assesses mAP and continues training. So, did the recent change introduce a bug, or expose my misuse?
The error output is shown below (I forced a short iteration count so I wouldn't have to wait so long to reproduce the error), followed by the command and the initial screen output.
System: Linux, CentOS 7.
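
In case anyone wants to repeat the comparison: I edited the 3 lines by hand, but a coarser variant is to restore the whole file from the older commit. A minimal sketch on a throwaway repository is below. The file contents and commit messages are placeholders made up for the demo, not the real darknet history; on a real checkout only the `git diff` and `git checkout` commands would apply, with the actual older commit hash in place of `$old`.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for a darknet checkout (hypothetical contents).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"

# First commit: placeholder for the older (mid-August) state of the file.
mkdir src
echo "old logic" > src/convolutional_layer.c
git add src/convolutional_layer.c
git commit -q -m "older state"
old=$(git rev-parse HEAD)

# Second commit: placeholder for the recent change.
echo "new logic" > src/convolutional_layer.c
git commit -q -am "recent change"

# Show what changed in this one file between the two commits.
git --no-pager diff "$old" HEAD -- src/convolutional_layer.c

# Restore only this file from the older commit; everything else stays at HEAD.
git checkout "$old" -- src/convolutional_layer.c
cat src/convolutional_layer.c
```

After this, rebuilding and re-running the same training command tests whether that file alone accounts for the difference.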

tail of screen output:

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 30 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000000, iou_loss = 0.000000, total_loss = 0.000000 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 37 Avg (IOU: 0.551367, GIOU: 0.478018), Class: 0.564141, Obj: 0.031875, No Obj: 0.000038, .5R: 0.666667, .75R: 0.000000, count: 6, class_loss = 0.053447, iou_loss = 0.009529, total_loss = 0.062976 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 44 Avg (IOU: 0.500035, GIOU: 0.461274), Class: 0.914468, Obj: 0.318563, No Obj: 0.000283, .5R: 0.525000, .75R: 0.083333, count: 120, class_loss = 0.650715, iou_loss = 5.837561, total_loss = 6.488276 
 total_bbox = 53996, rewritten_bbox = 0.000000 % 
Loaded: 0.000026 seconds

 adversarial training, adversarial_lr = 0.054500 
 x_size = 25165824, original_delta = 0x7f9396000000, original_input = 0x7f9416000000, net.learning_rate = 0.054500 

 (next mAP calculation at 110 iterations) 
 110: 0.237673, 0.273246 avg loss, 0.002610 rate, 1.888154 seconds, 56320 images, 1.017012 hours left
4CUDA Error Prev: an illegal memory access was encountered: Success

 calculation mAP (mean average precision)...
 Detection layer: 30 - type = 28 
 Detection layer: 37 - type = 28 
 Detection layer: 44 - type = 28 

 CUDA Error Prev: an illegal memory access was encountered

Command:

darknet detector train ./uas.data ./yolov4-tiny-3l-1cls6.cfg ./yolov4-tiny-3l.weights -map 2>&1
 CUDA-version: 10020 (11000), cuDNN: 8.0.2, GPU count: 1  
 OpenCV version: 3.4.11
 0 : compute_capability = 610, cudnn_half = 0, GPU: GeForce GTX 1080 Ti 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 2    256 x 256 x   3 ->  128 x 128 x  32 0.028 BF
   1 conv     64       3 x 3/ 2    128 x 128 x  32 ->   64 x  64 x  64 0.151 BF
   2 conv     64       3 x 3/ 1     64 x  64 x  64 ->   64 x  64 x  64 0.302 BF
   3 route  2                              1/2 ->   64 x  64 x  32 
   4 conv     32       3 x 3/ 1     64 x  64 x  32 ->   64 x  64 x  32 0.075 BF
   5 conv     32       3 x 3/ 1     64 x  64 x  32 ->   64 x  64 x  32 0.075 BF
   6 route  5 4                                ->   64 x  64 x  64 
   7 conv     64       1 x 1/ 1     64 x  64 x  64 ->   64 x  64 x  64 0.034 BF
   8 route  2 7                                ->   64 x  64 x 128 
   9 max                2x 2/ 2     64 x  64 x 128 ->   32 x  32 x 128 0.001 BF
  10 conv    128       3 x 3/ 1     32 x  32 x 128 ->   32 x  32 x 128 0.302 BF
  11 route  10                             1/2 ->   32 x  32 x  64 
  12 conv     64       3 x 3/ 1     32 x  32 x  64 ->   32 x  32 x  64 0.075 BF
  13 conv     64       3 x 3/ 1     32 x  32 x  64 ->   32 x  32 x  64 0.075 BF
  14 route  13 12                              ->   32 x  32 x 128 
  15 conv    128       1 x 1/ 1     32 x  32 x 128 ->   32 x  32 x 128 0.034 BF
  16 route  10 15                              ->   32 x  32 x 256 
  17 max                2x 2/ 2     32 x  32 x 256 ->   16 x  16 x 256 0.000 BF
  18 conv    256       3 x 3/ 1     16 x  16 x 256 ->   16 x  16 x 256 0.302 BF
  19 route  18                             1/2 ->   16 x  16 x 128 
  20 conv    128       3 x 3/ 1     16 x  16 x 128 ->   16 x  16 x 128 0.075 BF
  21 conv    128       3 x 3/ 1     16 x  16 x 128 ->   16 x  16 x 128 0.075 BF
  22 route  21 20                              ->   16 x  16 x 256 
  23 conv    256       1 x 1/ 1     16 x  16 x 256 ->   16 x  16 x 256 0.034 BF
  24 route  18 23                              ->   16 x  16 x 512 
  25 max                2x 2/ 2     16 x  16 x 512 ->    8 x   8 x 512 0.000 BF
  26 conv    512       3 x 3/ 1      8 x   8 x 512 ->    8 x   8 x 512 0.302 BF
  27 conv    256       1 x 1/ 1      8 x   8 x 512 ->    8 x   8 x 256 0.017 BF
  28 conv    512       3 x 3/ 1      8 x   8 x 256 ->    8 x   8 x 512 0.151 BF
  29 conv     18       1 x 1/ 1      8 x   8 x 512 ->    8 x   8 x  18 0.001 BF
  30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
  31 route  27                                 ->    8 x   8 x 256 
  32 conv    128       1 x 1/ 1      8 x   8 x 256 ->    8 x   8 x 128 0.004 BF
  33 upsample                 2x     8 x   8 x 128 ->   16 x  16 x 128
  34 route  33 23                              ->   16 x  16 x 384 
  35 conv    256       3 x 3/ 1     16 x  16 x 384 ->   16 x  16 x 256 0.453 BF
  36 conv     18       1 x 1/ 1     16 x  16 x 256 ->   16 x  16 x  18 0.002 BF
  37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
  38 route  35                                 ->   16 x  16 x 256 
  39 conv     64       1 x 1/ 1     16 x  16 x 256 ->   16 x  16 x  64 0.008 BF
  40 upsample                 2x    16 x  16 x  64 ->   32 x  32 x  64
  41 route  40 15                              ->   32 x  32 x 192 
  42 conv    128       3 x 3/ 1     32 x  32 x 192 ->   32 x  32 x 128 0.453 BF
  43 conv     18       1 x 1/ 1     32 x  32 x 128 ->   32 x  32 x  18 0.005 BF
  44 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
Total BFLOPS 3.036 
avg_outputs = 107207 
 0 : compute_capability = 610, cudnn_half = 0, GPU: GeForce GTX 1080 Ti 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 2    256 x 256 x   3 ->  128 x 128 x  32 0.028 BF
   1 conv     64       3 x 3/ 2    128 x 128 x  32 ->   64 x  64 x  64 0.151 BF
   2 conv     64       3 x 3/ 1     64 x  64 x  64 ->   64 x  64 x  64 0.302 BF
   3 route  2                              1/2 ->   64 x  64 x  32 
   4 conv     32       3 x 3/ 1     64 x  64 x  32 ->   64 x  64 x  32 0.075 BF
   5 conv     32       3 x 3/ 1     64 x  64 x  32 ->   64 x  64 x  32 0.075 BF
   6 route  5 4                                ->   64 x  64 x  64 
   7 conv     64       1 x 1/ 1     64 x  64 x  64 ->   64 x  64 x  64 0.034 BF
   8 route  2 7                                ->   64 x  64 x 128 
   9 max                2x 2/ 2     64 x  64 x 128 ->   32 x  32 x 128 0.001 BF
  10 conv    128       3 x 3/ 1     32 x  32 x 128 ->   32 x  32 x 128 0.302 BF
  11 route  10                             1/2 ->   32 x  32 x  64 
  12 conv     64       3 x 3/ 1     32 x  32 x  64 ->   32 x  32 x  64 0.075 BF
  13 conv     64       3 x 3/ 1     32 x  32 x  64 ->   32 x  32 x  64 0.075 BF
  14 route  13 12                              ->   32 x  32 x 128 
  15 conv    128       1 x 1/ 1     32 x  32 x 128 ->   32 x  32 x 128 0.034 BF
  16 route  10 15                              ->   32 x  32 x 256 
  17 max                2x 2/ 2     32 x  32 x 256 ->   16 x  16 x 256 0.000 BF
  18 conv    256       3 x 3/ 1     16 x  16 x 256 ->   16 x  16 x 256 0.302 BF
  19 route  18                             1/2 ->   16 x  16 x 128 
  20 conv    128       3 x 3/ 1     16 x  16 x 128 ->   16 x  16 x 128 0.075 BF
  21 conv    128       3 x 3/ 1     16 x  16 x 128 ->   16 x  16 x 128 0.075 BF
  22 route  21 20                              ->   16 x  16 x 256 
  23 conv    256       1 x 1/ 1     16 x  16 x 256 ->   16 x  16 x 256 0.034 BF
  24 route  18 23                              ->   16 x  16 x 512 
  25 max                2x 2/ 2     16 x  16 x 512 ->    8 x   8 x 512 0.000 BF
  26 conv    512       3 x 3/ 1      8 x   8 x 512 ->    8 x   8 x 512 0.302 BF
  27 conv    256       1 x 1/ 1      8 x   8 x 512 ->    8 x   8 x 256 0.017 BF
  28 conv    512       3 x 3/ 1      8 x   8 x 256 ->    8 x   8 x 512 0.151 BF
  29 conv     18       1 x 1/ 1      8 x   8 x 512 ->    8 x   8 x  18 0.001 BF
  30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
  31 route  27                                 ->    8 x   8 x 256 
  32 conv    128       1 x 1/ 1      8 x   8 x 256 ->    8 x   8 x 128 0.004 BF
  33 upsample                 2x     8 x   8 x 128 ->   16 x  16 x 128
  34 route  33 23                              ->   16 x  16 x 384 
  35 conv    256       3 x 3/ 1     16 x  16 x 384 ->   16 x  16 x 256 0.453 BF
  36 conv     18       1 x 1/ 1     16 x  16 x 256 ->   16 x  16 x  18 0.002 BF
  37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
  38 route  35                                 ->   16 x  16 x 256 
  39 conv     64       1 x 1/ 1     16 x  16 x 256 ->   16 x  16 x  64 0.008 BF
  40 upsample                 2x    16 x  16 x  64 ->   32 x  32 x  64
  41 route  40 15                              ->   32 x  32 x 192 
  42 conv    128       3 x 3/ 1     32 x  32 x 192 ->   32 x  32 x 128 0.453 BF
  43 conv     18       1 x 1/ 1     32 x  32 x 128 ->   32 x  32 x  18 0.005 BF
  44 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
Total BFLOPS 3.036 
avg_outputs = 107207 
Loading weights from ./yolov4-tiny-3l.weights... Prepare additional network for mAP calculation...
net.optimized_memory = 0 
mini_batch = 1, batch = 4, time_steps = 1, train = 0 
 classes_multipliers: 1.0, 
nms_kind: greedynms (1), beta = 0.600000 
 classes_multipliers: 1.0, 
nms_kind: greedynms (1), beta = 0.600000 
 classes_multipliers: 1.0, 
nms_kind: greedynms (1), beta = 0.600000 
yolov4-tiny-3l-1cls6
net.optimized_memory = 0 
mini_batch = 128, batch = 512, time_steps = 1, train = 1 
 classes_multipliers: 1.0, 
nms_kind: greedynms (1), beta = 0.600000 
 classes_multipliers: 1.0, 
nms_kind: greedynms (1), beta = 0.600000 
 classes_multipliers: 1.0, 
nms_kind: greedynms (1), beta = 0.600000 
Done! Loaded 29 layers from weights-file

AlexeyAB / darknet

Bug? CUDA illegal memory access... from recent changes to convolutional_layer.c ? #6662