Zzh-tju / CIoU

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)
GNU General Public License v3.0

Regarding Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation #19

Open Wanghe1997 opened 2 years ago

Wanghe1997 commented 2 years ago

Hello, can the combination of CIoU and Cluster-NMS from this paper of yours be used with YOLOv5? How should I replace the original torchvision NMS in YOLOv5 with Cluster-NMS? Thanks.

Wanghe1997 commented 2 years ago
```python
# For batch mode Cluster-Weighted NMS
# Requires: xywh2xyxy (YOLOv5 utils) and jaccard / jaccard_diou (Cluster-NMS utils).
import time

import torch


def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.6, max_box=1500, merge=False, classes=None, agnostic=False):
    """Performs Non-Maximum Suppression (NMS) on inference results

    Returns:
        detections with shape: nx6 (x1, y1, x2, y2, conf, cls)
    """
    nc = prediction[0].shape[1] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_det = 300  # maximum number of detections per image
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label = nc > 1  # multiple labels per box (adds 0.5ms/img)

    t = time.time()
    output = [None] * prediction.shape[0]
    pred1 = (prediction < -1).float()[:, :max_box, :6]  # pred1.size()=[batch, max_box, 6] denotes boxes without offset by class
    pred2 = pred1[:, :, :4] + 0  # pred2 denotes boxes with offset by class
    batch_size = prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # If none remain process next image
        n = x.shape[0]  # number of boxes
        if not n:
            continue

        # Sort by confidence
        x = x[x[:, 4].argsort(descending=True)]

        c = x[:, 5] * 0 if agnostic else x[:, 5]  # classes
        boxes = (x[:, :4].clone() + c.view(-1, 1) * max_wh)[:max_box]  # boxes (offset by class), scores
        pred2[xi, :] = torch.cat((boxes, pred2[xi, :]), 0)[:max_box]  # If less than max_box, padding 0.
        pred1[xi, :] = torch.cat((x[:max_box], pred1[xi, :]), 0)[:max_box]

    # Batch mode Weighted Cluster-NMS
    iou = jaccard(pred2, pred2).triu_(diagonal=1)  # switch to 'jaccard_diou' function for using Cluster-DIoU-NMS
    B = iou
    for i in range(200):
        A = B
        maxA = A.max(dim=1)[0]  # per box: max IoU with a currently kept, higher-scored box
        E = (maxA < iou_thres).float().unsqueeze(2).expand_as(A)
        B = iou.mul(E)  # suppressed boxes no longer suppress others
        if A.equal(B):
            break
    keep = maxA < iou_thres
    weights = (B * (B > 0.8) + torch.eye(max_box).cuda().expand(batch_size, max_box, max_box)) * (pred1[:, :, 4].reshape((batch_size, 1, max_box)))
    pred1[:, :, :4] = torch.matmul(weights, pred1[:, :, :4]) / weights.sum(2, keepdim=True)  # weighted coordinates

    for jj in range(batch_size):
        output[jj] = pred1[jj][keep[jj]]

    return output
```
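For reference, a hypothetical call site mirroring how YOLOv5's detect.py / test.py invoke `non_max_suppression` (exact variable names differ across YOLOv5 versions; `model` and `img` are placeholders, and the function above assumes a CUDA device since it calls `.cuda()` and needs `xywh2xyxy` / `jaccard` to be importable):

```python
import torch

with torch.no_grad():
    pred = model(img)[0]                      # raw output: [batch, n_anchors, 5 + nc]
    out = non_max_suppression(pred, conf_thres=0.001, iou_thres=0.6,
                              max_box=1500, agnostic=False)
    for si, det in enumerate(out):            # det: [k, 6] = (x1, y1, x2, y2, conf, cls)
        if det is None or not det.shape[0]:
            continue
        # ... scale boxes back to the original image size and evaluate ...
```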

Author, is it the case that replacing the non_max_suppression function of the same name in YOLOv5's general.py with this code is equivalent to using Cluster-NMS? The YOLOv5 code I use is version 5.0, and its default box loss already uses CIoU. That is my first question. The second question: [screenshot] which parameter in the YOLOv5 code does the parameter circled in the figure correspond to? The third question: my setup for YOLOv5 evaluation (i.e. running test.py) is a batch size of 16 without TTA. Under this setup, will replacing the original YOLOv5 NMS with your Cluster-NMS improve mAP and the other metrics? In your experience, how should the hyperparameters be set to get the best results? Thanks.

Zzh-tju commented 2 years ago
  1. This code is batch mode Weighted Cluster-NMS. "Batch mode" means NMS is run on a whole batch at once; otherwise every image has to be processed in a loop. Because torchvision NMS is very fast and batch mode Weighted Cluster-NMS needs some preprocessing, batch mode Weighted Cluster-NMS only beats torchvision NMS in speed when TTA is enabled. If you do not want batch mode Cluster-NMS, you can use https://github.com/Zzh-tju/ultralytics-YOLOv3-Cluster-NMS. Whichever you use, note that variable names may differ from the latest YOLOv5, which is updated constantly, so a simple drop-in replacement may not run as-is. (A minimal per-image sketch is given after this list.)

  2. The weighted threshold refers to the 0.8 in weights = (B*(B>0.8)); it means that only neighbouring boxes whose IoU with the kept box is > 0.8 are selected for coordinate weighting. (A toy numeric illustration follows this list.)

  3. According to the YOLOv5 author, YOLOv5's boxes are already predicted accurately enough, so NMS coordinate weighting, DIoU-NMS and the like may give only a limited gain; for speed, he therefore keeps torchvision NMS (whose accuracy equals Original NMS). So my advice is: Cluster-NMS and its family of variants may still bring an accuracy gain, but you need to try a few parameter settings, such as the weighted threshold, the magnitude of DIoU-NMS's centre-distance penalty, and, if you use batch mode Cluster-NMS, also the max_box parameter, in order to find a speed-accuracy trade-off. Of course, with TTA enabled, batch mode Cluster-NMS is the fastest. It is worth mentioning that if you care about recall, DIoU-NMS is guaranteed to improve recall. (A sketch of the DIoU score used by DIoU-NMS follows below.)
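For reference, a minimal per-image (non-batch) Cluster-NMS sketch in the spirit of the linked repo, using torchvision.ops.box_iou in place of the repo's jaccard helper; names and defaults are illustrative, not the repo's exact code:

```python
import torch
from torchvision.ops import box_iou


def cluster_nms(boxes, scores, iou_thres=0.6, max_iter=200):
    """Plain per-image Cluster-NMS: boxes [n, 4] in xyxy, scores [n]."""
    order = scores.argsort(descending=True)
    boxes = boxes[order]
    iou = box_iou(boxes, boxes).triu_(diagonal=1)  # IoU with higher-scored boxes only
    B = iou
    for _ in range(max_iter):
        A = B
        maxA = A.max(dim=0)[0]  # per box: max IoU with a kept, higher-scored box
        E = (maxA < iou_thres).float().unsqueeze(1).expand_as(A)
        B = iou.mul(E)  # suppressed boxes lose their vote in the next iteration
        if A.equal(B):
            break
    return order[maxA < iou_thres]  # indices of kept boxes in the original order
```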
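To make the weighted threshold concrete, a toy 2-D illustration with made-up numbers (the batch code above carries an extra batch dimension):

```python
import torch

# One kept box with two neighbours at IoU 0.85 and 0.60 (hypothetical values).
B = torch.tensor([[0.00, 0.85, 0.60],
                  [0.00, 0.00, 0.55],
                  [0.00, 0.00, 0.00]])           # pairwise IoU after the Cluster-NMS iterations
scores = torch.tensor([0.9, 0.8, 0.7])           # confidences
boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 11., 52., 49.],
                      [40., 40., 90., 90.]])

weighted_thres = 0.8                             # only neighbours with IoU > 0.8 get a vote
weights = (B * (B > weighted_thres) + torch.eye(3)) * scores  # row i: weights for box i's cluster
merged = weights @ boxes / weights.sum(1, keepdim=True)       # coordinate averaging
print(merged[0])  # box 0 is averaged with box 1 only; box 2 (IoU 0.60) is ignored
```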
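And a hedged sketch of the DIoU score that (Cluster-)DIoU-NMS thresholds instead of plain IoU; `beta` stands for the centre-distance penalty magnitude mentioned above (the parameter name is assumed):

```python
import torch
from torchvision.ops import box_iou


def diou_matrix(boxes1, boxes2, beta=1.0):
    # DIoU = IoU - (rho^2 / c^2) ** beta, where rho is the distance between box
    # centres and c the diagonal of the smallest enclosing box; boxes are xyxy.
    iou = box_iou(boxes1, boxes2)                                             # [n, m]
    cx1, cy1 = (boxes1[:, 0] + boxes1[:, 2]) / 2, (boxes1[:, 1] + boxes1[:, 3]) / 2
    cx2, cy2 = (boxes2[:, 0] + boxes2[:, 2]) / 2, (boxes2[:, 1] + boxes2[:, 3]) / 2
    rho2 = (cx1[:, None] - cx2[None, :]) ** 2 + (cy1[:, None] - cy2[None, :]) ** 2
    # smallest enclosing box of each pair
    ex1 = torch.min(boxes1[:, None, 0], boxes2[None, :, 0])
    ey1 = torch.min(boxes1[:, None, 1], boxes2[None, :, 1])
    ex2 = torch.max(boxes1[:, None, 2], boxes2[None, :, 2])
    ey2 = torch.max(boxes1[:, None, 3], boxes2[None, :, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - (rho2 / c2) ** beta
```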

yilifan commented 2 years ago

Hello, a small question about the source code for computing CIoU: there are bboxes1 = torch.sigmoid(bboxes1) and bboxes2 = torch.sigmoid(bboxes2), and later when computing width and height there is w1 = torch.exp(bboxes1[:, 2]). I do not quite understand why the boxes are activated first here. The other IoU implementations I have looked at do not have this activation step and instead compute on the boxes directly.