[AAAI 2024] DiffusionTrack: Diffusion Model For Multi-Object Tracking. DiffusionTrack is the first work to employ the diffusion model for multi-object tracking by formulating it as a generative noise-to-tracking diffusion process.
Hi, author. I am very interested in your work, but I have encountered some issues while reading the code below:
```python
def diffusion_postprocess(self, diffusion_outputs, conf_scores, nms_thre=0.7, conf_thre=0.6):
    # input shape: (2B, num_proposal, 5), (B, num_proposal, 1)
    pre_prediction, cur_prediction = diffusion_outputs.split(len(diffusion_outputs) // 2, dim=0)
    # (B, num_proposal, 5), (B, num_proposal, 5)
    output = [None for _ in range(len(pre_prediction))]  # [None, None, ...], len = B
    for i, (pre_image_pred, cur_image_pred, association_score) in enumerate(
            zip(pre_prediction, cur_prediction, conf_scores)):
        association_score = association_score.flatten()  # (num_proposal,)
        # If none are remaining => process next image
        if not pre_image_pred.size(0):
            continue
        # _, conf_mask = torch.topk((image_pred[:, 4] * class_conf.squeeze()), 1000)
        # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
        detections = torch.zeros((2, len(cur_image_pred), 7),
                                 dtype=cur_image_pred.dtype, device=cur_image_pred.device)
        # detections.shape = (2, num_proposals, 7)
        detections[0, :, :4] = pre_image_pred[:, :4]
        detections[1, :, :4] = cur_image_pred[:, :4]
        detections[0, :, 4] = association_score
        detections[1, :, 4] = association_score
        detections[0, :, 5] = torch.sqrt(torch.sigmoid(pre_image_pred[:, 4]) * association_score)
        detections[1, :, 5] = torch.sqrt(torch.sigmoid(cur_image_pred[:, 4]) * association_score)
        score_out_index = association_score > conf_thre
        # strategy = torch.mean
        # value = strategy(detections[:, :, 5], dim=0, keepdim=False)
        # score_out_index = value > conf_thre
        detections = detections[:, score_out_index, :]
        if not detections.size(1):
            output[i] = detections
            continue
        nms_out_index_3d = cluster_nms(
            detections[0, :, :4],
            detections[1, :, :4],
            # value[score_out_index],
            detections[0, :, 4],
            iou_threshold=nms_thre)
        detections = detections[:, nms_out_index_3d, :]
        if output[i] is None:
            output[i] = detections
        else:
            output[i] = torch.cat((output[i], detections))
        # output[i].shape = (2, num_kept, 7)
    return output[0][0], output[0][1], torch.cat([output[1][0], output[1][1]], dim=0) if len(output) >= 2 else None
```
For the `diffusion_postprocess` function, given that it takes `diffusion_outputs` of shape `(2*batch_size, num_proposal, 5)` and `conf_scores` of shape `(batch_size, num_proposal, 1)` as inputs, the return values `output[0][0]` and `output[0][1]` (each of shape `(num_proposal, 7)`) represent the detection results of the previous image for the first/second iteration in the batch. This seems to discard the majority of the information in the batch. Why is the process handled this way? If there's a misunderstanding on my part, please correct me. Much appreciated, thank you!
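To make the shape concern concrete, here is a minimal sketch with dummy tensors (not the real model outputs; `B`, `N`, and the `torch.zeros` placeholders are my own illustrative assumptions) of how the `split` and the final indexing behave:

```python
import torch

# Toy shapes mirroring the question: batch_size B = 3, num_proposal N = 4.
B, N = 3, 4
diffusion_outputs = torch.randn(2 * B, N, 5)

# The function splits along dim 0 into "previous" and "current" predictions.
pre, cur = diffusion_outputs.split(len(diffusion_outputs) // 2, dim=0)
assert pre.shape == (B, N, 5) and cur.shape == (B, N, 5)

# After per-image filtering/NMS, output is a list of length B, where
# output[i] has shape (2, num_kept_i, 7). The return statement
#   output[0][0], output[0][1], ...
# therefore reads only the first batch element's two frames:
output = [torch.zeros(2, N, 7) for _ in range(B)]  # stand-in for the real results
first_pre, first_cur = output[0][0], output[0][1]
assert first_pre.shape == (N, 7)  # detections tied to batch element 0 only
assert first_cur.shape == (N, 7)
# output[1], output[2], ... are never returned, which is what prompts my question.
```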