RainBowLuoCS / DiffusionTrack

[AAAI 2024] DiffusionTrack: Diffusion Model For Multi-Object Tracking. DiffusionTrack is the first work to employ the diffusion model for multi-object tracking by formulating it as a generative noise-to-tracking diffusion process.

Question about diffusion_postprocess #5

Closed Greysahy closed 3 weeks ago

Greysahy commented 10 months ago

Hi, author. I am very interested in your work, but I have some questions about the code below:

    def diffusion_postprocess(self, diffusion_outputs, conf_scores, nms_thre=0.7, conf_thre=0.6):
        # diffusion_outputs: (2B, num_proposal, 5), conf_scores: (B, num_proposal, 1)

        # Split along the batch dim into previous-frame and current-frame predictions.
        pre_prediction, cur_prediction = diffusion_outputs.split(len(diffusion_outputs) // 2, dim=0)
        # (B, num_proposal, 5), (B, num_proposal, 5)

        output = [None for _ in range(len(pre_prediction))]  # one entry per image pair in the batch
        for i, (pre_image_pred, cur_image_pred, association_score) in enumerate(
                zip(pre_prediction, cur_prediction, conf_scores)):

            association_score = association_score.flatten()  # (num_proposal,)
            # If no proposals remain => process next image pair
            if not pre_image_pred.size(0):
                continue
            # _, conf_mask = torch.topk((image_pred[:, 4] * class_conf.squeeze()), 1000)
            # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
            detections = torch.zeros((2, len(cur_image_pred), 7),
                                     dtype=cur_image_pred.dtype, device=cur_image_pred.device)
            # detections.shape = (2, num_proposal, 7); index 0 = previous frame, 1 = current frame
            detections[0, :, :4] = pre_image_pred[:, :4]
            detections[1, :, :4] = cur_image_pred[:, :4]
            detections[0, :, 4] = association_score
            detections[1, :, 4] = association_score
            detections[0, :, 5] = torch.sqrt(torch.sigmoid(pre_image_pred[:, 4]) * association_score)
            detections[1, :, 5] = torch.sqrt(torch.sigmoid(cur_image_pred[:, 4]) * association_score)

            # Keep only proposal pairs whose association score exceeds the confidence threshold.
            score_out_index = association_score > conf_thre

            # strategy = torch.mean
            # value = strategy(detections[:, :, 5], dim=0, keepdim=False)
            # score_out_index = value > conf_thre

            detections = detections[:, score_out_index, :]

            if not detections.size(1):
                output[i] = detections
                continue

            # NMS over paired boxes from both frames, scored by the association score.
            nms_out_index_3d = cluster_nms(
                                        detections[0, :, :4],
                                        detections[1, :, :4],
                                        # value[score_out_index],
                                        detections[0, :, 4],
                                        iou_threshold=nms_thre)

            detections = detections[:, nms_out_index_3d, :]

            if output[i] is None:
                output[i] = detections
            else:
                output[i] = torch.cat((output[i], detections))

            # output[i].shape = (2, num_kept, 7)

        return output[0][0], output[0][1], torch.cat([output[1][0], output[1][1]], dim=0) if len(output) >= 2 else None

For the diffusion_postprocess function: given that it takes diffusion_outputs (2*batch_size, num_proposal, 5) and conf_scores (batch_size, num_proposal, 1) as inputs, the return values output[0][0] (num_proposal, 7) and output[0][1] (num_proposal, 7) are the previous- and current-frame detections for only the first image pair in the batch, and the third return value concatenates both frames of the second pair. This seems to discard the results for the majority of the batch. Why is the process handled this way? If there's a misunderstanding on my part, please correct me. Much appreciated, thank you!
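For reference, here is a minimal, runnable sketch of what I mean. The batch size, proposal count, and random tensors below are placeholders I picked for illustration, not values from the repo; only the final return expression mirrors the function above:

    import torch

    # Hypothetical shapes chosen for illustration only.
    B, N = 4, 8                                         # 4 image pairs in the batch, 8 proposals each
    output = [torch.randn(2, N, 7) for _ in range(B)]   # stand-in for the per-pair loop results

    # Mirrors the function's return statement:
    ret = (output[0][0],                                # previous-frame detections, pair 0
           output[0][1],                                # current-frame detections, pair 0
           torch.cat([output[1][0], output[1][1]], dim=0) if len(output) >= 2 else None)

    print(ret[0].shape, ret[1].shape, ret[2].shape)
    # torch.Size([8, 7]) torch.Size([8, 7]) torch.Size([16, 7])
    # output[2] and output[3] are never read, i.e. half of this batch is discarded.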