LJacksonPan / RaTrack

[ICRA2024] RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud
https://github.com/LJacksonPan/RaTrack
MIT License

About MOTA calculation #11

Closed LiawWun closed 3 weeks ago

LiawWun commented 3 weeks ago

Dear Author, I appreciate this great work again.

We understand that you are unable to share the evaluation code. Consequently, we have developed our own evaluation script based on the descriptions in your paper and the code in this repository. Specifically, we followed the procedure outlined in lines 83-117 of RaTrack/src/main_utils.py to project bounding boxes onto the radar coordinate system and filter the radar points of moving objects.

kitti_locations = VodTrackLocations(root_dir=args.dataset_path,
                                    output_dir=args.dataset_path,
                                    frame_set_path="",
                                    pred_dir="")

frame_data_0 = FrameDataLoader(kitti_locations=kitti_locations,
                               frame_number=str(index + 1).zfill(5))
frame_data_1 = FrameDataLoader(kitti_locations=kitti_locations,
                               frame_number=str(index).zfill(5))

try:
    import dataset_classes.track_vod_3d as vod_data
    labels1 = vod_data.load_labels(frame_data_0.raw_tracking_labels, index + 1)
    labels2 = vod_data.load_labels(frame_data_1.raw_tracking_labels, index)

    transforms1 = FrameTransformMatrix(frame_data_0)
    transforms2 = FrameTransformMatrix(frame_data_1)

    lbl1 = labels1.data[index + 1]
    lbl2 = labels2.data[index]

    lbl1_mov = filter_moving_boxes_det(frame_data_0.raw_detection_labels, lbl1)
    lbl2_mov = filter_moving_boxes_det(frame_data_1.raw_detection_labels, lbl2)
except:
    continue

lbl1 = lbl1_mov
lbl2 = lbl2_mov

batch_size = pc1.size(0)
num_examples += batch_size

if args.model == 'track4d_radar':
    gt_mov_pts1, gt_cls1, gt_objs1, objs_idx1, objs_centre1, cls_obj_id1, boxes1, objs_combined1, objs_idx_combined1, objs_centre_combined1 = filter_object_points(
        args, lbl1, pc1, transforms1)
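
For reference, the box-to-points filtering we are reproducing here boils down to a yaw-aligned point-in-box test. Below is a minimal generic sketch of that test (our own simplification, not the repo's filter_object_points; it assumes the box centre, dimensions and yaw are already expressed in the radar frame and that the centre is the geometric centre of the box):

import numpy as np

def points_in_box(points, centre, dims, yaw):
    # points: (N, 3) xyz in the radar frame; centre: (3,); dims: (l, w, h); yaw: rotation about z
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    local = (points - centre) @ rot.T  # express the points in the box's local frame
    half = np.asarray(dims) / 2.0
    return points[np.all(np.abs(local) <= half, axis=1)]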

We then saved these points as Ground Truth files with confidence scores set to 1, as detailed in lines 171-184 of the same code, for subsequent calculations and scoring.

for obj_id, obj in objects.items():
    idx += 1
    result_str = "NA"
    result_str += " 1"
    result_str += " -1"
    result_str += " -1"
    result_str += " " + str(float(confs[idx]))
    result_str += " " + str(obj_id)
    for i in range(obj.size(2)):
        result_str += " " + str(float(obj[0, 3, i]))
        result_str += " " + str(float(obj[0, 4, i]))
        result_str += " " + str(float(obj[0, 5, i]))
    result_str += "\n"
    file.writelines(result_str)
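
For clarity, each line written this way has the form below (reading directly from the snippet: four fixed placeholder fields, then the confidence, the object ID, and the flattened xyz coordinates of the object's points):

NA 1 -1 -1 <confidence> <object_id> x1 y1 z1 x2 y2 z2 ...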

In computing MOTA, we adhered to the definitions in "Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics" and the point-based IoU described in Section V-A of your RaTrack paper. Following Section V-E, we counted GT, FP, FN, and IDSW only for objects with more than 5 radar points. Despite sweeping 40 confidence score thresholds from 0 to 1, the results did not meet expectations (I got a negative MOTA value): lower thresholds led to a large number of false positives, while higher thresholds produced an excessive number of false negatives.
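
For reference, these are the definitions we implemented (a minimal sketch of our understanding of CLEAR MOT and the point-based IoU, not your evaluation code):

def point_iou(gt_points, pred_points):
    # point-based IoU: |intersection| / |union| of the two point sets
    gt_set, pred_set = set(map(tuple, gt_points)), set(map(tuple, pred_points))
    return len(gt_set & pred_set) / len(gt_set | pred_set)

def mota(num_gt, num_misses, num_fp, num_idsw):
    # CLEAR MOT: MOTA = 1 - (FN + FP + IDSW) / GT
    return 1.0 - (num_misses + num_fp + num_idsw) / num_gt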

Additionally, we observed potential noise affecting detection outcomes in the motion segmentation results from line 132 of RaTrack/src/main_utils.py.

cls_mask = torch.where(cls.squeeze(0).squeeze(0) > 0.50, 1, 0)
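
One way to inspect this noise is to sweep the threshold in that line and count how many points get flagged as moving (a minimal sketch in the same context as the quoted line; cls is assumed to be the per-point motion segmentation output):

# Hypothetical sketch: count how many points each threshold flags as moving.
for thr in (0.3, 0.5, 0.7, 0.9):
    cls_mask = torch.where(cls.squeeze(0).squeeze(0) > thr, 1, 0)
    print(f"threshold {thr:.2f}: {int(cls_mask.sum())} points classified as moving")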

Could you kindly clarify the following:

  1. In calculating false positives (FP) and false negatives (FN), did you consider tracking IDs or information from previous frames, or were these metrics based solely on detection results?

  2. What confidence score threshold was used for the MOTA values reported in your paper?

  3. Did you encounter any issues with noise in the motion segmentation results?

Thank you for your time and assistance. We look forward to your valuable insights.

LiawWun commented 3 weeks ago

Below is my self-derived MOTA calculation code:

import numpy as np
from tabulate import tabulate
from scipy.optimize import linear_sum_assignment

class MOTA:

    def __init__(self, threshold=0.5):

        self.reset()
        self.prev_match = dict()
        self.threshold = threshold

    def reset(self):

        self.mota = 0
        self.tp = 0
        self.fp = 0
        self.fn = 0
        self.miss = 0
        self.idsw = 0
        self.gt = 0

    def start_new_frame(self):
        self.prev_match = dict()

    def update(self, gt_object, pred_object, show_info = False):

        current_match = dict()
        match_gt = []
        match_pred = []

        current_gt = 0
        current_tp = 0
        current_fp = 0
        current_idsw = 0
        current_miss = 0

        current_gt = len(gt_object)

        gt_obj_id_list, pred_obj_id_list, distance_matrix = self.calculate_distance_matrix(gt_object, pred_object)

        if show_info:

            self.show_match_info(self.prev_match, "Prev")

        # Show Previous match
        if show_info:

            print("-" * 100)
            print("Distance matrix to compute the correspondence.")
            self.show_distance_matrix(gt_obj_id_list, pred_obj_id_list, distance_matrix)

        # Step 1: Check the last mapping is still valid or not.
        for prev_gt_id, prev_pred_id in self.prev_match.items():

            # Check object and hypothesis still exists in this frame
            if prev_gt_id in gt_obj_id_list and prev_pred_id in pred_obj_id_list: 

                prev_gt_index = gt_obj_id_list.index(prev_gt_id)
                prev_pred_index = pred_obj_id_list.index(prev_pred_id)

                # Check that the IoU (1 - distance) still exceeds the threshold
                if 1 - distance_matrix[prev_gt_index, prev_pred_index] > self.threshold:
                    match_gt.append(prev_gt_id)
                    match_pred.append(prev_pred_id)
                    current_match[prev_gt_id] = prev_pred_id
                    current_tp += 1
                    if show_info:
                        print(f"Gt {prev_gt_id} still match with Pred {prev_pred_id}")

        # Remove the matched objects
        for match_gt_id in match_gt:

            del gt_object[match_gt_id]

        for match_pred_id in match_pred:

            del pred_object[match_pred_id]

        # All remaining hypotheses are considered false positive.
        if len(gt_object) == 0:

            current_fp += len(pred_object) 

        # All remaining object are considered misses.
        elif len(pred_object) == 0:

            current_miss += len(gt_object)

        # Calculate the distance matrix for the remaining unmatched objects and hypotheses.
        else:
            gt_obj_id_list, pred_obj_id_list, distance_matrix = self.calculate_distance_matrix(gt_object, pred_object)

            if show_info:

                print('+' * 100)
                print("Distance matrix to compute the correspondence.")
                self.show_distance_matrix(gt_obj_id_list, pred_obj_id_list, distance_matrix)

            row_ind, col_ind = linear_sum_assignment(distance_matrix)

            match_gt = set()
            match_pred = set()

            # Find the correspondence
            for r, c in zip(row_ind, col_ind):

                if 1 - distance_matrix[r, c] > self.threshold:

                    match_gt.add(gt_obj_id_list[r])
                    match_pred.add(pred_obj_id_list[c])
                    current_match[gt_obj_id_list[r]] = pred_obj_id_list[c]
                    current_tp += 1

                    if show_info:
                        print(f"Find match between GT {gt_obj_id_list[r]} and Pred {pred_obj_id_list[c]}")

                    # Check if ID switch occur.
                    if gt_obj_id_list[r] in self.prev_match.keys() and self.prev_match[gt_obj_id_list[r]] != pred_obj_id_list[c]:
                        current_idsw += 1
                        if show_info:
                            print("IDSW Detected!")

            # Show match info
            if show_info:
                self.show_match_info(current_match, "Current")

            # Remaining ground-truth objects are considered misses
            gt_obj_id_set = set(gt_obj_id_list)
            remaining_gt_obj = gt_obj_id_set - match_gt
            current_miss += len(remaining_gt_obj)

            # Remaining hypotheses are considered false positives
            pred_obj_id_set = set(pred_obj_id_list)
            remaining_pred_obj = pred_obj_id_set - match_pred
            current_fp += len(remaining_pred_obj)

        self.gt += current_gt
        self.tp += current_tp
        self.fp += current_fp
        self.miss += current_miss
        self.idsw += current_idsw
        self.prev_match = current_match

        if show_info:
            print("#" * 100)
            print("Summary:")
            data =[[current_gt, current_tp, current_fp, current_miss, current_idsw, self.gt, self.tp, self.fp, self.miss, self.idsw]]
            headers = ["GT", "TP", "FP", "MISS", "IDSW", "Total GT", "Total TP", "Total FP", "Total MISS", "Total IDSW"]
            print(tabulate(data, headers=headers, tablefmt="grid"))

    def summary(self, show_table = True):

        if show_table:
            print("#" * 100)
            print("Summary:")
            data =[[self.gt, self.tp, self.fp, self.miss, self.idsw]]
            headers = ["Total GT", "Total TP", "Total FP", "Total MISS", "Total IDSW"]
            print(tabulate(data, headers=headers, tablefmt="grid"))
        self.mota = 1 - ((self.miss + self.fp + self.idsw) / self.gt)
        self.recall = self.tp / (self.tp + self.miss)
        return self.gt, self.tp, self.fp, self.miss, self.idsw, self.mota, self.recall

    def calculate_distance_matrix(self, gt_object, pred_object):

        gt_obj_id_list = list(gt_object.keys())
        pred_obj_id_list = list(pred_object.keys())
        distance = np.zeros((len(gt_obj_id_list), len(pred_obj_id_list)), dtype = np.float32)

        for i, gt_id in enumerate(gt_obj_id_list):

            for j, pred_id in enumerate(pred_obj_id_list):

                distance[i, j] = 1 - self.iou_calculate(gt_object[gt_id], pred_object[pred_id])

        return gt_obj_id_list, pred_obj_id_list, distance

    def iou_calculate(self, Q, P):

        set_Q = set(map(tuple, Q))
        set_P = set(map(tuple, P))

        union_set = set_Q.union(set_P)
        intersection_set = set_Q.intersection(set_P)

        iou = len(intersection_set) / len(union_set)

        return iou

    def show_distance_matrix(self, gt_obj_id_list, pred_obj_id_list, distance):

        headers = [""] + [str(id) for id in pred_obj_id_list]
        table = [[str(gt_id)] + [f"{ 1 - val:.2f}" for val in row] for gt_id, row in zip(gt_obj_id_list, distance)]
        table_str = tabulate(table, headers=headers, tablefmt="grid")

        print(f"Gt object id: {list(gt_obj_id_list)}")
        print(f"Detect object id: {list(pred_obj_id_list)}")
        print(table_str)

    def show_match_info(self, match_info, title):

        if len(match_info) == 0:
            print("=" * 100)
            print(title + " ID mapping information")
            print("No matching information")
            print("=" * 100)
            return

        table = [["GT Object ID", "Predicted Object ID"]]
        table.extend([[gt_id, pred_id] for gt_id, pred_id in match_info.items()])
        print("=" * 100)
        print(title + " ID mapping information")
        print(tabulate(table, headers="firstrow", tablefmt="grid"))
        print("=" * 100)

import sys
from pathlib import Path

sys.path.append(str(Path(__file__).resolve().parent.parent))

import os.path
import struct
import open3d as o3d
from datetime import time

import numpy as np
from torch.utils.data import Dataset

from dataset_classes.kitti.kitti_calib import Calibration
from vod.frame.transformations import homogeneous_transformation

from vod.configuration import VodTrackLocations
from vod.frame import FrameDataLoader, FrameTransformMatrix

from tqdm import tqdm
from torch.utils.data import DataLoader
import dataset_classes.track_vod_3d as vod_data
from dataset_classes.kitti.box import Box3D
from dataset_classes.track_vod_3d import TrackingDataVOD
from utils import parse_args_from_yaml

from models.utils.track4d_utils import filter_moving_boxes_det, get_bbx_param, \
    get_gt_flow_new, obj_centre, filter_object_points, map_gt_objects

import matplotlib
matplotlib.use('TkAgg', force=True)
import matplotlib.pyplot as plt

from MOTA_estimator import *

def get_object_from_txt(txt_file, conf_threshold):

    obj = dict()
    with open(txt_file, 'r') as file:

        lines = file.readlines()
        for line in lines:
            data = line.strip().split()
            conf_score = float(data[4])
            obj_id = data[5]
            _obj_point = data[6: len(data)]
            obj_point = np.array(_obj_point, dtype = np.float32).reshape(-1, 3)

            if conf_score >= conf_threshold:
                obj[obj_id] = obj_point

    return obj

def filter_object_point(obj, min_point):

    pop_id_list = []
    for obj_id, points in obj.items():

        if points.shape[0] < min_point:

            pop_id_list.append(obj_id)

    for obj_id in pop_id_list:
        del obj[obj_id]

def print_object_info(obj):

    print("#" * 80)
    count = 0
    for obj_id, points in obj.items():
        count += 1
        print(f"Id: {obj_id}, point num: {points.shape[0]}")
    print(f"There are {count} objects in this frame.")
    print("#" * 80)

def find_radar_indices(pc, obj):

    pc_list = pc.tolist()
    for obj_id, points in obj.items():
        point_list = points.tolist()
        existence = [item in pc_list for item in point_list]
        true_count = existence.count(True)
        false_count = existence.count(False)
        if false_count != 0:
            print(f"Some points of object {obj_id} are not in the current radar frame.")
        # print(f"Object ID {obj_id}, True count: {true_count}, False count: {false_count}")

class Frame_data:

    def __init__(self, radar_point, gt_object, pred_object, camera):

        self.radar_pc = radar_point
        self.gt_object = gt_object
        self.pred_object = pred_object
        self.camera = camera

if __name__ == "__main__":

    vis_per_frame = False

    kitti_locations = VodTrackLocations(root_dir = '/media/ee904/data2/view_of_delft_PUBLIC',
                                        output_dir = '',
                                        frame_set_path = '',
                                        pred_dir = '')

    clips_dir = "/home/ee904/Desktop/RaTrack/clips"
    clips = ['delft_1','delft_10','delft_14','delft_22']
    start_idx = 1

    gt_folder = "/home/ee904/Desktop/RaTrack/ground_truth/label_point"
    pred_folder = "/home/ee904/Desktop/RaTrack/results"

    mota = MOTA(threshold = 0.25) 

    for clip in clips:

        print(clip)
        txt_path = os.path.join(clips_dir, clip + '.txt')

        with open(txt_path) as f:
            frames = f.read().splitlines()

        prev_frame_data = None

        counter = 0
        mota.start_new_frame()

        for frame in frames:

            os.system('clear')
            print("#" * 100)
            print(f"Frame: {frame}")
            frame_data = FrameDataLoader(kitti_locations=kitti_locations, frame_number=str(int(frame)).zfill(5))

            radar_pc_xyz = frame_data.radar_data[:, :3]
            radar_pc_xyz_list = list(radar_pc_xyz)

            gt_txt_path = os.path.join(gt_folder, clip ,frame + ".txt")
            pred_txt_path = os.path.join(pred_folder, clip, frame + ".txt")

            if not (os.path.exists(gt_txt_path) and os.path.exists(pred_txt_path)):
                print(f"File {frame}.txt not exists.")
                continue

            gt_object = get_object_from_txt(gt_txt_path, conf_threshold = 1.0)
            pred_object = get_object_from_txt(pred_txt_path, conf_threshold = 0.4)

            # Check if the object point exists in the current radar scan
            find_radar_indices(radar_pc_xyz, gt_object)
            find_radar_indices(radar_pc_xyz, pred_object)

            filter_object_point(gt_object, min_point = 5)
            filter_object_point(pred_object, min_point = 5)

            current_frame_data = Frame_data(radar_pc_xyz, gt_object, pred_object, frame_data.image)

            mota.update(gt_object = gt_object.copy(), pred_object = pred_object.copy(), show_info = vis_per_frame)

    gt_num, tp_num, fp_num, miss_num, idsw_num, mota_value, recall_value = mota.summary(show_table = True)
    print(f"MOTA = {mota_value}, Recall = {recall_value}")

MrTooOldDriver commented 3 weeks ago

Thank you for your interest in our work.

  1. When calculating false positives (FP) and false negatives (FN), tracking IDs must be considered since we are dealing with a tracking problem. I recommend you check out the original AB3DMOT implementation at https://github.com/xinshuoweng/AB3DMOT and the paper at https://arxiv.org/abs/1907.03961.
  2. The confidence threshold we report for MOTA is 0.078483.
  3. We also observed the same issues with noisy motion segmentation. We think this is one of the issues stemming from the end-to-end training pipeline. Pretraining the motion segmentation usually yields decent results, but when switching to a full end-to-end setup the motion segmentation starts becoming noisy. This is likely due to training problems as mentioned in the paper.

I am sorry that I don't have the time to debug the code for you. However, I suggest you run your MOTA evaluation on the other baselines we experimented with and compare the results (e.g. CenterPoint and AB3DMOT with/without the PP backbone). Referring to the AB3DMOT implementation of MOTA may also help you with this.

For more details: when preparing the ground truths, please ensure you have also removed any objects with fewer than 5 points (i.e. using the same omitted-point variables). Also, for cyclists, please note that VoD annotates the bicycle and the person riding it with separate labels and bounding boxes (if my memory is correct, the label should be called 'rider'). Please ensure you merge them into a single label when evaluating.
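
In case it helps, merging could look roughly like the sketch below (a rough sketch from memory, not our evaluation code; it assumes the ground-truth objects are kept as a dict mapping track ID to (class_name, Nx3 points), that the class names are 'rider' and 'bicycle', and that each rider is paired with the nearest bicycle by centroid distance; adapt these assumptions to your data structures):

import numpy as np

def merge_rider_with_bicycle(gt_objects, max_dist=1.0):
    riders = {k: v for k, v in gt_objects.items() if v[0] == 'rider'}
    bikes = {k: v for k, v in gt_objects.items() if v[0] == 'bicycle'}
    merged = dict(gt_objects)
    for rid, (_, rpts) in riders.items():
        if not bikes:
            break
        rc = rpts.mean(axis=0)
        # pair the rider with the closest remaining bicycle by centroid distance
        bid = min(bikes, key=lambda k: np.linalg.norm(bikes[k][1].mean(axis=0) - rc))
        if np.linalg.norm(bikes[bid][1].mean(axis=0) - rc) < max_dist:
            # union the point sets under the bicycle's track ID and drop the rider entry
            merged[bid] = ('bicycle', np.vstack([merged[bid][1], rpts]))
            merged.pop(rid, None)
            bikes.pop(bid)
    return merged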

LiawWun commented 2 weeks ago

Thank you for your prompt response. I have referred to the suggested code and made the modifications below based on your suggestions. I successfully ran the evaluation code, but the MOTA value is still negative.

  1. I removed objects with fewer than 5 radar points (both in the ground truth and model predictions).
  2. I set the confidence threshold to 0.078483.
  3. I merged the labels for bicycles and the person riding them together (following the suggested method).

Additionally, I made a video (my video) to show the ground truth and the model's detection results. In the video, the ground truth is displayed as bounding boxes (for visualization purposes only; in the evaluation code, the calculations are still based on radar points), and the model predictions are shown in different colors.

I compared this video with the RaTrack video you provided and noticed that my video contains many more false positives than yours, for example between the 0:14 and 0:31 marks, whereas in your video there aren't as many false positives between the 0:24 and 0:30 marks.

Would it be possible to share the detection results used in your video? This could help me pinpoint where the discrepancies are arising.

Thank you for your continued assistance.

my email: jiawunliaw@gmail.com