LJacksonPan / RaTrack

[ICRA2024] RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud
https://github.com/LJacksonPan/RaTrack
MIT License

About trained weight and evaluation result #8

Open LiawWun opened 1 month ago

LiawWun commented 1 month ago

Hello, thanks for sharing this amazing work. I have some questions about the trained weights and the evaluation results.

  1. Can I use the trained weights from checkpoint/track4d_radar/models/model.last.t7 to directly infer and evaluate on the VOD dataset? When I used this checkpoint for direct inference on the VOD dataset, I noticed that the motion segmentation and clustering results (after removing clusters with fewer than 5 points, roughly as sketched after this list) were quite noisy in some frames.

Frame around 410 seq00410

Frame around 4729 seq04729

  2. Would it be possible to share the code that visualizes the results in the same way as in the demo video (including the tracking paths and filtering out moving objects with fewer than 5 points)?
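For reference, the small-cluster removal mentioned in point 1 is roughly the following (a minimal sketch with assumed variable names, not the exact code I run):

```python
import numpy as np

# Hypothetical example: `clusters` stands in for the per-object point sets
# produced by the model, e.g. a dict mapping a cluster/track id to an
# (N_i, 3) array of radar points.
clusters = {0: np.zeros((12, 3)), 1: np.zeros((3, 3)), 2: np.zeros((7, 3))}

# keep only clusters that contain at least 5 points
min_points = 5
filtered = {cid: pts for cid, pts in clusters.items() if pts.shape[0] >= min_points}

print(sorted(filtered.keys()))  # -> [0, 2]
```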
MrTooOldDriver commented 1 month ago

Thank you for your question.

Yes, you can directly infer and evaluate on the VOD dataset as long as you follow the same data loading and preprocessing as our approach. The noisy clustering is largely due to poor motion segmentation. If you train with our method, you will find that training is difficult and the gradients are not very stable; we believe these stability issues come from the GRU. You could try training a checkpoint of your own with gradient clipping, which helps the model converge.
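For reference, gradient clipping in a training loop looks roughly like this (a generic PyTorch sketch with toy stand-ins for the model, data, and clip value, not our actual training script):

```python
import torch
import torch.nn as nn

# Toy stand-ins for illustration only; in practice this would be the RaTrack
# model, loss, and data loader.
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
params = list(gru.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

max_grad_norm = 5.0  # placeholder value; tune for your setup

for step in range(100):
    x = torch.randn(4, 20, 8)        # (batch, sequence, features)
    target = torch.randn(4, 20, 1)
    out, _ = gru(x)
    loss = nn.functional.mse_loss(head(out), target)

    optimizer.zero_grad()
    loss.backward()
    # clip the global gradient norm before the optimizer step to keep updates stable
    torch.nn.utils.clip_grad_norm_(params, max_grad_norm)
    optimizer.step()
```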

Regarding the visualization code, I am currently out of the office, so I don't have direct access to the project files. However, I found some pieces that might help you visualize the results; I am not sure whether they will work directly.

import os

import numpy as np
import matplotlib.pyplot as plt

from vod.configuration import KittiLocations
from vod.frame import FrameDataLoader, FrameTransformMatrix
from vod.visualization import Visualization2D

# Paths to the View-of-Delft dataset and the output directory.
kitti_locations = KittiLocations(root_dir="view_of_delft_PUBLIC",
                                 output_dir="view_of_delft_PUBLIC_output",
                                 frame_set_path="",
                                 pred_dir="",
                                 )

class tData:
    """
        Utility class to load data.
    """
    def __init__(self, frame=-1, obj_type="unset", truncation=-1, occlusion=-1,
                 obs_angle=-10, x1=-1, y1=-1, x2=-1, y2=-1, w=-1, h=-1, l=-1,
                 x=-1000, y=-1000, z=-1000, ry=-10, score=-1000, track_id=-1, points=None):
        """
            Constructor, initializes the object given the parameters.
        """

        # init object data
        self.frame      = frame
        self.track_id   = track_id
        self.obj_type   = obj_type
        self.truncation = truncation
        self.occlusion  = occlusion
        self.obs_angle  = obs_angle
        self.x1         = x1
        self.y1         = y1
        self.x2         = x2
        self.y2         = y2
        self.w          = w
        self.h          = h
        self.l          = l
        self.x          = x
        self.y          = y
        self.z          = z
        self.ry         = ry
        self.score      = score
        self.ignored    = False
        self.valid      = False
        self.tracker    = -1
        self.points     = points if points is not None else []  # avoid sharing a mutable default list

    def __str__(self):
        """
            Print read data.
        """

        attrs = vars(self)
        return '\n'.join("%s: %s" % item for item in attrs.items())

def load_4dmot_label(path, current_frame_number, first_frame_number):
    """Parse one frame of tracking output. Each line holds: object type, truncation,
    occlusion, observation angle, score, track id, followed by the object's radar
    points as flattened (x, y, z) triplets."""
    with open(path, 'r') as f:
        lines = f.readlines()
    object_list = []
    for line in lines:
        line = line.strip()
        fields = line.split(" ")
        t_data  = tData()
        t_data.frame = int(current_frame_number.strip()) - first_frame_number # frame
        t_data.obj_type = fields[0].lower()  # object type [car, pedestrian, cyclist, ...]
        t_data.truncation = int(float(fields[1]))  # truncation [-1,0,1,2]
        t_data.occlusion = int(float(fields[2]))  # occlusion  [-1,0,1,2]
        t_data.obs_angle = float(fields[3])  # observation angle [rad]
        t_data.score = float(fields[4])
        # if t_data.score < 0.1:
        #     continue
        t_data.track_id = int(float(fields[5]))  # id
        # cur_id.append(t_data.track_id)

        # remaining fields are the object's points, flattened as (x, y, z) triplets
        pts = fields[6:]
        t_data.points = []
        for j in range(len(pts) // 3):
            t_data.points.append([float(pts[j * 3 + 0]),
                                  float(pts[j * 3 + 1]),
                                  float(pts[j * 3 + 2])])
        t_data.points = np.array(t_data.points)
        t_data.points_center = np.mean(t_data.points, axis=0)
        object_list.append(t_data)
    return object_list

def ego_comp(point, ego_motion):
    """Transform a 3D point with a 4x4 homogeneous ego-motion matrix."""
    point = np.append(point, 1)
    point = ego_motion.dot(point)
    return point[:3]

def plot_point_cloud(pc, fig=None, ax=None):
    """Bird's-eye-view scatter of a point cloud (used here for the radar scan)."""
    # crop to the area in front of the ego vehicle: 0 < x < 40 m, |y| < 20 m, z < 20 m
    pc = pc[pc[:, 0] > 0]
    pc = pc[pc[:, 0] < 40]
    pc = pc[pc[:, 1] < 20]
    pc = pc[pc[:, 1] > -20]
    pc = pc[pc[:, 2] < 20]

    # sc = ax.scatter(-pc[:, 1], pc[:, 0], c=pc[:, 2], cmap='viridis', s=25)
    sc = ax.scatter(-pc[:, 1], pc[:, 0], s=25)
    # note: the plot's horizontal axis is -y (lateral) and the vertical axis is x (forward)
    ax.set_xlabel('lateral -Y [m]')
    ax.set_ylabel('forward X [m]')
    ax.set_title("Point Cloud Data - Bird's Eye View")
    ax.set_xlim([-20, 20])
    ax.set_ylim([0, 40])
    ax.set_autoscalex_on(False)
    ax.set_autoscaley_on(False)
    ax.grid(True)
    ax.axis('equal')
    return fig, ax
# validation clips to visualize; only one clip is enabled here for a quick test
# val_seq = ['delft_1', 'delft_10', 'delft_14', 'delft_22']
# val_seq = ['delft_14', 'delft_22']
val_seq = ['delft_1']
for val in val_seq:
    output_path = 'temp_cluster_output/%s' % val
    # create the output directory if it does not exist yet
    os.makedirs(output_path, exist_ok=True)

    # list of frame numbers belonging to this validation clip
    frame_list_path = './clips/' + val + '.txt'
    with open(frame_list_path, 'r') as f:
        frame_list = f.readlines()
    start_frame = int(frame_list[0].strip())
    obj_id_trajectory = dict()  # track_id -> list of past object centers (ego-motion compensated)
    start_frame = 4  # note: overrides the value above and is used below as an index into frame_list
    last_frame_data = FrameDataLoader(kitti_locations=kitti_locations, frame_number=frame_list[start_frame - 1].strip())
    for frame in frame_list[start_frame:]:
        # compute the ego motion between the previous and current frame from odometry
        # and use it to bring past object centers into the current frame's coordinates
        current_frame_data = FrameDataLoader(kitti_locations=kitti_locations, frame_number=frame.strip())
        transforms = FrameTransformMatrix(current_frame_data)
        prev_frame_transforms = FrameTransformMatrix(last_frame_data)
        t_lidar_to_odom_current_frame = np.dot(transforms.t_odom_camera, transforms.t_camera_lidar)
        t_lidar_to_odom_prev_frame = np.dot(prev_frame_transforms.t_odom_camera, prev_frame_transforms.t_camera_lidar)
        ego_motion = np.dot(np.linalg.inv(t_lidar_to_odom_current_frame), t_lidar_to_odom_prev_frame)
        for k, v in obj_id_trajectory.items():
            obj_id_trajectory[k] = [ego_comp(trajectory_point, ego_motion) for trajectory_point in v]

        # read current frame tracking results
        model_output_path = './4dmot_runthis/' + val + '/' + frame.strip() + '.txt'
        try:
            object_list = load_4dmot_label(model_output_path, frame.strip(), start_frame)
        except FileNotFoundError:
            print('FileNotFoundError: %s' % model_output_path)
            continue
        # update trajectories with this frame's object centers
        current_frame_object_id = []
        for obj in object_list:
            if obj.track_id not in obj_id_trajectory:
                obj_id_trajectory[obj.track_id] = [obj.points_center]
            else:
                obj_id_trajectory[obj.track_id].append(obj.points_center)
            current_frame_object_id.append(obj.track_id)

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 8))
        vis2d = Visualization2D(current_frame_data, classes_visualized = ['Cyclist', 'Pedestrian', 'Car', 'ride_other', 'truck'])
        vis2d.draw_plot(ax=ax1, show_gt=True, show_radar=True, max_distance_threshold=40.0, fig_title='RGB and GT detection', moving_obj_only=True)

        # Set the background color to white
        fig.patch.set_facecolor('white')
        ax2.set_facecolor('white')

        # display the radar point cloud in bird's-eye view
        radar_pc = current_frame_data.get_radar_scan()
        fig, ax = plot_point_cloud(radar_pc, fig, ax2)

        # optionally plot each object's trajectory (uncomment to enable)
        # for k, v in obj_id_trajectory.items():
        #     if k in current_frame_object_id:
        #         v = np.array(v)
        #         if len(v) > 2:  # Ensure we have more than 2 points to sample from
        #             step_size = max(len(v) // 5, 1)  # Calculate step size for uniform sampling
        #             sample_indices = np.arange(0, len(v), step_size)  # Uniformly sample indices
        #             
        #             # Ensure the last point is included
        #             if len(v) - 1 not in sample_indices:
        #                 sample_indices = np.append(sample_indices, len(v) - 1)
        #             
        #             sampled_v = v[sample_indices]
        #             
        #             ax2.plot(-sampled_v[:, 1], sampled_v[:, 0], color='green', linewidth=2, label='object ' + str(k))
        #             ax2.scatter(-sampled_v[:, 1], sampled_v[:, 0], color='green', s=25.0)  # Mark sampled points in blue

        # plot each tracked object's radar points, color-coded per object, with its track id
        colors = ['red', 'green', 'blue', 'yellow', 'purple', 'orange']
        for i, obj in enumerate(object_list):
            obj_points_radar = np.array(obj.points)
            color = colors[i % len(colors)]  # cycle through the color list
            ax2.scatter(-obj_points_radar[:, 1], obj_points_radar[:, 0], c=color, s=30)
            ax2.text(-obj.points_center[1], obj.points_center[0], str(obj.track_id), alpha=0.7, size=30)

        plt.tight_layout()
        print('Saving %s.png' % frame.strip())
        plt.savefig(output_path + '/%s.png' % frame.strip())
        plt.close()
        last_frame_data = current_frame_data
geun0196 commented 1 week ago

> Hello, thanks for sharing this amazing work. I have some questions about the trained weights and the evaluation results.
>
> 1. Can I use the trained weights from checkpoint/track4d_radar/models/model.last.t7 to directly infer and evaluate on the VOD dataset? When I used this checkpoint for direct inference on the VOD dataset, I noticed that the motion segmentation and clustering results (after removing clusters with fewer than 5 points) were quite noisy in some frames.
>
> Frame around 410 seq00410
>
> Frame around 4729 seq04729
>
> 2. Would it be possible to share the code that visualizes the results in the same way as in the demo video (including the tracking paths and filtering out moving objects with fewer than 5 points)?

Hello, I have finished setting up the environment for the project. I have two questions below and would appreciate your answers.

  1. How did you visualize the images used in your question?
  2. How can I verify whether the demo-video visualization code provided by @MrTooOldDriver works correctly?
LiawWun commented 5 days ago

@geun0196

  1. I visualized the images using the matplotlib library in Python (a minimal sketch is included after this list).

  2. The provided code has some issues with a few of the functions it calls, but it still lays out a straightforward pipeline for working with the resulting data.
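At its core it is just a bird's-eye-view scatter plot, roughly like the following (a minimal sketch with random stand-in data; `points` is a placeholder for an (N, 3) array of radar points in the sensor frame):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: replace `points` with the radar points of the frame you want to plot.
points = np.random.uniform([0.0, -20.0, -2.0], [40.0, 20.0, 2.0], size=(200, 3))

fig, ax = plt.subplots(figsize=(8, 8))
# Bird's-eye view: plot -y (lateral) against x (forward), as in the code shared above.
ax.scatter(-points[:, 1], points[:, 0], s=10)
ax.set_xlim([-20, 20])
ax.set_ylim([0, 40])
ax.set_xlabel('lateral -Y [m]')
ax.set_ylabel('forward X [m]')
ax.grid(True)
plt.savefig('bev_example.png')
```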

geun0196 commented 4 days ago

@LiawWun

Thank you, friend! Would you mind sharing your visualization code?