aimotive / mm_training

Multimodal model training on aiMotive Dataset
https://openreview.net/forum?id=LW3bRLlY-SA
MIT License

Visualize the Predictions #7

Open PaulSudarshan opened 3 months ago

PaulSudarshan commented 3 months ago

Please provide a visualization script for overlaying the predicted 3D bounding boxes on the image. I have attached a sample image and its corresponding bounding box. Thanks.

Attachments: F_MIDLONGRANGECAM_CL_0007242, frame_0007242.json

TamasMatuszka commented 3 months ago

Hi @PaulSudarshan,

You can visualize your predictions using the aiMotive Dataset Loader repository.

When you obtain the predicted bounding boxes after inference, you can visualize them using the example renderer:

PYTHONPATH=$PYTHONPATH: python examples/example_render.py --root-dir PATH_TO_AIMOTIVE_DATA --split val

You need to update the __getitem__() method of /src/data_loader.py:

def __getitem__(self, path: str) -> DataItem:
    """
    Returns sensor data for a given keyframe.

    Args:
        path: path of the keyframe's annotation file

    Returns:
        a DataItem with annotations and sensor data
    """
    data_folder = self.get_directory(path)
    frame_id = self.get_frame_id(path)

    ### THIS LINE IS ADDED TO THE ORIGINAL CODE FOR VISUALIZING PREDICTIONS.
    path = "YOUR_BASE_PATH/EXPERIMENT_NAME/outputs/val" + path.split('val')[1]
    ####

    annotations = Annotation(path)
    lidar_data = load_lidar_data(data_folder, frame_id)
    radar_data = load_radar_data(data_folder, frame_id)
    camera_data = load_camera_data(data_folder, frame_id)

    return DataItem(annotations, lidar_data, radar_data, camera_data)

If you want to filter predictions by confidence, add these lines to the beginning of the is_in_fov(...) method of /src/renderer.py:

if obj['Score'] < 0.2:
    return False
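
If you would rather filter before rendering, an equivalent standalone filter over an already loaded list of predicted objects might look like this minimal sketch (the 'Score' key and the 0.2 threshold follow the snippet above):

def filter_by_score(objects, threshold=0.2):
    """Keep only predictions whose confidence is at or above the threshold."""
    return [obj for obj in objects if obj.get('Score', 1.0) >= threshold]
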
PaulSudarshan commented 3 months ago

Hi @TamasMatuszka, I want to know how to visualize the 3D bounding boxes from the inference JSON files. Specifically, I am looking for a snippet that parses the inference JSON and overlays the 3D bounding boxes on the image.

TamasMatuszka commented 3 months ago

Hi @PaulSudarshan,

The aiMotive Dataset Loader repository can be used, with the above-mentioned modification, to visualize the 3D bounding boxes from the inference .json files. The .json files are saved with the same directory structure as the ground-truth annotations, so only the JSON path needs to be updated, as shown in my previous comment.
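
If it helps, below is a minimal, self-contained sketch of parsing one prediction JSON and overlaying the boxes on a single pinhole camera image. The key names ('CapturedObjects', 'BoundingBox3D ...', 'Score') and the calibration inputs (camera matrix K, 4x4 body-to-camera transform T_body_to_cam) are assumptions that should be checked against the actual files; renderer.py in the loader repository remains the authoritative reference:

import json
import numpy as np
import cv2
from scipy.spatial.transform import Rotation

def box_corners_body(obj):
    """Return the 8 corners of a 3D box in the body frame (key names assumed)."""
    center = np.array([obj['BoundingBox3D Origin X'],
                       obj['BoundingBox3D Origin Y'],
                       obj['BoundingBox3D Origin Z']])
    extent = np.array([obj['BoundingBox3D Extent X'],
                       obj['BoundingBox3D Extent Y'],
                       obj['BoundingBox3D Extent Z']])
    rot = Rotation.from_quat([obj['BoundingBox3D Orientation Quat X'],
                              obj['BoundingBox3D Orientation Quat Y'],
                              obj['BoundingBox3D Orientation Quat Z'],
                              obj['BoundingBox3D Orientation Quat W']])
    # Unit-cube corner signs scaled by half extents, then rotated and translated.
    signs = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)])
    return rot.apply(signs * extent / 2.0) + center

def draw_boxes(image, json_path, K, T_body_to_cam, min_score=0.2):
    """Overlay predicted 3D boxes from json_path on a pinhole camera image."""
    with open(json_path) as f:
        objects = json.load(f)['CapturedObjects']  # top-level key is an assumption
    for obj in objects:
        if obj.get('Score', 1.0) < min_score:
            continue
        corners = box_corners_body(obj)                              # (8, 3) in body frame
        corners_cam = (T_body_to_cam[:3, :3] @ corners.T).T + T_body_to_cam[:3, 3]
        if np.any(corners_cam[:, 2] <= 0):                           # box (partly) behind camera
            continue
        uv = (K @ corners_cam.T).T
        uv = (uv[:, :2] / uv[:, 2:3]).astype(int)                    # perspective division
        edges = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 7),
                 (6, 7), (0, 4), (1, 5), (2, 6), (3, 7)]
        for i, j in edges:
            cv2.line(image, tuple(map(int, uv[i])), tuple(map(int, uv[j])), (0, 255, 0), 2)
    return image

For a proper per-camera filter, the is_in_fov(...) check in renderer.py of the loader repository is the reference; the behind-camera test above is only a crude stand-in.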

PaulSudarshan commented 3 months ago

@TamasMatuszka Which of the following cameras do the 3D annotations inside /aimotive_dataset/val/highway/20210401-074452-00.01.00-00.01.15@Jarvis/dynamic/box/3d_body belong to?

B_MIDRANGECAM_C, F_MIDLONGRANGECAM_CL, F_MIDLONGRANGECAM_CR, M_FISHEYE_L, M_FISHEYE_R

TamasMatuszka commented 3 months ago

@PaulSudarshan the annotations are defined in the body coordinate system. For more details, please refer to Section 3.2 of the paper.

For checking whether a certain annotation is visible on a certain camera, please refer to renderer.py of the aiMotive Dataset Loader repository.

PaulSudarshan commented 3 months ago

@TamasMatuszka It is not clear to me how the annotations of a single frame (.json) map to multiple cameras. As far as I can see, each frame has a single annotation JSON file; how is that single file mapped to the four different cameras?

TamasMatuszka commented 3 months ago

@PaulSudarshan The annotations are in BEV space, meaning that all objects around the car for a given frame are contained in a single JSON file. The neural network operates and detects in BEV space, therefore this is a natural design choice. The visibility of a given object on a camera can be calculated with the code I sent previously.
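
To make the mapping concrete, a single body-frame object can be associated with cameras by projecting it through each camera's calibration. The sketch below assumes a hypothetical dict calibrations = {camera_name: (K, T_body_to_cam, (width, height))}; for the fisheye cameras a pinhole projection is only a rough approximation:

import numpy as np

def visible_on(center_body, calibrations):
    """Return the names of the cameras on which a body-frame point is visible."""
    visible = []
    for name, (K, T_body_to_cam, (w, h)) in calibrations.items():
        p_cam = T_body_to_cam[:3, :3] @ center_body + T_body_to_cam[:3, 3]
        if p_cam[2] <= 0:                      # behind this camera
            continue
        u, v, _ = K @ p_cam / p_cam[2]         # project and normalize
        if 0 <= u < w and 0 <= v < h:
            visible.append(name)
    return visible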

PaulSudarshan commented 3 months ago

Thanks for your explanation. I have another query regarding visualizing the predictions in BEV space: does the aiMotive Dataset Loader repository provide support for visualization in BEV?

TamasMatuszka commented 2 months ago

@PaulSudarshan Sure, renderer.py has a render_lidar() method which is used for generating plots similar to Figure 14 in the paper.
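
If you only need a quick standalone plot of the predicted boxes in BEV, a minimal matplotlib sketch of the box footprints in the body frame could look like this (the JSON key names are the same assumptions as above; render_lidar() remains the supported route):

import json
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from scipy.spatial.transform import Rotation

def plot_bev(json_path, min_score=0.2):
    """Draw the top-down footprint of each predicted box around the ego vehicle."""
    with open(json_path) as f:
        objects = json.load(f)['CapturedObjects']  # top-level key is an assumption
    fig, ax = plt.subplots(figsize=(6, 10))
    for obj in objects:
        if obj.get('Score', 1.0) < min_score:
            continue
        cx, cy = obj['BoundingBox3D Origin X'], obj['BoundingBox3D Origin Y']
        ex, ey = obj['BoundingBox3D Extent X'], obj['BoundingBox3D Extent Y']
        yaw = Rotation.from_quat([obj['BoundingBox3D Orientation Quat X'],
                                  obj['BoundingBox3D Orientation Quat Y'],
                                  obj['BoundingBox3D Orientation Quat Z'],
                                  obj['BoundingBox3D Orientation Quat W']]).as_euler('zyx')[0]
        # Footprint corners in the box frame, rotated by yaw and shifted to the center.
        corners = np.array([[ex, ey], [ex, -ey], [-ex, -ey], [-ex, ey]]) / 2.0
        rot = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
        corners = corners @ rot.T + np.array([cx, cy])
        ax.add_patch(Polygon(corners, closed=True, fill=False, edgecolor='g'))
    ax.plot(0, 0, 'r^')  # ego vehicle at the body-frame origin
    ax.set_xlabel('X [m]')
    ax.set_ylabel('Y [m]')
    ax.set_aspect('equal')
    ax.autoscale_view()
    plt.show()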

PaulSudarshan commented 2 months ago

Thanks @TamasMatuszka. If I want to visualize on top of a top-down camera image instead of lidar/radar, how am I supposed to get the top-down camera image that needs to be passed to the following function?

[screenshot of the referenced function]

TamasMatuszka commented 2 months ago

@PaulSudarshan If you want to visualize the images in BEV, you can use IPM (inverse perspective mapping). Since you have four cameras, the projections need to be stitched, which might not be trivial. Some references regarding IPM:
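
As a starting point before going through those references, a minimal single-camera IPM sketch under a flat-ground assumption (z = 0 in the body frame) could look like this. K is the camera matrix, R and t map body-frame points into the camera frame, and the ranges, resolution, and names are illustrative only:

import numpy as np
import cv2

def ipm_warp(image, K, R, t, x_range=(0.0, 60.0), y_range=(-20.0, 20.0), px_per_m=10):
    """Warp a camera image onto a top-down grid of the ground plane (z = 0)."""
    # Homography mapping ground-plane coordinates (X, Y, 1) to image pixels:
    # p ~ K @ (R[:, :2] @ [X, Y] + t) = H_ground2img @ [X, Y, 1]
    H_ground2img = K @ np.column_stack([R[:, 0], R[:, 1], t])

    # Affine map from BEV pixel (u, v) to ground coordinates (X, Y):
    # u grows with Y (left-right), v grows with decreasing X (far to near).
    bev_w = int((y_range[1] - y_range[0]) * px_per_m)
    bev_h = int((x_range[1] - x_range[0]) * px_per_m)
    A = np.array([
        [0.0, -1.0 / px_per_m, x_range[1]],   # X = x_max - v / px_per_m
        [1.0 / px_per_m, 0.0, y_range[0]],    # Y = y_min + u / px_per_m
        [0.0, 0.0, 1.0],
    ])

    # Full map: BEV pixel -> ground point -> source image pixel.
    M = H_ground2img @ A
    return cv2.warpPerspective(image, M, (bev_w, bev_h),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)

For the full surround view, each camera would be warped with its own homography and the results blended, which is the non-trivial stitching part mentioned above.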