3D (Oriented) Bounding Boxes and projection on 2D rendered images

march038 commented 1 month ago

Describe your feature request

Hi everyone,

We are currently using and testing BlenderProc at Fraunhofer IOSB to set up synthetic data creation pipelines in Blender for training visual language models, and we would like to collaborate with you, the dev team, on implementing new functions. As a team of two who are new to both Blender and BlenderProc, we would very much welcome your collaboration, as we think BlenderProc is fantastic to work with and to develop further.

For our case we need segmentation masks, annotations and bounding boxes which are mostly already implemented. As we aim for 3D Object Detection and 6D Pose Estimation we miss a 3D bounding box feature. I saw that Issue #444 already asked for it and @andrewyguo offered to work on it but it seems that the feature was abandoned.

What we need is a function that transforms the three-dimensional coordinates of the vertices of an object's bounding box into the two-dimensional coordinates they have in the rendered image, depending on the camera perspective.
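For reference, the transformation described above is the standard pinhole projection. The following is a minimal sketch in plain Python, assuming a 4x4 world-to-camera matrix in the OpenCV convention (+x right, +y down, +z pointing into the scene) and intrinsics fx, fy, cx, cy given in pixels. The function and parameter names are illustrative and not part of BlenderProc. Blender cameras look along their local -Z axis, so a Blender camera matrix would need an axis flip before it could be used this way.

```python
def project_point(point_world, world_to_cam, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    world_to_cam is a 4x4 row-major matrix (list of lists), assumed to use
    the OpenCV camera convention: +x right, +y down, +z forward.
    Returns (u, v), or None if the point lies behind the camera.
    """
    x, y, z = point_world
    homogeneous = (x, y, z, 1.0)
    # Transform the point from world coordinates into the camera frame.
    xc, yc, zc = (
        sum(row[i] * homogeneous[i] for i in range(4))
        for row in world_to_cam[:3]
    )
    if zc <= 0:
        return None
    # Perspective divide, then scale and shift by the intrinsics.
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return (u, v)
```

Applying this to each of the eight bounding box corners would give the eight pixel positions asked for above.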

We are thinking about defining our own JSON format, which would store the X and Y coordinates of the bbox vertices in the 2D image.

The format could look like:

```json
{
  "image_id": "image_001",
  "objects": [
    {
      "class": "car",
      "bounding_box_3d": {
        "vertices": [
          {"x": 100, "y": 200},
          {"x": 150, "y": 200},
          {"x": 150, "y": 250},
          {"x": 100, "y": 250},
          {"x": 105, "y": 205},
          {"x": 155, "y": 205},
          {"x": 155, "y": 255},
          {"x": 105, "y": 255}
        ]
      }
    }
  ]
}
```
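A record in this format could be assembled roughly as follows. This is only a sketch: the helper name build_annotation and its input shape (a class name plus eight (x, y) pixel pairs per object) are assumptions made for illustration.

```python
import json


def build_annotation(image_id, objects):
    """Build one annotation record in the proposed format.

    objects is an iterable of (class_name, corners) pairs, where corners
    is a list of eight (x, y) pixel positions.
    """
    return {
        "image_id": image_id,
        "objects": [
            {
                "class": class_name,
                "bounding_box_3d": {
                    "vertices": [
                        {"x": float(x), "y": float(y)} for x, y in corners
                    ]
                },
            }
            for class_name, corners in objects
        ],
    }


corners = [(100, 200), (150, 200), (150, 250), (100, 250),
           (105, 205), (155, 205), (155, 255), (105, 255)]
annotation = build_annotation("image_001", [("car", corners)])
print(json.dumps(annotation, indent=2))
```

Converting the coordinates to float keeps the record serializable even if the projected points arrive as NumPy scalars.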

We would love to see this being implemented as we would rather leverage the awesome functions and architecture BlenderProc already has instead of trying to build something separately.

Describe a possible solution

Attachment: code_base.zip

We already experimented with a GPT-generated script that projects an object's bounding box in Blender onto the rendered image without BlenderProc, but the transformation from the bounding box's three-dimensional world coordinates to pixel coordinates in the rendered image seems to be faulty, as the box does not end up in the right spot. We also used a fixed camera instead of camera_positions, since we didn't use BlenderProc for this.

The process consists of two scripts. First, the transformation script converts the bbox world coordinates into pixel coordinates, depending on the camera perspective, and writes the output to a JSON file. Then a visualization script draws these pixel coordinates onto the rendered image.

In the end, the actual feature could either render a second image with the 3D bounding box drawn onto the image/object, or be a separate module for this purpose. I also think it can't hurt to keep a feature that exports the pixel coordinates of the bounding box in JSON format.

I attach our code as well as an image showing the rendered image and where the bounding box was projected (image: with_bbox).

cornerfarmer commented 1 month ago

Hey @march038,

this should be pretty easy to do nowadays in blenderproc:

points = bproc.camera.project_points(obj.get_bound_box(), frame=0)

For a given object obj this will compute the 2D pixel positions for all bounding box corners in frame 0. Afterwards you can write this to file in any format you like. Make sure to register the camera positions beforehand via the usual bproc.camera.add_camera_pose. To actually draw the bounding boxes, you can do that in a second script outside of blenderproc, as in your code.zip. Or you can do this directly on top of the rendered images.

e.g. this will draw red points into the rgb image for each bbox corner:

data = bproc.renderer.render()
for p in points:
    data["colors"][0][int(p[1])][int(p[0])] = [255, 0, 0]
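One caveat with indexing the image directly: a corner that projects outside the image would either raise an IndexError or, for negative coordinates, wrap around to the opposite side. A guarded variant could look like the sketch below. The helper draw_points is hypothetical, not a BlenderProc function. It only uses len() and index assignment, so it should work on nested lists as well as on a NumPy image such as data["colors"][0].

```python
import math


def draw_points(image, points, color=(255, 0, 0)):
    """Mark each (x, y) point in an H x W x 3 image, skipping points outside it.

    Returns the number of points that were actually drawn.
    """
    height = len(image)
    width = len(image[0])
    drawn = 0
    for x, y in points:
        # Floor instead of int() so that e.g. -0.5 is not rounded into the image.
        col, row = math.floor(x), math.floor(y)
        if 0 <= row < height and 0 <= col < width:
            image[row][col] = list(color)
            drawn += 1
    return drawn
```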

Let me know if you have any further questions

march038 commented 1 month ago

Hi @cornerfarmer and thank you for the fast reply!

I'm not too sure where to add this code in BlenderProc, as I don't yet fully understand its workflow. We would like to extend BlenderProc with this functionality instead of doing the visualization as a workaround, so we would like to draw the bounding box on top of the rendered image(s) as you proposed. I also think that, for our case, it would be best if we could get and draw the bbox vertices for all mesh objects in the scene, not just for a specific one.

As we don't want to use bboxes only for a specific module, I think it would make sense to adapt the renderer that is called in every module. I tried to work out which part of BlenderProc is responsible for rendering the "normal" image and could be modified for this use case, and thought that it might be the render function in python>renderer>RendererUtility.py

If you could explain where and how to modify BlenderProc for this, we would very much appreciate your help!

cornerfarmer commented 1 month ago

You don't need to modify the bproc code. The code I wrote simply goes into your main.py script, e.g. see https://github.com/DLR-RM/BlenderProc/blob/main/examples/basics/basic/main.py. If you want to apply this to multiple objects, you can just iterate over your objects and do the projection for each of them.

march038 commented 1 month ago

Thanks a lot @cornerfarmer, we really appreciate your help and support!

After thoroughly reviewing the function get_bound_box, we finally understood it. Apologies for not catching it earlier.

We extended the semantic_segmentation main.py to not only create the .hdf5 container but also write all the bbox points as lists into a text file. We realized that get_bound_box returns the points in a fixed order, which we used to implement our own bbox functionality.


# load the objects into the scene
objs = bproc.loader.load_blend(args.scene)
# print(objs)
# print(len(objs))

# define a light and set its location and energy level
light = bproc.types.Light()
light.set_type("AREA")
light.set_location([0,0,10])
light.set_energy(1000)

# define the camera intrinsics
bproc.camera.set_resolution(1920,1080)

# read the camera positions file and convert into homogeneous camera-world transformation
with open(args.camera, "r") as f:
    for line in f.readlines():
        line = [float(x) for x in line.split()]
        position, euler_rotation = line[:3], line[3:6]
        matrix_world = bproc.math.build_transformation_mat(position, euler_rotation)
        bproc.camera.add_camera_pose(matrix_world)

# activate depth rendering
bproc.renderer.enable_depth_output(activate_antialiasing=False)

# enable segmentation masks (per class and per instance)
bproc.renderer.enable_segmentation_output(map_by=["category_id", "instance", "name"])

# Initialize an empty list for the projected bounding box corners
all_points = []

# Iterate over all objects and keep those with enough corners inside the camera frustum
for i, obj in enumerate(objs):
    try:
        if isinstance(obj, bproc.types.MeshObject):
            # Get the 3D bounding box corners and project them into the image of frame 0
            bbox_corners_3d = obj.get_bound_box()
            bbox_corners_2d = bproc.camera.project_points(bbox_corners_3d, frame=0)

            # Collect the corners that lie inside the camera frustum.
            # Note: this does not test for occlusion by other objects.
            visible_corners = []
            for corner in bbox_corners_3d:
                if bproc.camera.is_point_inside_camera_frustum(corner, frame=0):
                    visible_corners.append(corner)

            # Keep the object if at least 5 of its 8 corners are inside the frustum
            if len(visible_corners) > 4:
                all_points.append(bbox_corners_2d)
    except Exception as exc:
        # Report the failure instead of silently skipping the object
        print(f"Skipping object {i}: {exc}")

# Write all visible points to a text file
output_points_file = "C:\\Beruflich\\Fraunhofer\\Forschung\\BlenderProc\\examples/basics\\semantic_segmentation\\output\\projected_points_filtered.txt"
with open(output_points_file, "w") as f:
    for obj_idx, points in enumerate(all_points):
        f.write(f"Object {obj_idx}:\n")
        for p in points:
            f.write(f"{p[0]}, {p[1]}\n")
        f.write("\n")  # blank line between objects

# Render the scene
data = bproc.renderer.render()

# Save the rendered data to an HDF5 container
bproc.writer.write_hdf5(args.output_dir, data)

After generating the text file and the .hdf5 container in the first step, we wrote a second script that saves the 'colors' (normal RGB) image from the .hdf5 container as a PNG, and a third script that draws the points onto the image, colored according to their position in the bbox point order, and connects them with white lines. To decide which points to connect, we looked at how the points of each bbox are ordered and hard-coded those connections in the script.
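The hard-coded connections can also be derived rather than written out by hand. The sketch below assumes the corner order that Blender's bound_box is commonly documented to use, shown here for a unit cube; please verify it against your own output. Two corners are joined by an edge when they differ along exactly one axis. ASSUMED_CORNER_ORDER, BBOX_EDGES and bbox_edge_segments are illustrative names.

```python
# Assumed corner order of Blender's bound_box, written out for a unit cube.
ASSUMED_CORNER_ORDER = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
    (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0),
]

# Two corners share a box edge when they differ along exactly one axis.
BBOX_EDGES = [
    (i, j)
    for i in range(8)
    for j in range(i + 1, 8)
    if sum(a != b for a, b in
           zip(ASSUMED_CORNER_ORDER[i], ASSUMED_CORNER_ORDER[j])) == 1
]


def bbox_edge_segments(points_2d):
    """Return the 12 segments of a projected box as ((x0, y0), (x1, y1)) pairs."""
    if len(points_2d) != 8:
        raise ValueError("expected eight projected corners")
    return [(tuple(points_2d[i]), tuple(points_2d[j])) for i, j in BBOX_EDGES]
```

Because the index order of the corners is the same before and after projection, the same index pairs apply to the 2D points written to the text file.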

This works really well, as you can see in the picture below, where a tank's parts are segmented into multiple meshes.

NEW_image_frontside_with_points_and_corners

Problem to solve

Unfortunately, as you can see, the current code exports the bounding boxes of all objects in the scene, even if they are hidden behind other objects from the camera's perspective. For our work, we need a way to filter out objects that are not visible from the rendered camera's perspective. For now, let's just assume that if at least 5 of the 8 vertices are visible to the camera, the object counts as visible.
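One way to approximate this 5-of-8 rule is to compare each corner against a rendered depth map. The sketch below assumes that each corner is given as (u, v, depth), where depth is the corner's z-depth in the camera frame obtained separately, and that the depth map uses the same units. All names are hypothetical. Keep in mind that bounding box corners usually float in free space and that the far corners are often hidden by the object itself, so this is a rough heuristic at best.

```python
import math


def corner_is_visible(u, v, corner_depth, depth_map, tolerance=0.05):
    """True if the corner is inside the image and nothing rendered lies in front of it."""
    height, width = len(depth_map), len(depth_map[0])
    col, row = math.floor(u), math.floor(v)
    if corner_depth <= 0 or not (0 <= row < height and 0 <= col < width):
        return False
    # The corner is hidden if the rendered surface at that pixel is closer.
    return corner_depth <= depth_map[row][col] + tolerance


def object_counts_as_visible(corners, depth_map, min_visible=5, tolerance=0.05):
    """Apply the threshold rule: at least min_visible of the corners are visible."""
    visible = sum(
        corner_is_visible(u, v, d, depth_map, tolerance) for u, v, d in corners
    )
    return visible >= min_visible
```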

Looking at our code above, the function is_point_inside_camera_frustum seems to only check whether a point lies inside the camera's field of view, not whether it is actually visible, so that is not the way to go. We are looking for a way to filter out objects that do not meet the visibility criteria. I think we could do this via a threshold on how many bbox vertices must be visible, as we tried in our code, but using visible surface points of each mesh instead of the bbox vertices could also be an idea.
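Since the script already enables instance segmentation, one surface-based option is to count how many pixels each instance covers in the rendered instance map and to drop objects below a threshold. This is a sketch under assumptions: it expects a 2D map of per-pixel instance ids (presumably what the render result provides under a key such as "instance_segmaps", which should be checked against your BlenderProc version), and the helper name visible_instances is made up.

```python
from collections import Counter


def visible_instances(instance_segmap, min_pixels=1):
    """Return {instance_id: pixel_count} for instances covering at least min_pixels pixels."""
    counts = Counter(pixel for row in instance_segmap for pixel in row)
    return {
        instance_id: count
        for instance_id, count in counts.items()
        if count >= min_pixels
    }
```

An object that is completely hidden behind others never appears in the map, so it is filtered out without any per-corner test. The threshold controls how small a visible fragment still counts.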

Are there any functionalities in Blenderproc that we could use for this?

P.S.: We would be happy to share our code for drawing the bbox points and corners onto the image if you think that a 3D bounding box module might be something to implement in the future.

DLR-RM / BlenderProc

3D (Oriented) Bounding Boxes and projection on 2D rendered images #1150
