bertjiazheng / Structured3D

[ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
https://structured3d-dataset.org
MIT License

Issue with bbox #44

Open madaan-nikhil opened 1 year ago

madaan-nikhil commented 1 year ago

I encountered an issue with the bbox coordinates provided in the dataset. Specifically, when plotting the bbox and the point cloud for scene_00000, room_id 485142, I noticed that they do not line up. The point cloud was generated using the method outlined in https://github.com/bertjiazheng/Structured3D/issues/9#issuecomment-610730836. To make the bbox coordinates easier to see, I reduced the opacity of the point cloud and marked the bbox with red points.

[Screenshot, 2023-07-30: bounding box corners (red points) misaligned with the room point cloud]
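For context, this is roughly how I recover the eight box corners from an annotation entry. I am assuming the SUN RGB-D-style fields (`basis`, `coeffs`, `centroid`) here; adjust if your copy of the annotations uses a different layout:

```python
import itertools
import numpy as np

def bbox_corners(basis, coeffs, centroid):
    """Return the 8 corners of an oriented box.

    basis: (3, 3) array whose rows are the box axis directions.
    coeffs: (3,) half-extents along each axis.
    centroid: (3,) box center in world coordinates.
    """
    basis, coeffs, centroid = map(np.asarray, (basis, coeffs, centroid))
    # All 8 sign combinations of the three half-extents.
    signs = np.array(list(itertools.product([-1, 1], repeat=3)))  # (8, 3)
    return centroid + (signs * coeffs) @ basis
```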

madaan-nikhil commented 1 year ago

@bertjiazheng can you please help with this?

bertjiazheng commented 1 year ago

Hi,

Sorry for the late reply.

The coordinate systems for the point cloud and the bounding box annotations differ. The point cloud is derived from the RGB-D image, with the camera center as the origin of its coordinate system. The bounding boxes, on the other hand, are labeled in the world coordinate system. To align the point cloud with the world coordinate system, add the position of the camera to each point.

Best, Jia
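The translation described above can be sketched as follows. `points_cam` and `camera_center` are placeholder names; both must be expressed in the same units:

```python
import numpy as np

def camera_to_world(points_cam, camera_center):
    """Translate camera-frame points into the world frame.

    points_cam: (N, 3) array of points centered on the camera.
    camera_center: (3,) camera position in world coordinates.
    """
    return np.asarray(points_cam) + np.asarray(camera_center)
```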

madaan-nikhil commented 1 year ago

Hi, thanks for your reply. I am already adding the camera center, so the point cloud should be in the world coordinate system (the image shared above is after adding the camera center). Here is the method I am using to generate the per-room point cloud:

import cv2
import numpy as np
from sklearn.preprocessing import normalize


def generate_per_room_point_cloud(self, random_level=0, color=False, normal=False):
    normals = []
    points = {'coords': [], 'colors': []}

    # Unproject each panoramic depth map into world coordinates.
    for i in range(len(self.depth_paths)):
        depth_img = cv2.imread(self.depth_paths[i], cv2.IMREAD_ANYDEPTH)
        if depth_img is None:
            with open("logs.txt", "a") as log:
                log.write(f"{self.depth_paths[i]} missing\n")
            continue
        coords = []
        x_tick = 180.0 / depth_img.shape[0]  # degrees of latitude per row
        y_tick = 360.0 / depth_img.shape[1]  # degrees of longitude per column
        for x in range(depth_img.shape[0]):
            for y in range(depth_img.shape[1]):
                alpha = 90 - x * x_tick      # latitude: 90 (top) to -90 (bottom)
                beta = y * y_tick - 180      # longitude: -180 to 180
                depth = depth_img[x, y] + np.random.random() * random_level
                z_offset = depth * np.sin(np.deg2rad(alpha))
                xy_offset = depth * np.cos(np.deg2rad(alpha))
                x_offset = xy_offset * np.sin(np.deg2rad(beta))
                y_offset = xy_offset * np.cos(np.deg2rad(beta))
                point = np.asarray([x_offset, y_offset, z_offset])
                # Shift from the camera frame into the world frame.
                coords.append(point + self.camera_centers[i])
        points['coords'].append(np.asarray(coords))

    if color:
        # Attach per-pixel RGB colors.
        for i in range(len(self.rgb_paths)):
            rgb_img = cv2.imread(self.rgb_paths[i])
            if rgb_img is None:
                with open("logs.txt", "a") as log:
                    log.write(f"{self.rgb_paths[i]} missing\n")
                continue
            colors = [rgb_img[x, y]
                      for x in range(rgb_img.shape[0])
                      for y in range(rgb_img.shape[1])]
            points['colors'].append(np.asarray(colors) / 255.0)

    if normal:
        # Attach unit-length normals.
        for i in range(len(self.normal_paths)):
            normal_img = cv2.imread(self.normal_paths[i])
            for x in range(normal_img.shape[0]):
                for y in range(normal_img.shape[1]):
                    normals.append(normalize(normal_img[x, y].reshape(-1, 1)).ravel())
        points['normals'] = normals

    return points
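As a side note, the per-pixel loop above is slow for full-resolution panoramas. The same spherical unprojection can be vectorized with NumPy; this is a sketch under the same alpha/beta angle conventions as the method above (noise omitted for brevity):

```python
import numpy as np

def depth_to_points(depth_img, camera_center):
    """Vectorized equirectangular depth -> world-frame point cloud.

    depth_img: (H, W) depth map covering 180 deg of latitude and
    360 deg of longitude. camera_center: (3,) camera position in
    world coordinates, same units as the depth values.
    """
    h, w = depth_img.shape
    x = np.arange(h).reshape(-1, 1)               # rows
    y = np.arange(w).reshape(1, -1)               # columns
    alpha = np.deg2rad(90.0 - x * (180.0 / h))    # latitude: 90 -> -90
    beta = np.deg2rad(y * (360.0 / w) - 180.0)    # longitude: -180 -> 180
    depth = depth_img.astype(np.float64)
    z = depth * np.sin(alpha)                     # vertical offset
    xy = depth * np.cos(alpha)                    # horizontal radius
    pts = np.stack([xy * np.sin(beta), xy * np.cos(beta), z], axis=-1)
    return pts.reshape(-1, 3) + np.asarray(camera_center)
```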