
Bad result in running concept graph on ARKITScene dataset #63

Closed: JiaxiXie closed this issue 3 months ago

JiaxiXie commented 3 months ago

I am trying to run ConceptGraphs on the ARKitScenes dataset and have run into a few problems. I finished the tutorial and successfully generated the mapping in rerun.io using the Replica dataset. Here are the changes I made to run ConceptGraphs on ARKitScenes:

[Screenshot: config changes made for the ARKitScenes dataset]

Here is the problem: as the rerun.io clip below shows, objects overlap in the reconstruction. I am not sure what causes this. I would really appreciate any opinions or suggestions. Thanks a lot.

[Screenshot: rerun.io visualization showing overlapping objects]
krrish94 commented 3 months ago

From the visualization, it looks like the camera intrinsics / extrinsics are incorrectly configured.

I haven't used ARKitScenes, so I'm unsure what the camera conventions for that dataset look like. You also seem to be using the png_depth_scale value from the Replica sequences. For most other datasets that parameter is 1000 (except TUM RGB-D, for which I believe it is 5000).
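For reference, here is a minimal sketch of how png_depth_scale is typically applied when loading a 16-bit depth PNG (the file path below is just an example):

import cv2
import numpy as np

png_depth_scale = 1000.0  # stored depth is in millimeters for most datasets

# IMREAD_UNCHANGED keeps the raw 16-bit values instead of converting to 8-bit
depth_raw = cv2.imread("lowres_depth/example_frame.png", cv2.IMREAD_UNCHANGED)
depth_m = depth_raw.astype(np.float32) / png_depth_scale  # now in meters

# Sanity check: indoor scenes should land roughly in the 0-10 m range
print(depth_m.min(), depth_m.max())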

JiaxiXie commented 3 months ago

Here is the dataset format of ARKitScenes:

/[scene_id]
    /[scene_id]_frames
        /lowres_depth (depth images)
        /lowres_wide (color images)
        /lowres_wide_intrinsics (camera intrinsics)
        /lowres_wide.traj (camera extrinsics)

lowres_wide (color images)

lowres_depth (depth images)

lowres_wide_intrinsics (camera intrinsics) contains single-line, space-delimited text files with the following fields: width, height, focal_length_x, focal_length_y, principal_point_x, principal_point_y. Here are the specific intrinsic values for this dataset: 256 192 212.027 212.027 127.933 95.9333
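A minimal sketch of how those six fields could be parsed into a 3x3 intrinsics matrix (the helper name is mine, not from the repo):

import numpy as np

def load_arkit_intrinsics(path):
    # File contents look like: "256 192 212.027 212.027 127.933 95.9333"
    w, h, fx, fy, cx, cy = np.loadtxt(path)
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    return int(w), int(h), K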

lowres_wide.traj is a space-delimited file where each line represents a camera pose at a particular timestamp.
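Assuming each line is a timestamp, a 3-component axis-angle rotation, and a 3-component translation in meters (which matches the example line quoted further down), here is a sketch of converting one line into a 4x4 pose:

import numpy as np
from scipy.spatial.transform import Rotation

def traj_line_to_pose(line):
    # Assumed layout: timestamp rx ry rz tx ty tz
    vals = [float(v) for v in line.split()]
    timestamp, rotvec, trans = vals[0], vals[1:4], vals[4:7]
    T = np.eye(4)
    T[:3, :3] = Rotation.from_rotvec(rotvec).as_matrix()
    T[:3, 3] = trans
    # Note: whether this is camera-to-world or world-to-camera, and the axis
    # convention, still needs to be checked against the ARKitScenes docs
    return timestamp, T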

For camera intrinsics, I don't think ARKitScenes 3DOD provides a specific value for depth_scale. However, I load the other parameters as stated in the file.

For extrinsics, I use the following code to print all the camera poses when running concept-graphs/conceptgraph/slam/rerun_realtime_mapping.py, given stride=10 in rerun_realtime_mapping.yaml:

unt_pose = dataset.poses[frame_idx]
print(f"==============dataset poses===============\n{unt_pose}, \n{frame_idx*10+1}")
unt_pose = unt_pose.cpu().numpy()

Here is one example, processing line 261 of scene 40753679's trajectory. The printout is given below:

==============dataset poses===============
tensor([[-0.7846,  0.6184,  0.0439,  0.6561],
        [-0.2921, -0.3062, -0.9060,  0.9303],
        [-0.5469, -0.7237,  0.4209,  1.4225],
        [ 0.0000,  0.0000,  0.0000,  1.0000]]), 
261

The original line 261 from lowres_wide.traj is given below:

6816.11971479 0.42390419235973026 1.3735961783384674 -2.116927535648726 0.65612 0.930267 1.42246

I did not see any error there; the last three values (the translation) match the last column of the printed pose. Does that mean I am getting the right extrinsics?

In case it helps with verification, I also have a .json with object annotations and a .ply mesh. Are there any other ways to verify? Thanks!

krrish94 commented 3 months ago

Quick comment on the visualization posted above. It seems like you are trying to use the ARKitScenes dataset, but the rerun visualization appears to be from the Replica office0 sequence. Is that intentional? If not, there could be an error somewhere in your config.

The preferred way to verify if the intrinsics and extrinsics are indeed correct is to visualize the 3D reconstruction produced by gradslam (running the script here on the first 100-200 frames of your dataset, with a frameskip of ~25, should suffice; you wouldn't need to run on the full sequence). It is hard for me to tell by just looking at the dataset poses whether they're accurate, since I haven't used that dataset before.
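I haven't run this on ARKitScenes, but the check would look roughly like the sketch below. The dummy tensors stand in for your real frames; RGBDImages and PointFusion are from gradslam's documented API, with shapes in its (batch, sequence, H, W, C) convention:

import open3d as o3d
import torch
from gradslam import RGBDImages
from gradslam.slam import PointFusion

# Dummy tensors; swap in real ARKitScenes color/depth/intrinsics/poses
# for ~8 frames sampled with a frameskip of ~25
L, H, W = 8, 192, 256
colors = torch.rand(1, L, H, W, 3)
depths = torch.rand(1, L, H, W, 1)
intrinsics = torch.eye(4).view(1, 1, 4, 4)
poses = torch.eye(4).expand(L, 4, 4).unsqueeze(0).clone()

rgbd = RGBDImages(colors, depths, intrinsics, poses)
slam = PointFusion(odom="gt", device="cpu")  # "gt" fuses using the given poses
pointclouds, _ = slam(rgbd)

# If the intrinsics/extrinsics are right, the fused cloud should look like
# a coherent room rather than overlapping copies of it
o3d.visualization.draw_geometries([pointclouds.open3d(0)])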

JiaxiXie commented 3 months ago

Thanks for the suggestions!

krrish94 commented 3 months ago

The issue with gradslam is perhaps due to using the main branch? You will need to use the conceptfusion branch (https://github.com/gradslam/gradslam/tree/conceptfusion) to access features essential for this repo.
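If it helps, that branch can be installed directly with pip:

pip install git+https://github.com/gradslam/gradslam.git@conceptfusion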

JiaxiXie commented 3 months ago

Thanks a lot! It works, but the result is still pretty weird, as shown below. A few potential problems come to mind:

krrish94 commented 3 months ago

IMO, printing out the camera pose isn't an issue (the rounding off happens when you print most arrays/tensors). You could try loading in double precision and then casting to float, but I don't think that'd be a major factor.

As for ARKitScenes, I unfortunately have no experience working with it. I'd double-check the intrinsics and camera poses (and axis conventions) carefully.

I looked up their documentation very quickly, and based on this description, the depth is in millimeters, so the depth scale parameter must be set to 1000.0.
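Assuming an ARKitScenes dataconfig mirrors the layout of the Replica one (a camera_params block in the dataset yaml), it would look something like this, using the intrinsics you quoted; this is a sketch, not a config shipped with the repo:

camera_params:
  image_height: 192
  image_width: 256
  fx: 212.027
  fy: 212.027
  cx: 127.933
  cy: 95.9333
  png_depth_scale: 1000.0  # ARKitScenes depth PNGs are in millimeters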

JiaxiXie commented 3 months ago

Thanks for all the help this week. I resolved the problem by changing the way the camera poses are loaded. It really helped a lot. Also, thank you for providing such amazing code!

[Screenshots: final reconstruction after the fix]