HengyiWang / Co-SLAM

[CVPR'23] Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
https://hengyiwang.github.io/projects/CoSLAM.html
Apache License 2.0

PolyCam dataset #47

Closed mohcenaouadj closed 4 months ago

mohcenaouadj commented 5 months ago

Hello,

Great work, I really appreciate everything about this project.

I have a question concerning the iPhone dataset: I have an RGB-D dataset captured by PolyCam on an iPhone, with RGB images of size 1024×768 and depth maps of size 256×192.

I'm trying to use the iPhone configuration with this dataset, yet for some reason I get a really bad reconstruction and trajectory. I was wondering which parts of the configuration you recommend adjusting; so far I have only changed the camera size and the intrinsics.

I also tried the vis_bound script to generate the corresponding bounding box, but the output is always: TriangleMesh with 0 points and 0 triangles.


HengyiWang commented 5 months ago

Hi @mohcenaouadj, thank you for your question. For running Co-SLAM on iPhone datasets, here are some tips:

  1. Make sure you are using the correct intrinsics & depth scale. You can check them using https://github.com/HengyiWang/Co-SLAM/blob/main/vis_bound.ipynb. If everything is good, you should see the fused scene. Based on your description, I suspect the depth scale is not set correctly (it should be 1000 or similar; you need to double-check with PolyCam).
  2. If you use the identity matrix to initialize the poses, make sure to use quaternions as the rotation representation.
  3. It is recommended to upsample the depth to 512×384 to give denser supervision to the scene representation.
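Tips 1 and 3 can be sketched together. The helper below is an assumption, not code from the repo: it supposes the depth PNGs store millimeters (hence a depth scale of 1000) and upsamples a 256×192 depth map to 512×384 with nearest-neighbor indexing, which avoids interpolating spurious depth values across object boundaries.

```python
import numpy as np

def upsample_depth(depth_mm: np.ndarray, target_hw=(384, 512),
                   png_depth_scale: float = 1000.0) -> np.ndarray:
    """Nearest-neighbor upsample of a depth map, converted to meters.

    Assumes the stored depth is in millimeters (scale 1000); verify the
    actual scale with the capture app before using this.
    """
    h, w = depth_mm.shape
    th, tw = target_hw
    # Nearest-neighbor index maps: no new depth values are synthesized.
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    up = depth_mm[rows[:, None], cols[None, :]]
    return up.astype(np.float32) / png_depth_scale
```

If the fused scene in vis_bound looks collapsed or exploded, the depth scale is the first thing to revisit.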
mohcenaouadj commented 5 months ago

@HengyiWang Thank you so much for your reply, I really appreciate it.

I'm still having the same problem. I fixed the intrinsics and the depth scale, which are provided by the app itself, and double-checked them using COLMAP, yet even the vis_bound script isn't giving good results, although when I tried it on the Replica dataset the result was perfect. So I was wondering what other parameters could be responsible for this.

Vis Bound result: (screenshot attached, 2024-05-24)

Co-SLAM result: (screenshot attached, 2024-05-24)

Thanks again for your help !

HengyiWang commented 5 months ago

Hi @mohcenaouadj, can you check whether the camera poses are c2w or w2c (if w2c, you need to invert them in your dataset class) and their convention (OpenCV or OpenGL; we use OpenGL here)?
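Both checks can be folded into one conversion. The helper below is a sketch, not code from the repo: it assumes 4×4 homogeneous pose matrices, and flips the camera's y and z axes to go from the OpenCV convention (y down, looking along +z) to the OpenGL convention (y up, looking along -z).

```python
import numpy as np

# Right-multiplying by this flips the camera's y and z axes,
# converting an OpenCV-convention pose to OpenGL convention.
OPENCV_TO_OPENGL = np.diag([1.0, -1.0, -1.0, 1.0])

def to_c2w_gl(pose: np.ndarray, is_w2c: bool = True) -> np.ndarray:
    """Return a 4x4 camera-to-world pose in OpenGL convention.

    pose: 4x4 matrix in OpenCV convention, either w2c or c2w.
    """
    c2w = np.linalg.inv(pose) if is_w2c else pose.copy()
    return c2w @ OPENCV_TO_OPENGL
```

A quick sanity check is to plot the converted camera centers (`c2w[:3, 3]`) and confirm the trajectory shape matches how the device was moved.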

mohcenaouadj commented 5 months ago

Hello again @HengyiWang, I have now used StrayScanner to help me understand the project better, and the results were awesome; I really thank you for this project. What I understood from this experience is that even the PolyCam intrinsics should be divided by a factor, like you did with StrayScanner ("7.5"). I tried 7.5 for PolyCam and it didn't give results, so I was wondering how you knew that was the scale to divide by. Is there a math formula or something?

HengyiWang commented 5 months ago

Glad to hear! 7.5 should be the scale factor between the RGB and depth map resolutions. For PolyCam, you may want to check this repo (https://github.com/PolyCam/polyform/) for their conventions.
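The factor is just a resolution ratio, so it can be derived rather than guessed. The sketch below is an illustration with assumed numbers: if intrinsics are reported at a 1920-wide sensor resolution and the depth maps are 256 wide, the scale is 1920 / 256 = 7.5; for another app, substitute its own resolutions from the metadata.

```python
def scale_intrinsics(fx: float, fy: float, cx: float, cy: float,
                     src_w: int, dst_w: int):
    """Rescale pinhole intrinsics from image width src_w to dst_w.

    Assumes the aspect ratio is unchanged, so one ratio scales all
    four parameters.
    """
    s = dst_w / src_w
    return fx * s, fy * s, cx * s, cy * s

# Example with assumed values: intrinsics reported at 1920-wide,
# target depth width 256 -> equivalent to dividing by 1920/256 = 7.5.
```

If dividing by the computed ratio still fails, the app may already report intrinsics at the depth resolution, in which case no scaling is needed.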
mohcenaouadj commented 5 months ago

Hello @HengyiWang,

Yes, it worked just fine. I have several other questions, if I may:

  1. What does the GPU memory usage mainly depend on?
  2. What do the different voxel sizes in the config file control?

Thank you again.

HengyiWang commented 5 months ago
  1. The GPU memory usage mostly depends on how many sample points you have.
  2. We have several voxel sizes in the config file. voxel_sdf and voxel_rgb are the voxel sizes of the SDF and color hash grids. By setting oneGrid: True, you only have one SDF feature grid. We also have voxel_eval and voxel_final: 0.03, which are the voxel sizes used to extract the mesh. You can use a smaller voxel_final to extract a mesh at higher resolution. In the meantime, you may want to tune voxel_sdf to achieve better performance.
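Collecting the keys mentioned above into one fragment may help. This is a sketch, not the repo's shipped config: the key names come from the discussion, but the grouping and the numeric values are assumptions to be tuned against your scene scale.

```yaml
# Sketch of the voxel-size-related keys discussed above.
# Values are illustrative assumptions, not official defaults.
oneGrid: True       # share a single SDF feature grid (no separate color grid)
voxel_sdf: 0.04     # hash-grid voxel size for the SDF features
voxel_rgb: 0.08     # color-grid voxel size; only used when oneGrid is False
voxel_eval: 0.05    # voxel size for intermediate mesh extraction
voxel_final: 0.03   # smaller -> higher-resolution final mesh (more memory/time)
```

Smaller voxel_sdf captures finer geometry but increases memory and can hurt robustness on noisy phone depth, so it is worth sweeping a few values.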