Closed GuoJunfu-tech closed 2 months ago
Hi @GuoJunfu-tech. Thanks for your interest in our work.
I used the scanner app and processing server provided by MultiScan to scan, collect, and annotate the real-world data. For details on downloading and deploying their tools, you can refer to their documentation.
For object alignment across states in world coordinates, I did it manually for each scan. Specifically, I aligned the meshes using the ICP alignment tool in MeshLab.
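For readers who prefer a programmatic starting point over MeshLab's interactive tool, below is a minimal point-to-point ICP sketch in NumPy. This is not the tool I used, just an illustration of the alignment idea; the function names, tolerances, and brute-force matching are my own choices, and a real mesh-alignment workflow would use a library such as Open3D.

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rigid transform (R, t) mapping points A onto B (Kabsch/SVD)."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cB - R @ cA
    return R, t

def icp(src, dst, iters=50, tol=1e-8):
    """Point-to-point ICP: alternate nearest-neighbor matching and rigid re-fitting."""
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        # brute-force nearest neighbor in dst for every point in cur
        d = np.linalg.norm(cur[:, None] - dst[None], axis=2)
        idx = d.argmin(axis=1)
        R, t = best_fit_transform(cur, dst[idx])
        cur = cur @ R.T + t
        err = d.min(axis=1).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    # total transform that maps the original src onto dst
    return best_fit_transform(src, cur)
```

ICP only converges to the correct alignment from a reasonable initial pose, which is why an interactive tool (where you roughly place the meshes first) is practical for this kind of per-scan cleanup.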
Hope it helps. Thanks!
Many thanks!
One more thing: how do you process the image to extract the main object? Do you use a semantic segmentation method like Mask R-CNN, or do you just manually delete the background?
To construct the GT data, I simply projected the 3D mesh onto the image to obtain the object mask for each view, which ensures consistency across views. In my experience, though, mask accuracy does not affect the performance that much.
That said, off-the-shelf 2D segmentation models (e.g., SAM) should work just as well, or even better, at segmenting the object in 2D, since the reconstructed mesh is not perfect.
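The mesh-projection step can be sketched as below. This is a simplified vertex-splat version I wrote for illustration (the camera conventions and function name are my assumptions, not from the repo); a proper pipeline would rasterize the mesh triangles with a z-buffer, e.g. via an offscreen renderer, to get a solid mask rather than sparse vertex hits.

```python
import numpy as np

def project_mesh_mask(vertices, K, R, t, hw):
    """Splat mesh vertices through a pinhole camera into a rough binary mask.

    vertices: (N, 3) world-frame points; K: (3, 3) intrinsics;
    R, t: world-to-camera extrinsics; hw: (height, width) of the image.
    """
    h, w = hw
    cam = vertices @ R.T + t            # world -> camera frame
    cam = cam[cam[:, 2] > 0]            # keep only points in front of the camera
    uv = cam @ K.T                      # pinhole projection (homogeneous)
    uv = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    mask = np.zeros((h, w), dtype=bool)
    mask[uv[ok, 1], uv[ok, 0]] = True   # sparse hits; dilate or rasterize faces
    return mask                          # for a solid silhouette
```

Because the same mesh is projected into every view, the resulting masks are consistent across views by construction, which is the property mentioned above.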
Thank you so much! It has resolved my confusion.
Could you share the methods you employed to extract real-world data from the MultiScan dataset? Furthermore, how did you align the world coordinates across different states? I'm also curious about the process you use to obtain the ground truth for your data. Thanks!