Closed · Mio-Atse closed this issue 2 months ago
Hi,
For the first question, it is possible to change the depth model to improve speed (Metric3D currently takes nearly 1 second to estimate depth for a single image, which is too slow). The depth-estimation and SLAM modules are not coupled, so you only need to change the depth-estimation code; see the sketch below.
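As an illustration only (not code from this repo), a lighter depth network could be wrapped behind the same image-in, depth-out interface so the SLAM side stays untouched. All names below (class, `predict`, the weights file) are hypothetical placeholders:

```python
import torch

# Hypothetical drop-in wrapper; the class name, predict() signature and the
# weights file are placeholders, not identifiers from this repo or FastDepth.
class FastDepthEstimator:
    """Wrap a lighter monocular depth network behind a simple
    image -> depth interface so the SLAM side stays unchanged."""

    def __init__(self, weights_path: str, device: str = "cuda"):
        self.device = device
        # Assumes the checkpoint was saved as a full nn.Module; adapt the
        # loading to whatever the chosen depth model's repo recommends.
        self.model = torch.load(weights_path, map_location=device).eval()

    @torch.no_grad()
    def predict(self, image_bchw: torch.Tensor) -> torch.Tensor:
        # image_bchw: (1, 3, H, W) float tensor in [0, 1]
        depth = self.model(image_bchw.to(self.device))
        # Caveat: a relative-depth model is not metric; if the pipeline
        # expects metric depth (as with Metric3D), scale must be recovered.
        return depth.squeeze(0)
```

One thing to keep in mind: Metric3D predicts metric depth, so a faster relative-depth model would also need some way to recover scale.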
For the second, as far as I know, DROID-SLAM is not suitable for real-time inference because:
You may have to solve the problems above for real-time inference.
Besides, DROID-SLAM is a deep-learning method that requires a lot of compute (more than 18 GB of VRAM). Embedded devices may not be able to run it. It is highly recommended to use a classical C++-based RGB SLAM system on embedded devices, such as OV2SLAM or ORB-SLAM.
For the last, this repo contains code for mesh reconstruction based on an SDF. So if real-time processing is NOT required, the mesh is fine for point-cloud sampling and segmentation; see the sketch below. By the way, outputting the original dense point cloud is also possible, but you would need to modify the viewing module from DROID-SLAM to export the PCD.
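Here is a minimal sketch (assuming Open3D and placeholder file names, not code from this repo) of turning the reconstructed mesh into the x y z r g b format a segmentation model typically expects:

```python
import numpy as np
import open3d as o3d

# Placeholder file names; point these at the mesh produced by the repo's
# reconstruction script and at wherever the segmentation code reads from.
mesh = o3d.io.read_triangle_mesh("reconstruction.ply")
if not mesh.has_vertex_colors():
    raise ValueError("mesh has no vertex colors; r g b cannot be sampled")

# Uniformly sample points on the surface; vertex colors are interpolated.
pcd = mesh.sample_points_uniformly(number_of_points=500_000)

xyz = np.asarray(pcd.points)   # (N, 3) coordinates
rgb = np.asarray(pcd.colors)   # (N, 3) colors in [0, 1]
xyzrgb = np.hstack([xyz, rgb])

# Save as plain text for an x y z r g b pipeline, or keep it as a point
# cloud via o3d.io.write_point_cloud("sampled.ply", pcd).
np.savetxt("sampled_xyzrgb.txt", xyzrgb, fmt="%.6f")
```

Sampling on the surface preserves color through interpolated vertex colors, so the segmentation side does not need to know the data came from a mesh.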
Hi again,
Thanks for your kind reply.
First of all, thank you for this great work.
Can this method be used for real-time SLAM by changing the depth model? For example, can we use FastDepth instead of the metric depth model, which has less overhead? In my tests, a 1.5-minute 640x480 video took about 45 minutes to produce a mesh with test_video.py on Google's T4 GPU.
Have you looked into real-time inference? If not, and you think it cannot be used in real time, do you have any suggestions for a SLAM model that takes RGB input and runs on embedded devices?
I also want to ask one more thing: I am thinking of using the SLAM output from this code for point-cloud segmentation. Will it be a problem that the output is a mesh? The model and code I am using now take x y z r g b points as input.