facebookresearch / vggsfm

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Out of index error for own data with video_runner #65

Open albb20 opened 2 days ago

albb20 commented 2 days ago

Hi, thanks for the great work. I'm currently trying to reconstruct the poses of a camera filming a small object moving in front of it, with the background masked out, so that I get the relative motion between the camera and that object. I have 8 images, and I use a window size of 8 and 8 query frames. After convergence, I get the following error message:

I20240917 09:10:54.316624 130612008474432 bundle_adjustment.cc:866] Bundle adjustment report:
    Residuals : 224788
    Parameters : 46510
    Iterations : 34
    Time : 2.43826 [s]
    Initial cost : 0.168874 [px]
    Final cost : 0.16884 [px]
    Termination : Convergence

I20240917 09:10:54.316646 130612008474432 timer.cc:91] Elapsed time: 0.041 [minutes]
Finished iterative BA 9
Error executing job with overrides: []
Traceback (most recent call last):
  File "/media/albert/Volume/Git/vggsfm/video_demo.py", line 74, in demo_fn
    predictions = vggsfm_runner.run(
  File "/media/albert/Volume/Git/vggsfm/vggsfm/runners/video_runner.py", line 154, in run
    self.convert_pred_to_point_frame_dict(init_pred, start_idx, end_idx)
  File "/media/albert/Volume/Git/vggsfm/vggsfm/runners/video_runner.py", line 385, in convert_pred_to_point_frame_dict
    self.frame_dict[frame_idx]["extri"] = extrinsics[relative_frame_idx]
IndexError: index 0 is out of bounds for dimension 0 with size 0

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I tried to look into it, but I'm not sure what causes the error. Do you have an idea? Also, is it possible to control the query frame indices directly?

Thanks for your help.

jytime commented 1 day ago

Hey, it seems all the frames are filtered out after init_pred = self.process_initial_window, so there are 0 extrinsics in init_pred. This usually happens when the init window is not well-conditioned.
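For illustration, here is a minimal sketch of that failure mode (the (0, 3, 4) shape is just an assumed example, not the runner's actual layout): once every frame of the initial window is filtered out, the extrinsics in init_pred are empty, and indexing them reproduces the error in your traceback.

```python
import torch

# Assumed illustration: an empty extrinsics tensor, as left behind when every
# frame of the initial window has been filtered out.
extrinsics = torch.empty(0, 3, 4)

# Any index into the empty first dimension fails with exactly:
#   IndexError: index 0 is out of bounds for dimension 0 with size 0
first_extrinsic = extrinsics[0]
```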

If I understand correctly, you have 8 images in total? The video_runner uses init_window_size=32 by default, which may be a source of the problem. Also, I would strongly suggest using sparse reconstruction (demo.py instead of video_demo.py) when you have such a small number of frames. Sparse reconstruction generally works much better in that scenario. Only consider using video_runner when you have >100 frames.

One more thing to note: you may want to start without masking the background. Generally, the background helps SfM methods.

albb20 commented 1 day ago

Hi, thanks for the fast reply. I changed the init window to 8 before reconstruction. The reason I wanted to use video reconstruction was that I thought the reconstructed poses would be more accurate if the algorithm gets additional information about temporal correlation.
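For reference, the override I used looked roughly like the command below; the exact keys (SCENE_DIR, init_window_size, query_frame_num) are my assumption based on the defaults mentioned above, so please double-check them against the config:

python video_demo.py SCENE_DIR=/path/to/my/scene init_window_size=8 query_frame_num=8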

A bit more about my problem: I film a falling object with static cameras and want to reconstruct the camera poses relative to that object, so that I can use the poses and images in a reconstruction algorithm to obtain the shape of the object. Since the background is static (as the cameras are), I need to remove it to get the camera-to-object transformation. I know that traditional SfM algorithms rely on feature-rich scenes, which I do not have, so I wanted to try your work to see whether it handles my feature-poor scene better. See also the attached images.

Thanks for your help

P.S. I did not want to close the issue; that was a mistake.

[Attached images: frames 0000 through 0007]

jytime commented 1 day ago

Hi @albb20, this looks like a simple example. If you run

python demo.py SCENE_DIR=/YOUR/DIR gr_visualize=True visual_tracks=True shared_camera=True query_frame_num=8 camera_type=SIMPLE_RADIAL center_order=False

it will give you something like this:

[Screenshot of the reconstruction result]

We can see some noisy black points at the boundary of the stone, which is due to the black background. If you want higher quality, you can also pass masks to the model as masks/FILENAME, telling it which pixels are background. This works the same way as images/FILENAME; please check the README for more details about masks.
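In case it helps, here is a minimal sketch of how one might generate such masks from the black background. The masks/ folder mirroring images/ follows the masks/FILENAME convention described above, but the threshold and the white-object/black-background convention are assumptions, so please verify them against the README:

```python
import os
import numpy as np
from PIL import Image

SCENE_DIR = "/path/to/scene"  # hypothetical scene directory
image_dir = os.path.join(SCENE_DIR, "images")
mask_dir = os.path.join(SCENE_DIR, "masks")
os.makedirs(mask_dir, exist_ok=True)

for name in sorted(os.listdir(image_dir)):
    img = np.asarray(Image.open(os.path.join(image_dir, name)).convert("L"))
    # Assumption: the background is (near-)black, so any pixel above a small
    # threshold is treated as the object (white = object, black = background).
    mask = (img > 10).astype(np.uint8) * 255
    Image.fromarray(mask).save(os.path.join(mask_dir, name))
```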

Here is a public link, accessible for the next 3 hours, showing the result in 3D (or you can run the command above to generate it yourself):

https://9392135cb82118b5ad.gradio.live

Edit:

Just curious: according to the tracking visualisation below, the model seems to think there are two more objects falling together with the stone. Is that correct?

https://github.com/user-attachments/assets/a73cb627-5dee-4acf-bbdf-6976077b17ef

albb20 commented 1 day ago

Hi, thanks for your help. I managed to reproduce your results. Yes, this is a relatively simple example, but most of my examples are harder, with smaller objects and more rotation. I will see whether those can be reconstructed as well. Which parameters, besides perhaps the number of query points, do you recommend adjusting when the objects get much smaller? And what exactly does center_order do?

Regarding your question: no, there should not be anything falling except the particle. Those artefacts seem strange. However, I will try it with masks and see what happens.

I appreciate your time.

EDIT: With masks, we now get the expected feature points/tracks on the object only:

[Attached screenshots: feature points and tracks on the masked object]

albb20 commented 23 hours ago

In a much harder case, the tracking, and therefore the BA, fails even with masks. I'm not sure whether such small (and not sharply captured) objects can still be reconstructed properly. I have attached the images and masks in case you have time to try it yourself.

[Attached image: frame 0004]

https://github.com/user-attachments/assets/1b796228-0849-4e3d-89d2-9d9279d8c670
small_object.zip

jytime commented 13 hours ago

Hey, I see. This is because the object occupies too small a proportion of the images, which hurts tracking accuracy. Can you share the original, unmasked images? I think using some support pixels from the background can help tracking; we can filter these support points out after tracking so they will not be reconstructed.
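To illustrate the idea, here is a minimal sketch of that filtering step under some assumptions: tracks are given as per-frame pixel coordinates of shape (num_frames, num_tracks, 2), the masks are binary arrays with nonzero values on the object, and the function name is hypothetical rather than the actual runner API.

```python
import numpy as np

def filter_background_tracks(tracks, masks, query_frame=0):
    """Keep only tracks whose query-frame location lands on the object.

    tracks: float array of shape (num_frames, num_tracks, 2), (x, y) pixels.
    masks:  uint8 array of shape (num_frames, H, W), nonzero = object.
    """
    H, W = masks.shape[1:]
    xy = np.round(tracks[query_frame]).astype(int)
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    keep = masks[query_frame, y, x] > 0  # True for on-object tracks
    return tracks[:, keep], keep

# Hypothetical usage: track on the full (unmasked) images so the background
# provides support pixels, then drop the background points before BA /
# reconstruction.
# filtered_tracks, keep_mask = filter_background_tracks(pred_tracks, masks)
```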