facebookresearch / vggsfm

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Out of index error for own data with video_runner #65

Open albb20 opened 1 month ago

albb20 commented 1 month ago

Hi, thanks for the great work. I'm currently trying to reconstruct the poses of a camera filming a small object moving in front of it, with the background masked out, so that I get the relative motion between the camera and that object. I have 8 images and use a window size of 8 and 8 query frames. After convergence, I get the following error message:

I20240917 09:10:54.316624 130612008474432 bundle_adjustment.cc:866] Bundle adjustment report:
    Residuals : 224788
    Parameters : 46510
    Iterations : 34
    Time : 2.43826 [s]
    Initial cost : 0.168874 [px]
    Final cost : 0.16884 [px]
    Termination : Convergence

I20240917 09:10:54.316646 130612008474432 timer.cc:91] Elapsed time: 0.041 [minutes]
Finished iterative BA 9
Error executing job with overrides: []
Traceback (most recent call last):
  File "/media/albert/Volume/Git/vggsfm/video_demo.py", line 74, in demo_fn
    predictions = vggsfm_runner.run(
  File "/media/albert/Volume/Git/vggsfm/vggsfm/runners/video_runner.py", line 154, in run
    self.convert_pred_to_point_frame_dict(init_pred, start_idx, end_idx)
  File "/media/albert/Volume/Git/vggsfm/vggsfm/runners/video_runner.py", line 385, in convert_pred_to_point_frame_dict
    self.frame_dict[frame_idx]["extri"] = extrinsics[relative_frame_idx]
IndexError: index 0 is out of bounds for dimension 0 with size 0

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I tried to look into it, but I'm not really sure what causes the error. Do you have an idea? Also, is it possible to control the indices of the query directly?

Thanks for your help.

jytime commented 1 month ago

Hey, it seems all the frames are filtered out after init_pred = self.process_initial_window, so init_pred contains 0 extrinsics. This usually happens because the init window is not well-conditioned.
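To illustrate the failure mode described above, here is a minimal sketch of a guard that fails early with a readable message when the initial window yields no poses. It is not the code in video_runner.py: the function name store_extrinsics is hypothetical, plain Python sequences stand in for tensors, and only the names frame_dict and "extri" are taken from the traceback.

```python
def store_extrinsics(frame_dict, extrinsics, start_idx, end_idx):
    """Copy per-frame extrinsics into frame_dict (hypothetical helper).

    Raises a descriptive error instead of an IndexError when every frame
    of the window was filtered out and `extrinsics` is empty.
    """
    if len(extrinsics) == 0:
        raise ValueError(
            f"No valid frames left in window [{start_idx}, {end_idx}); "
            "the initial window may be poorly conditioned."
        )
    # zip stops at the shorter sequence, so indexing can never run past the end.
    for frame_idx, extri in zip(range(start_idx, end_idx), extrinsics):
        frame_dict.setdefault(frame_idx, {})["extri"] = extri
    return frame_dict
```

A check like this does not fix the reconstruction; it only makes the cause visible before the indexing step.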

If I understand correctly, you have 8 images in total? The video_runner uses init_window_size=32 by default, which may be a source of the problem. I would also strongly suggest using sparse reconstruction (demo.py instead of video_demo.py) when you have such a small number of frames; it generally works much better in that scenario. Only consider using video_runner when you have >100 frames.
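The rule of thumb above can be written down as a small helper. This is only a sketch: suggest_entry_point is a hypothetical name, the default of 32 and the 100-frame threshold are the numbers quoted in this comment, and the extra check that the sequence covers the initial window is an assumption.

```python
def suggest_entry_point(num_frames, init_window_size=32, video_min_frames=100):
    """Return which demo script the rule of thumb points to (hypothetical helper)."""
    # The video runner is only suggested for sequences longer than ~100 frames
    # that are also long enough to fill the initial window (assumption).
    if num_frames > video_min_frames and num_frames >= init_window_size:
        return "video_demo.py"
    return "demo.py"


print(suggest_entry_point(8))    # 8 frames, as in this issue
print(suggest_entry_point(150))  # a longer video sequence
```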

One more thing to note: you may want to start without masking the background. Background generally helps SfM methods.

albb20 commented 1 month ago

Hi, thanks for the fast reply. I changed init_window to 8 before reconstruction. The reason I wanted to use video reconstruction was that I thought the reconstructed poses would be more accurate if the algorithm gets additional information about temporal correlation.

A bit more about my problem: I film a falling object with static cameras and want to reconstruct the camera poses relative to that object, so that I can use the poses and images for a reconstruction algorithm to obtain the shape of the object. Since the background is static (as the cameras are), I need to remove it to get the cam-to-object transformation. I know that traditional SfM algorithms rely on feature-rich scenes, which I do not have. Therefore I wanted to try your work to see if it can handle the feature-poorness of my scene better. See also the images attached.

Thanks for your help

P.S. I did not want to close the issue, it was a mistake

[Attached images: 0000 to 0007]

jytime commented 1 month ago

Hi @albb20, this seems to be a simple example. If you run

python demo.py SCENE_DIR=/YOUR/DIR gr_visualize=True visual_tracks=True shared_camera=True query_frame_num=8 camera_type=SIMPLE_RADIAL center_order=False

it will give you something like this:

Screenshot 2024-09-17 at 15 28 40

We can see some noisy black points at the boundary of the stone, which are due to the black background. If you want higher quality, you can also pass the masks as masks/FILENAME to the model, telling it which pixels are background. This works the same way as images/FILENAME; please check the README for more details about masks.

Here is a public link, accessible for the next 3 hours, that shows the result in 3D (or you can run the command above to generate it yourself):
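Before running with masks, it can help to confirm that every image has a mask with the same file name. The sketch below assumes the layout mentioned above (an images/ folder and a masks/ folder inside the scene directory, with matching file names); the helper name find_images_without_masks is hypothetical, and the README remains the reference for the exact mask format.

```python
from pathlib import Path


def find_images_without_masks(scene_dir):
    """List image file names in <scene_dir>/images with no same-named file in <scene_dir>/masks."""
    scene = Path(scene_dir)
    masks_dir = scene / "masks"
    image_names = sorted(p.name for p in (scene / "images").iterdir() if p.is_file())
    # A missing masks/ folder simply reports every image as unmatched.
    return [name for name in image_names if not (masks_dir / name).is_file()]
```

An empty list means each image has a counterpart under masks/.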

https://9392135cb82118b5ad.gradio.live

Edit:

Just curious: according to the tracking visualisation below, the model seems to think there are two more objects falling together with the stone. Is this correct or not?

https://github.com/user-attachments/assets/a73cb627-5dee-4acf-bbdf-6976077b17ef

albb20 commented 1 month ago

Hi, thanks for your help. I managed to reproduce your results. Yes, this is a relatively simple example, but most of my examples are harder, with smaller objects and more rotation. I will see whether those can be reconstructed as well. Which parameters, apart from maybe the query points, do you recommend adjusting if the objects get much smaller? And what exactly does center_order do?

Regarding your question: No, there should not be anything except the particle falling. Those artefacts seem strange. However, I will try it with masks to see what happens then.

Much appreciation for your time.

EDIT: With masks, we now get the expected feature points/tracks on the object only:

image

image

albb20 commented 1 month ago

In a much harder case, the tracking and therefore the BA fails, even with masks. I'm not sure if such small (and not really sharply captured) objects can still be reconstructed properly. I provided the images and masks, if you have the time to try it yourself.

0004

https://github.com/user-attachments/assets/1b796228-0849-4e3d-89d2-9d9279d8c670 small_object.zip

jytime commented 1 month ago

Hey, I see. This is because the object takes up too small a proportion of the images, which affects the tracking accuracy. Can you share the original images without masking? I think using some support pixels from the background can help tracking. We can filter these support points out after tracking so they will not be reconstructed.
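The idea of tracking with background support points and dropping them afterwards could look roughly like the sketch below. It is not VGGSfM's implementation: keep_object_points is a hypothetical helper, points are assumed to be (x, y) pixel coordinates in the query frame, and the mask is assumed to be a 2D grid that is True on the object (the mask convention in the README may differ).

```python
def keep_object_points(points, object_mask):
    """Return indices of points that land on the object in the query frame.

    points: sequence of (x, y) pixel coordinates.
    object_mask: 2D list indexed as object_mask[row][col], True on the object.
    """
    height = len(object_mask)
    width = len(object_mask[0]) if height else 0
    kept = []
    for index, (x, y) in enumerate(points):
        col, row = int(round(x)), int(round(y))
        # Points outside the image or on the background are treated as support points.
        if 0 <= row < height and 0 <= col < width and object_mask[row][col]:
            kept.append(index)
    return kept
```

The returned indices can then be used to select only the object tracks for reconstruction.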

albb20 commented 1 month ago

Hey, okay, maybe it's worth a try. I didn't know it could help in such a case. I attached the original images.

Thanks again for your help! original.zip

jytime commented 1 month ago

Hey, directly replacing the masked images with the original images gives me this:

Screenshot 2024-09-19 at 22 14 01

https://github.com/user-attachments/assets/dc325624-fd53-438a-a2ed-504e36453cdb

It looks not bad to me.

albb20 commented 1 month ago

Hey, yes, it indeed looks not bad. The particle tracks seem to be captured quite accurately.

EDIT: Okay, my last example :D In another case, the particle is tracked correctly, but the poses are quite off. I get no error output. Do you have an idea? If I disable fine_tracking, at least the middle camera positions seem more reasonable to me. Interestingly, in this case the tracker does not use any point from the background, as it did in the example before.

harder_example.zip

jytime commented 1 month ago

Hey for some reason I missed this. Will check it when I am free

Edit:

Hey after running,

python demo.py gr_visualize=True visual_tracks=True shared_camera=True query_frame_num=8 camera_type=SIMPLE_RADIAL center_order=False SCENE_DIR=examples/no_poses/

https://github.com/user-attachments/assets/8bcbe731-cf9f-47e1-9484-2a1a7e61343c

Screenshot 2024-09-24 at 22 16 00

it seems to give me an almost-correct trajectory and track. I am not sure about the last camera, but the others look good.

albb20 commented 1 month ago

Hi, although I do not get the identical result, reducing query_frame_num from 9 to 8 definitely helped improve the result: image

I will check if this translates to the other cases I have. Many thanks again.
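To compare settings across several cases, one option is to generate the demo.py command for each query_frame_num value and run them one by one. The sketch below only builds and prints the command strings. It reuses the overrides and the example path from the command earlier in this thread (the two visualisation flags are left out); build_demo_command is a hypothetical helper name.

```python
import shlex


def build_demo_command(scene_dir, query_frame_num):
    """Build a demo.py command line with the overrides used in this thread."""
    overrides = {
        "SCENE_DIR": scene_dir,
        "shared_camera": "True",
        "camera_type": "SIMPLE_RADIAL",
        "center_order": "False",
        "query_frame_num": str(query_frame_num),
    }
    args = ["python", "demo.py"] + [f"{key}={value}" for key, value in overrides.items()]
    # shlex.join quotes arguments where needed, e.g. paths containing spaces.
    return shlex.join(args)


# The two values compared in this thread.
for num in (8, 9):
    print(build_demo_command("examples/no_poses/", num))
```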

jianghr-shanghaitech commented 1 week ago

I ran into the same problem, with init_window of 16 and window_size of 8. I only have one 2080 Ti with 11 GB of VRAM and am trying to process 100 images. With demo.py I can process around 35 images with max_query_pts = 1024. I would really like to know how to process as many images as possible with 11 GB of VRAM. Thanks!

jianghr-shanghaitech commented 1 week ago

By the way, I often see these messages:

No valid frame, step back
Moving window failed, trying again. (This should not happen in most cases)