Youngju-Na / UFORecon

Official code of UFORecon (CVPR 2024)

Camera extrinsic convention #1

Open simba611 opened 6 months ago

simba611 commented 6 months ago

Hi, would you be able to explain the camera extrinsic convention? Does it follow the OpenCV convention or something different?

Youngju-Na commented 6 months ago

Hi, @simba611, thanks for your interest in our work.

Our camera extrinsic convention follows COLMAP, which is basically the same as the OpenCV format. You should also rescale the object of interest into a unit sphere, as in the dataloader.
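For anyone else hitting this: the rescaling step can be sketched as below. This is a minimal illustration (not the repo's actual dataloader code), assuming world-to-camera extrinsics in the OpenCV/COLMAP convention (`x_cam = R @ x_world + t`) and a known bounding sphere (`center`, `radius`) of the object. Uniform scaling of the whole scene leaves the pinhole projection unchanged (depth scales by the same factor), so the intrinsics stay as-is.

```python
import numpy as np

def rescale_to_unit_sphere(extrinsics, center, radius):
    """Rescale world coordinates so the object fits in a unit sphere.

    extrinsics: list of 4x4 world-to-camera matrices [R|t]
                (OpenCV/COLMAP convention: x_cam = R @ x_world + t).
    center, radius: bounding sphere of the object in world coordinates
                    (illustrative inputs; estimate them from your scene).

    New world coords: x' = (x - center) / radius, so the updated
    translation is t' = (t + R @ center) / radius while R is unchanged.
    """
    center = np.asarray(center, dtype=float)
    out = []
    for E in extrinsics:
        R, t = E[:3, :3], E[:3, 3]
        E_new = np.eye(4)
        E_new[:3, :3] = R
        E_new[:3, 3] = (t + R @ center) / radius
        out.append(E_new)
    return out
```

With this update, a world point `x` maps to `x_cam / radius` under the new extrinsic, i.e. the camera sees the same image, just at a rescaled depth.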

If you are working on a custom dataset, please follow VolRecon's custom dataset formatting, which will also work for UFORecon.
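For reference, VolRecon-style `*_cam.txt` files follow the common MVSNet layout; the sketch below shows the general shape (check VolRecon's docs for the exact variant, since the trailing depth line differs between forks):

```
extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
0.0 0.0 0.0 1.0

intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22

DEPTH_MIN DEPTH_INTERVAL
```

The `extrinsic` block is the 4x4 world-to-camera matrix and `intrinsic` is the 3x3 pinhole matrix.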

If you have additional questions, please feel free to ask.

simba611 commented 6 months ago

Hi, thank you for the amazing work and quick reply. I tried using the pretrained weights with the following scene: 000001 000016 000036

and the following camera parameters: 00000001_cam.txt 00000016_cam.txt 00000036_cam.txt

and got the following results: 00000000 00000001 00000002

There seem to be some duplication artifacts with this configuration. I am unsure whether that is because of the camera parameters, or whether the scene I chose is too unfavourable for the model to work with. After your reply, I tried rescaling my extrinsics (perhaps incorrectly) to the following camera parameters: 00000001_cam.txt 00000016_cam.txt 00000036_cam.txt

and got the following results with similar artifacts: 00000000 00000001 00000002

These duplication artifacts and sharp boundaries in the prediction suggest something is incorrect with my camera poses, but I cannot deduce what. Any help would be appreciated.

Also, the script I use to run this is `eval_dtu_unfavourable.sh`:

```shell
python main.py --extract_geometry --set 0 \
  --volume_type "correlation" --volume_reso 96 --view_selection_type "best" \
  --depth_pos_encoding --mvs_depth_guide 1 --explicit_similarity \
  --test_n_view 3 --test_ray_num 800 --test_ref_view 1 16 36 \
  --test_dir=$DATASET --load_ckpt=$LOAD_CKPT --out_dir=$OUT_DIR $@
```

simba611 commented 6 months ago

I have another question, perhaps on the same topic: VolRecon says that their "method is not suitable for very large-scale scenes because of the coarse global feature volume." Are there similar limitations with UFORecon?

Youngju-Na commented 5 months ago

Hi, sorry for the late reply. First of all, the pre-trained weights are trained on the DTU dataset, which is a highly object-centric dataset. So I'm concerned that the scene generalizability may not fully cover larger scenes such as indoor scenes. You may want to consider fine-tuning on your dataset.

Another factor is the number of views used in the reconstruction. I'm not sure how favourable the given images are for reconstruction, but larger scenes generally require many more images (e.g., 10+).

So consider increasing the number of input images, with large overlap between them. This might alleviate the duplication artifacts.

Finally, the comment that the "method is not suitable for very large-scale scenes because of the coarse global feature volume" also applies to our method, because our implementation is based on VolRecon.

However, the difference is that our global feature volume is built in a cascaded manner and is based on feature correlation rather than on the features themselves.
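To illustrate the distinction (a generic MVS-style sketch, not UFORecon's actual implementation; the array layout and names are assumptions): a feature volume stores the per-view features at each voxel, while a correlation volume stores only a measure of their agreement across views, here the cross-view variance averaged over channels, which is cheaper and less tied to any single view's appearance.

```python
import numpy as np

def correlation_volume(sampled_feats):
    """Aggregate multi-view features into a correlation-style cost volume.

    sampled_feats: (V, D, H, W, C) array of features from V source views,
    sampled at each voxel of a D x H x W grid (illustrative layout).

    A feature volume would keep all V*C channels per voxel; here we reduce
    them to a single agreement score per voxel: the variance across views,
    averaged over feature channels. Low variance = views agree, which is
    evidence of a surface at that voxel.
    """
    mean = sampled_feats.mean(axis=0)                 # (D, H, W, C)
    var = ((sampled_feats - mean) ** 2).mean(axis=0)  # (D, H, W, C)
    return var.mean(axis=-1)                          # (D, H, W)
```

In a cascaded setup, a coarse grid of such scores is refined at progressively finer resolutions around the estimated surface, rather than building one fine global volume up front.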

*P.S. Please consider using the `uforecon_random.ckpt` weights, as they usually show much higher scene and view-combination generalizability.

Sincerely, Youngju Na