Sequential Mode failed on Trafalgar dataset (contains more than 15000 images)

Yzhbuaa commented 3 years ago

Hello! Thank you for your great work. I ran the Sequential Mode pipeline. Both feature extraction and feature matching were successful, but I encountered a problem when running distributed_mapper:

terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted at 1597376966 (unix time) try "date -d @1597376966" if you are using GNU date
PC: @ 0x7f54580dff47 gsignal
*** SIGABRT (@0x4c90) received by PID 19600 (TID 0x7f542943b700) from PID 19600; stack trace:

@     0x7f545b5c18a0 (unknown)                                                           
@     0x7f54580dff47 gsignal                                                             
@     0x7f54580e18b1 abort                                                               
@     0x7f5458ae187e (unknown)                                                           
@     0x7f5458aed486 (unknown)                                                           
@     0x7f5458aed4f1 std::terminate()                                                    
@     0x7f5458aed745 __cxa_throw                                                         
@     0x7f5458ae4037 (unknown)                                                           
@     0x5600dc12f4fa colmap::ReconstructionManager::Get()                                
@     0x5600dc19ef2d colmap::IncrementalMapperController::Run()                          
@     0x5600dc373d9c colmap::Thread::RunFunc()                                           
@     0x7f5458b18870 (unknown)                                                           
@     0x7f545b5b66db start_thread                                                        
@     0x7f54581c2a3f clone

Aborted (core dumped)
./distributed_sfm.sh: 46: ./distributed_sfm.sh: --num_images_ub=120: not found

I do not know how to solve it. It seems there is no --num_images_ub option for the distributed_mapper.
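
The "not found" message above has the form a POSIX shell prints when it tries to run a word as a command. That suggests line 46 of the script was executed on its own instead of being passed to distributed_mapper, for instance because the preceding line did not end with a backslash. The sketch below reproduces the effect under that assumption; show_args is a hypothetical stand-in for the real binary and the option values are only illustrative.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for the colmap binary:
# it prints how many arguments it received.
show_args() {
    echo "$#"
}

# Every option line but the last ends with a backslash,
# so both options reach the command.
good=$(show_args --completeness_ratio=0.5 \
    --num_images_ub=120)

# The backslash after the first option is missing, so the command ends
# there and the shell tries to run "--num_images_ub=120" by itself.
# The resulting "not found" error is silenced here to keep the output clean.
bad=$(show_args --completeness_ratio=0.5
    --num_images_ub=120 2>/dev/null || true)

echo "with backslash: $good argument(s)"
echo "without backslash: $bad argument(s)"
```

If this is the cause, the option itself is fine and only the line continuation in the script needs repair.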

AIBluefisher commented 3 years ago

There is no need to set the --num_images_ub option yourself. As you can see from the script, this parameter is passed in from the terminal command. The README also explains how to run sequential mode: ./distributed_sfm.sh $image_dir $num_images_ub $log_folder $completeness_ratio. Suppose your image directory is /home/usr_name/images, the upper bound for each cluster is 500, the log folder is log, and the completeness_ratio is 0.5. You would then run the command as ./distributed_sfm.sh /home/usr_name/images 500 log 0.5
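
Since the script takes its settings as four positional arguments, a small guard at the top can catch a wrong call before any long-running step starts. This is only a sketch: check_args is a hypothetical helper, not part of the repository, and it checks nothing beyond the argument count and that the second argument is a whole number.

```shell
#!/bin/sh
# check_args is a hypothetical helper (not part of the repository).
# It expects: <image_dir> <num_images_ub> <log_folder> <completeness_ratio>
check_args() {
    if [ "$#" -ne 4 ]; then
        echo "usage: distributed_sfm.sh <image_dir> <num_images_ub> <log_folder> <completeness_ratio>" >&2
        return 1
    fi
    # Reject an empty value or anything containing a non-digit character.
    case "$2" in
        ''|*[!0-9]*)
            echo "num_images_ub must be a whole number, got '$2'" >&2
            return 1
            ;;
    esac
    return 0
}

# Example call with the values from the reply above.
check_args /home/usr_name/images 500 log 0.5 && echo "arguments look valid"
```

Inside the real script the call would be check_args "$@" followed by an exit on failure.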

Yzhbuaa commented 3 years ago

ChenYu, thank you for your quick reply! I see, but the README and the shell script (distributed_sfm.sh) do not correspond to each other. I guess that is because I used the dev branch rather than the master branch, since dev is the default branch of your repo. Is there a README for the dev branch? Or should I just switch to the master branch?

AIBluefisher commented 3 years ago

The master branch is several commits behind the dev branch. distributed_sfm.sh has not been updated, but the README is the newest version. It's not difficult to adjust this script. Would you like to open a pull request to update it?

Yzhbuaa commented 3 years ago

@AIBluefisher Yes, I'd like to. But I have to run distributed_mapper successfully on my machine before opening the pull request. Thank you.

Yzhbuaa commented 3 years ago

@AIBluefisher I used the master branch to run the pipeline. It still failed at the last step (distributed_mapper). Here is some of the output of distributed_mapper.

...

Registering image #8392 (3)

=> Image sees 169 / 2595 points => Could not register, trying another image.

Registering image #8389 (3)

=> Image sees 78 / 1690 points => Could not register, trying another image.

Registering image #8391 (3)

=> Image sees 48 / 579 points => Could not register, trying another image.

I0815 18:13:38.453012 26679 timer.cc:97] Elapsed time: 1.24232 [minutes]

Merging Clusters...

I0815 18:13:38.480891 10878 distributed_mapper_controller.cpp:259] Sub-reconstructions size: 0
F0815 18:13:38.480984 10878 sfm_aligner.cpp:49] Check failed: reconstructions.size() > 0 (0 vs. 0)
Check failure stack trace:
@ 0x7fa709b080cd google::LogMessage::Fail()
@ 0x7fa709b09f33 google::LogMessage::SendToLog()
@ 0x7fa709b07c28 google::LogMessage::Flush()
@ 0x7fa709b0a999 google::LogMessageFatal::~LogMessageFatal()
@ 0x558705ac4ad2 (unknown)
@ 0x558705aaf4c6 (unknown)
@ 0x558705ab1ab6 (unknown)
@ 0x558705cb108c (unknown)
@ 0x7fa705c23870 (unknown)
@ 0x7fa7086c16db start_thread
@ 0x7fa7052cda3f clone
Aborted (core dumped)

Is there any other information I can provide to debug? Thank you!

AIBluefisher commented 3 years ago

Could you show me your parameter settings?

AIBluefisher commented 3 years ago

The reconstruction result is tightly related to the matching parameters. In my settings, I used --VocabTreeMatching.num_images=30 in vocabulary tree feature matching.

Yzhbuaa commented 3 years ago

Hello, here are my parameter settings: ./distributed_sfm_dev_yzh.sh /path/to/dataset 500 log 0.5. The shell script is:

# Positional arguments
DATASET_PATH=$1
num_images_ub=$2
log_folder=$3
completeness_ratio=$4
#VOC_TREE_PATH=$5
# image_overlap=$3
# max_num_cluster_pairs=$4

# Create the output folder that distributed_mapper writes to
mkdir -p "$DATASET_PATH/$log_folder"

echo Feature Extraction
start_time_fe=$(date +%s)
# The last option line must not end with a backslash, otherwise the
# following assignment is passed to colmap as an extra argument
/home/astlab/yuzihao/GraphSfM-dev/build/src/exe/colmap feature_extractor \
    --database_path="$DATASET_PATH/database.db" \
    --image_path="$DATASET_PATH/images"
end_time_fe=$(date +%s)
cost_fe=$((end_time_fe - start_time_fe))
echo $cost_fe

echo feature matching
start_time_fm=$(date +%s)
/home/astlab/yuzihao/GraphSfM-dev/build/src/exe/colmap vocab_tree_matcher \
    --database_path="$DATASET_PATH/database.db" \
    --VocabTreeMatching.num_images=30 \
    --VocabTreeMatching.vocab_tree_path="$DATASET_PATH/vocab_tree_flickr100K_words1M.bin"
end_time_fm=$(date +%s)
cost_fm=$((end_time_fm - start_time_fm))
echo $cost_fm

start_time_dm=$(date +%s)
/home/astlab/yuzihao/GraphSfM-dev/build/src/exe/colmap distributed_mapper \
    "$DATASET_PATH/$log_folder" \
    --database_path="$DATASET_PATH/database.db" \
    --transfer_images_to_server=0 \
    --image_path="$DATASET_PATH/images" \
    --output_path="$DATASET_PATH/$log_folder" \
    --num_workers=8 \
    --distributed=0 \
    --repartition=0 \
    --assign_cluster_id=1 \
    --write_binary=1 \
    --retriangulate=0 \
    --final_ba=0 \
    --select_tracks_for_bundle_adjustment=1 \
    --long_track_length_threshold=10 \
    --num_images_ub=$num_images_ub \
    --completeness_ratio=$completeness_ratio \
    --relax_ratio=1.3 \
    --cluster_type=NCUT # spectra
end_time_dm=$(date +%s)
cost_dm=$((end_time_dm - start_time_dm))

# Report the elapsed time of each stage in seconds
echo feature extraction cost:
echo $cost_fe
echo feature matching cost:
echo $cost_fm
echo distributed mapper cost:
echo $cost_dm

AIBluefisher commented 3 years ago

For unordered image datasets, it's better to increase the num_images_ub parameter. I tried 1500 for the Trafalgar dataset and can recover more than 7000 camera poses.

Yzhbuaa commented 3 years ago

Thank you, I'll enlarge the num_images_ub parameter and see if it works.

Yzhbuaa commented 3 years ago

Thank you, AIBluefisher! I ran the pipeline successfully using num_images_ub=1500 and VocabTreeMatching.num_images=30. Are there any rules of thumb for the parameter settings when I use your pipeline on a large dataset? I really appreciate your help.

AIBluefisher commented 3 years ago

GraphSfM is not a silver bullet. The result sometimes depends on the dataset. For datasets with sparse or unevenly distributed images, we should try a larger image upper bound; for densely connected datasets, we can use a smaller one. Besides, the clustering algorithm has great potential for improvement, and so does this divide-and-conquer approach as a whole. Feel free to discuss with me if you have any ideas!
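
To get a feeling for what the upper bound does, it can help to estimate how many clusters a dataset is split into. The sketch below is plain arithmetic, not GraphSfM's clustering: estimate_clusters is a hypothetical helper, 15000 is used as a round figure for the Trafalgar dataset, and the estimate ignores cluster overlap (completeness_ratio) and relax_ratio, so the real count will differ.

```shell
#!/bin/sh
# estimate_clusters TOTAL_IMAGES NUM_IMAGES_UB
# Hypothetical helper: rounds TOTAL_IMAGES / NUM_IMAGES_UB up to a whole
# number. It ignores overlap between clusters and relax_ratio.
estimate_clusters() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

estimate_clusters 15000 500    # prints 30
estimate_clusters 15000 1500   # prints 10
```

With the larger bound each cluster holds more images, which gives sparsely connected images a better chance of ending up next to their matching neighbours.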

Yzhbuaa commented 3 years ago

Thank you, I will read your thesis and code to learn more about GraphSfM. My question has been answered, so I am closing this issue now. Have a nice day!

AIBluefisher / DAGSfM

Sequential Mode failed on Trafalgar dataset (contains more than 15000 images) #25
