hturki closed this issue 2 years ago
Hi @hturki, Nice to see that you're giving it a try on larger datasets :)
First, about the data: it seems that the connectivity of your aerial images is much higher than we usually have in street-level imagery. You can see that 10%-20% of the tracks have more than 20 keypoints. This is definitely more challenging for the keypoint adjustment. To speed this up, you can switch to the topological_reference strategy:
sfm = PixSfM(conf={"dense_features": {"use_cache": True}, "KA": {"strategy": "topological_reference"}})
This reduces the number of optimization constraints and is thus much faster (linear vs quadratic) but less accurate. It is described in Appendix B3 of the paper.
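To make the linear-vs-quadratic difference concrete, here is a rough sketch of how the number of featuremetric constraints grows with track length under each strategy (illustrative only; the exact constraint counts and the name of the default strategy depend on the pixsfm implementation):

```python
def num_ka_constraints(track_length: int, strategy: str) -> int:
    """Rough count of optimization constraints for one feature track.

    The default strategy connects every pair of keypoints in a track
    (quadratic in track length), while topological_reference ties each
    keypoint to a single reference keypoint (linear). Illustrative only.
    """
    if strategy == "topological_reference":
        return track_length - 1
    # default: one constraint per keypoint pair
    return track_length * (track_length - 1) // 2

# A track with 20 keypoints: 190 pairwise constraints vs. 19 reference constraints.
```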
Even with >100GB of RAM, you might later run into trouble with the bundle adjustment. If this happens, I suggest switching to the costmaps strategy (described in Appendix C):
sfm = PixSfM(conf={
"dense_features": {"use_cache": True},
"KA": {"strategy": "topological_reference"},
"BA": {"strategy": "costmaps"},
})
These tweaks are also described in the Large-scale refinement section of the README.
Let me know how this helps, I am also curious to see how far we can push this.
Hi @Skydes - thanks for the tips! It definitely looks like KA is roughly 3-4x faster so far when using topological reference - will let you know how that goes.
In the interim, one of the keypoint adjustment phases actually finished, and as expected bundle adjustment ran out of memory. Before continuing on with bundle adjustment however, I noticed that I ended up with 4 sub-models instead of a unified one:
[2022/02/14 11:05:15 pixsfm INFO] KA Time: 833130s, cost change: 0.105714 --> 0.0934429
[2022/02/14 11:05:21 hloc INFO] Creating an empty database...
[2022/02/14 11:05:22 hloc INFO] Importing images into the database...
[2022/02/14 11:10:40 hloc INFO] Importing features into the database...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1678/1678 [00:01<00:00, 926.34it/s]
[2022/02/14 11:10:43 hloc INFO] Importing matches into the database...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83900/83900 [00:48<00:00, 1718.56it/s]
[2022/02/14 11:11:40 hloc INFO] Performing geometric verification of the matches...
[2022/02/14 11:18:41 hloc INFO] Running 3D reconstruction...
[2022/02/14 13:12:05 hloc INFO] Reconstructed 4 model(s).
[2022/02/14 13:12:05 hloc INFO] Largest model is #2 with 461 images.
[2022/02/14 13:12:06 hloc INFO] Reconstruction statistics:
Reconstruction:
num_reg_images = 461
num_cameras = 1
num_points3D = 198133
num_observations = 1459604
mean_track_length = 7.36679
mean_observations_per_image = 3166.17
mean_reprojection_error = 1.01235
num_input_images = 1678
What's more, the largest model referenced in the message (#2) seems to be missing:
/home/ubuntu/rubble-pixsfm/ref/:
hloc refined_keypoints.h5 s2dnet_featuremaps_sparse.h5
/home/ubuntu/rubble-pixsfm/ref/hloc:
cameras.bin database.db images.bin merge1 models points3D.bin
/home/ubuntu/rubble-pixsfm/ref/hloc/merge1:
cameras.bin images.bin points3D.bin
/home/ubuntu/rubble-pixsfm/ref/hloc/models:
0 1 2 3
/home/ubuntu/rubble-pixsfm/ref/hloc/models/0:
cameras.bin images.bin points3D.bin
/home/ubuntu/rubble-pixsfm/ref/hloc/models/1:
cameras.bin images.bin points3D.bin
/home/ubuntu/rubble-pixsfm/ref/hloc/models/2: # empty
/home/ubuntu/rubble-pixsfm/ref/hloc/models/3:
cameras.bin images.bin points3D.bin
I tried using colmap's model merger (https://colmap.github.io/faq.html#merge-disconnected-models) to merge the models together before running bundle adjustment, but the process failed:
(nerf_pl) ubuntu@cloudlet021:~$ colmap model_merger --input_path1 /home/ubuntu/rubble-pixsfm/ref/hloc/models/0 --input_path2 /home/ubuntu/rubble-pixsfm/ref/hloc/models/1 --output_path /home/ubuntu/rubble-pixsfm/ref/hloc/merge1
Reconstruction 1
----------------
Images: 402
Points: 169521
Reconstruction 2
----------------
Images: 402
Points: 166642
Merging reconstructions
-----------------------
=> Merge failed
(nerf_pl) ubuntu@cloudlet021:~$ colmap model_merger --input_path1 /home/ubuntu/rubble-pixsfm/ref/hloc/models/0 --input_path2 /home/ubuntu/rubble-pixsfm/ref/hloc/models/3 --output_path /home/ubuntu/rubble-pixsfm/ref/hloc/merge1
Reconstruction 1
----------------
Images: 402
Points: 169521
Reconstruction 2
----------------
Images: 413
Points: 167221
Merging reconstructions
-----------------------
=> Merge failed
FWIW, opening the individual models in COLMAP shows sensible results, and they seem to cover the same area, so I'd assume there is overlap between the models. Do you know what might be happening / have you run into this issue with pixsfm/hloc in the past?
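One quick sanity check is whether any pair of sub-models actually shares registered images, since model_merger can only align two models that overlap. A minimal sketch (`shared_images` is a hypothetical helper; the registered image names can be read from each model, e.g. via pycolmap's `Reconstruction`):

```python
def shared_images(names_a, names_b):
    """Return image names registered in both models.

    names_a / names_b: iterables of registered image names, e.g. collected
    from each sub-model's images. model_merger needs a non-empty overlap
    to align two reconstructions.
    """
    return sorted(set(names_a) & set(names_b))

# If this comes back empty for every pair of sub-models, there are no
# common images for model_merger to align on, and the merge will fail.
```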
Hi @hturki,
the slow runtimes were probably also related to unnecessary copies during parallel keypoint adjustment. We pushed a fix to main which should significantly speed up KA on your datasets. With the low_memory config, 10 cores and SSD storage, we achieve around 4000 iterations/second during KA on the Aachen v1.1 dataset (7000+ images) with superpoint+superglue features. On large datasets like yours, we expect less than 1 sec/image during KA on standard machines.
To further speed up the refinement, I suggest setting KA.max_kps_per_problem=1000 to reduce threading and synchronization overhead. You can also reduce the patch size to 10x10 or 8x8 to further reduce I/O time.
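Putting the suggestions from this thread together, a combined large-scale configuration might look like this (a sketch only; the exact placement of options such as patch_size may differ between pixsfm versions):

```python
# Sketch of a combined low-memory / large-scale configuration, assembled
# from the options discussed in this thread. Exact key placement may
# differ between pixsfm versions.
conf = {
    "dense_features": {"use_cache": True,
                       "patch_size": 8},            # smaller patches -> less I/O
    "KA": {"strategy": "topological_reference",     # linear number of constraints
           "max_kps_per_problem": 1000},            # less threading/sync overhead
    "BA": {"strategy": "costmaps"},                 # memory-bounded bundle adjustment
}
# sfm = PixSfM(conf=conf)
```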
Please let us know if this improves your runtimes.
Hi @Phil26AT - as an update, I can confirm that the latest commits make keypoint adjustment much faster. The bottleneck is now the colmap reconstruction before the bundle adjustment stage:
In some cases, COLMAP generates a collection of 4 disconnected models that don't seem to have any images in common (at least, that's what model_merger reports when it fails). As previously mentioned, I'm using a custom matcher based on GPS location (very similar to pairs_from_poses in the hloc repo), since using the exhaustive matcher seems expensive at this scale. I'll try to include even more close matches than in my first attempts.
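For reference, a GPS-based pairing like the one described could be sketched as follows (a hypothetical stand-in for my custom matcher, not the actual code; positions maps image names to local metric coordinates):

```python
import math

def pairs_from_gps(positions, num_matched=20):
    """Pair each image with its num_matched nearest neighbors by position.

    positions: dict mapping image name -> (x, y) or (x, y, z) coordinates.
    Returns a sorted list of unique (name_a, name_b) pairs, similar in
    spirit to hloc's pairs_from_poses.
    """
    pairs = set()
    for name, pos in positions.items():
        neighbors = sorted(
            (other for other in positions if other != name),
            key=lambda other: math.dist(pos, positions[other]),
        )
        for other in neighbors[:num_matched]:
            pairs.add(tuple(sorted((name, other))))
    return sorted(pairs)
```

Raising num_matched widens the matching window: more matching work, but better-connected models and fewer disconnected components.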
In other cases, the COLMAP reconstruction is just taking an extremely long time - looks like things get pretty slow with the incremental mapper after 1000 images or so? I'm trying to instead use the hierarchical mapper and will let you know how that goes.
Suggestions welcome, or else I'll keep you posted on how things go!
I managed to get a decent large (3000-ish image) reconstruction from COLMAP after running keypoint adjustment, and I'm now running into a segmentation fault during bundle adjustment with costmaps:
3019 mapping images
[2022/02/16 15:45:56 pixsfm.features.models.s2dnet INFO] Loading S2DNet checkpoint at /home/cloudlet/hturki/pixel-perfect-sfm/pixsfm/features/models/checkpoints/s2dnet_weights.pth.
[2022/02/16 15:45:59 pixsfm INFO] Loaded dense extractor with configuration:
{'cache_format': 'chunked',
'device': 'auto',
'dtype': 'half',
'fast_image_load': False,
'l2_normalize': True,
'load_cache_on_init': False,
'max_edge': 1600,
'model': {'name': 's2dnet', 'num_layers': 1, 'checkpointing': None, 'output_dim': 128, 'pretrained': 's2dnet', 'remove_pooling_layers': False, 'combine': False},
'overwrite_cache': False,
'patch_size': 16,
'pyr_scales': [1.0],
'resize': 'LANCZOS',
'sparse': True,
'use_cache': True}
[2022/02/16 15:46:50 pixsfm INFO] Loading featuremaps from H5 File.
100%[████████████████████] 3019/3019 [00:08, 339.633it/s]
[2022/02/16 15:47:00 pixsfm INFO] Extracting references and costmaps.
Segmentation fault (core dumped)
My code is roughly:
from argparse import Namespace
from pathlib import Path

from pixsfm.refine_hloc import PixSfM


def main(hparams: Namespace) -> None:
    output_path = Path(hparams.output_path)
    output_path.mkdir(exist_ok=True, parents=True)

    images_path = Path(hparams.images_path)
    images = sorted(images_path.iterdir())
    references = [str(images[i].relative_to(images_path))
                  for i in range(0, len(images), hparams.train_every)]
    print(len(references), 'mapping images')

    sfm = PixSfM(conf={"dense_features": {"use_cache": True},
                       "KA": {"dense_features": {"use_cache": True}, "max_kps_per_problem": 1000},
                       "BA": {"strategy": "costmaps"}})
    ref_dir = output_path / 'ref'
    reconstruction, ba_data, _ = sfm.refine_reconstruction(ref_dir, hparams.model_path, images_path)
    print('Refined', reconstruction.summary(), ba_data)
Note that the segfault doesn't seem to occur (at least immediately) when not using costmaps, but the memory constraints are prohibitive:
3019 mapping images
[2022/02/16 15:36:12 pixsfm.features.models.s2dnet INFO] Loading S2DNet checkpoint at /home/cloudlet/hturki/pixel-perfect-sfm/pixsfm/features/models/checkpoints/s2dnet_weights.pth.
[2022/02/16 15:36:15 pixsfm INFO] Loaded dense extractor with configuration:
{'cache_format': 'chunked',
'device': 'auto',
'dtype': 'half',
'fast_image_load': False,
'l2_normalize': True,
'load_cache_on_init': False,
'max_edge': 1600,
'model': {'name': 's2dnet', 'num_layers': 1, 'checkpointing': None, 'output_dim': 128, 'pretrained': 's2dnet', 'remove_pooling_layers': False, 'combine': False},
'overwrite_cache': False,
'patch_size': 16,
'pyr_scales': [1.0],
'resize': 'LANCZOS',
'sparse': True,
'use_cache': True}
[2022/02/16 15:37:06 pixsfm INFO] Loading featuremaps from H5 File.
100%[████████████████████] 3019/3019 [00:08, 340.360it/s]
[2022/02/16 15:37:15 pixsfm INFO] Loading patches from H5 File.
Killed ] 901006/8833782 [08:19, 1803.99it/s]
Hi @hturki,
Non-merged COLMAP reconstructions can have multiple causes; often small viewpoint overlap or too-sparse feature matching is to blame. I suggest posting this issue in the COLMAP repo.
I managed to get a decent large (3000-ish image) reconstruction from COLMAP after running keypoint adjustment, and I'm now running into a segmentation fault during bundle adjustment with costmaps:
Thanks for reporting this again. On some systems, a missing GIL caused a segfault during costmap extraction (#21). We pushed a fix to fix-costmaps.
Please let us know if this solves your problem with large-scale refinements.
Hey folks, things seem to work end-to-end now! I managed to get a unified colmap model by having my custom matcher match within a wider window. At this point, the main bottleneck seems to be the COLMAP 3d reconstruction, which took roughly 2 days for a 2500-image model. But I'm guessing there's not much to be done there on the pixsfm side.
Thanks again for all of the help!
First of all, thanks for the great work. In the paper and project page the authors mention scaling PixSfM to thousands of images. I'm however running into issues doing the same with datasets ranging in the 1600-3000 image range. Two of the datasets in question can be found at: https://meganerf.cmusatyalab.org/#data
My code looks roughly like:
And in particular the keypoint adjustment phase seems to be taking a very long time. Some examples so far:
Another example seems to be getting slower over time:
And same for a third example
The specs for the machines across each of these runs vary a bit, but they're all pretty powerful (>100GB of RAM, dozens of cores, SSD mounts).
I've also tried running PixSfM on an even larger example (matching only against the 20 closest neighbors instead of 50), where keypoint adjustment throws an exception. I've tried reducing the output dim and patch size, to no avail:
Any thoughts on what I might be doing wrong, or is a week+ runtime to be expected no matter what? I've gotten suboptimal results on these (drone imagery-related) datasets using even commercial photogrammetry software, so excited to see if I can get better results with your featuremetric approach!