AUC comparison on IMC 2023 #67

Solonets commented 2 months ago

Hi everyone,

First of all, great job on the paper—thank you for all your hard work!

I'm currently working on reproducing the results from Table 5, but I ran into some confusion. The table mentions that the ground truth (GT) is generated using COLMAP, but it seems like COLMAP struggles to match its own results, even on the easier datasets like the bike dataset.

For example, when I run COLMAP on 15 images from the "images" directory, as well as on "images_full," I consistently get an AUC of 100 for every tolerance gap, which is much higher than what's reported in the table. I've tried comparing these results with the reconstruction in the "sfm" folder and the poses provided in the train_labels.csv, and I'm still seeing significantly better results for COLMAP across almost every dataset.

Additionally, I noticed that while some scenes have two image sets, which makes sense for evaluating sparse versus full datasets, some scenes only have one set. In those cases, it seems like we're just comparing COLMAP against itself. I'm finding it hard to understand how results could be worse than 100 in this context.

Could you help clarify this?

ahojnnes commented 2 months ago

Hi, thank you for your interest in our work. We did not generate the GT for the IMC benchmark ourselves; we used the one provided by the benchmark authors. To the best of our knowledge, the IMC GT was generated with COLMAP using a held-out image set that is not available to anybody other than the benchmark creators. COLMAP was therefore able to obtain more complete and accurate reconstructions than is possible with the imagery available to benchmark users. I hope this makes sense.

Solonets commented 2 months ago

Thank you for the clarification. Is there a chance to look at the evaluation code? I consistently get much better numbers for COLMAP, and sometimes also for GLOMAP.

lpanaf commented 2 months ago

Hi, thanks for your question. When comparing results, we use the provided train_labels.csv as (pseudo) ground truth for haiper, heritage, and urban; for phototourism, we use the provided sfm model as ground truth. Note that the ground truth was generated with more images than are in images, so it is natural that COLMAP does not score 100. We always use the images folder (the one with fewer images) for comparison. For the evaluation, we calculate the AUC scores over all image pairs (n * (n-1) / 2), thresholded at 3, 5, and 10 degrees. The error is $\max(d_{geod}(R_{ij}, R_{ij}^{gt}), \arccos(t_{ij}\cdot t_{ij}^{gt}))$. Have you included both the relative rotation error and the relative translation error?
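For anyone trying to reproduce this, here is a minimal sketch of the per-pair error described above. It is not the authors' evaluation code. It assumes each pose is a world-to-camera `(R, t)` pair, and the names `pairwise_errors` and `missing_error` are hypothetical. How pairs with an unregistered image are scored is also an assumption (here they count as failures):

```python
import itertools
import numpy as np


def relative_pose(R_i, t_i, R_j, t_j):
    """Relative pose i -> j from two world-to-camera poses (assumed convention)."""
    R_ij = R_j @ R_i.T
    t_ij = t_j - R_ij @ t_i
    return R_ij, t_ij


def rotation_error_deg(R_est, R_gt):
    """Geodesic distance between two rotation matrices, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def direction_error_deg(t_est, t_gt, eps=1e-12):
    """Angle between two translation directions, in degrees."""
    n_est, n_gt = np.linalg.norm(t_est), np.linalg.norm(t_gt)
    if n_est < eps or n_gt < eps:
        # Assumption: the direction is undefined for a zero baseline,
        # so it contributes no error.
        return 0.0
    cos = float(np.dot(t_est, t_gt)) / (n_est * n_gt)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def pairwise_errors(est, gt, missing_error=np.inf):
    """max(R_err, t_err) in degrees for all n * (n - 1) / 2 GT image pairs.

    est, gt: dict mapping image name -> (R, t), with R a 3x3 array and
    t a length-3 array. missing_error is a hypothetical choice for pairs
    in which at least one image was not registered.
    """
    errors = []
    for a, b in itertools.combinations(sorted(gt), 2):
        if a not in est or b not in est:
            errors.append(missing_error)
            continue
        R_est, t_est = relative_pose(*est[a], *est[b])
        R_gt, t_gt = relative_pose(*gt[a], *gt[b])
        errors.append(max(rotation_error_deg(R_est, R_gt),
                          direction_error_deg(t_est, t_gt)))
    return np.array(errors)
```

Comparing only the translation direction (not its length) keeps the metric independent of the unknown scale of the reconstruction.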

mattiadurso commented 2 months ago

How is $t_{ij}$ computed and why the gt is not used for it?

lpanaf commented 2 months ago

Sorry, it was a typo. I just modified the original reply: it should be $\max(d_{geod}(R_{ij}, R_{ij}^{gt}), \arccos(t_{ij}\cdot t_{ij}^{gt}))$ instead of $\max(d_{geod}(R_{ij}, R_{ij}^{gt}), \arccos(t_{ij}\cdot t_{ij}^{T}))$.

lpanaf commented 2 months ago

Also, note that $\arccos(t_{ij}\cdot t_{ij}^{gt})$ should be in degrees rather than radians as well.
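Written out in full, the error in degrees would look roughly like this. The world-to-camera convention for $(R_i, t_i)$ and the normalization of $t_{ij}$ to unit length are assumptions here, not something stated in this thread:

$$
R_{ij} = R_j R_i^\top, \qquad t_{ij} = \frac{t_j - R_{ij} t_i}{\lVert t_j - R_{ij} t_i \rVert}
$$

$$
d_{geod}(R_a, R_b) = \arccos\left(\frac{\operatorname{tr}(R_a^\top R_b) - 1}{2}\right)
$$

$$
e_{ij} = \frac{180}{\pi}\,\max\left(d_{geod}(R_{ij}, R_{ij}^{gt}),\ \arccos(t_{ij}\cdot t_{ij}^{gt})\right)
$$

Both terms come out of $\arccos$ in radians, so the single factor $180/\pi$ converts the whole error before it is compared against the 3, 5, and 10 degree thresholds.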

mattiadurso commented 2 months ago

> Sorry, it was a typo.

Now it makes perfect sense. Thanks :)

I have a few more questions:

Did you run COLMAP and GLOMAP on distorted or undistorted scenes?
Did you use the automatic settings or some tweaked ones?

lpanaf commented 2 months ago

> > Sorry, it was a typo.
>
> Now it makes perfect sense. Thanks :)
>
> I have a few more questions:
>
> Did you run COLMAP and GLOMAP on distorted or undistorted scenes?
>
> Did you use the automatic settings or some tweaked ones?

I ran on the distorted scenes.
I used exhaustive matching to find correspondences.
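A minimal sketch of what such a run could look like, assuming the standard `colmap` and `glomap` command-line tools are on PATH. The paths and the helper name `build_commands` are hypothetical placeholders, and no extra options are passed because none are listed in this thread:

```python
import shlex


def build_commands(database, images, output, mapper="colmap"):
    """Return the argv lists for extraction, exhaustive matching and mapping.

    mapper: "colmap" or "glomap"; both read the same COLMAP database.
    """
    if mapper not in ("colmap", "glomap"):
        raise ValueError(f"unknown mapper: {mapper}")
    db, img, out = str(database), str(images), str(output)
    return [
        # Feature extraction on the original (distorted) images.
        ["colmap", "feature_extractor",
         "--database_path", db, "--image_path", img],
        # Exhaustive matching over all image pairs.
        ["colmap", "exhaustive_matcher", "--database_path", db],
        # Incremental (COLMAP) or global (GLOMAP) mapping.
        [mapper, "mapper",
         "--database_path", db, "--image_path", img, "--output_path", out],
    ]


if __name__ == "__main__":
    # Hypothetical paths, printed only; nothing is executed here.
    for cmd in build_commands("scene/database.db", "scene/images",
                              "scene/sparse", mapper="glomap"):
        print(shlex.join(cmd))
```

To actually run the pipeline, each list can be passed to `subprocess.run(cmd, check=True)`. The output directory has to exist before the mapper is started.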

mattiadurso commented 1 month ago

So you just:

downloaded the data from the website,
unzipped it,
ran the feature extractor (without passing intrinsics) and the matcher, then
ran the COLMAP/GLOMAP mapper on that database?

Then:

for each pair of (registered) images, read qvec and tvec and computed the relative error against the GT (as described above),
stored all the errors, and
computed the AUC with the PxSfM function on the (row-wise) max(R_err, t_err) in degrees?
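The last two steps can be sketched as follows. `qvec_to_rotmat` uses COLMAP's (qw, qx, qy, qz) quaternion order. `pose_auc` is the commonly used cumulative-recall AUC, written here as a stand-in; it is an assumption that it behaves like the PxSfM function mentioned above:

```python
import numpy as np


def qvec_to_rotmat(qvec):
    """Rotation matrix from a COLMAP quaternion (qw, qx, qy, qz)."""
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * y * y - 2 * z * z, 2 * x * y - 2 * w * z, 2 * z * x + 2 * w * y],
        [2 * x * y + 2 * w * z, 1 - 2 * x * x - 2 * z * z, 2 * y * z - 2 * w * x],
        [2 * z * x - 2 * w * y, 2 * y * z + 2 * w * x, 1 - 2 * x * x - 2 * y * y],
    ])


def pose_auc(errors, thresholds=(3.0, 5.0, 10.0)):
    """Area under the recall-vs-error curve up to each threshold, in percent.

    errors: per-pair max(R_err, t_err) in degrees; np.inf marks a failed pair.
    """
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = np.arange(1, len(errors) + 1) / len(errors)
    # Start the curve at (error=0, recall=0).
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        last = np.searchsorted(errors, t)
        # Keep the points below the threshold and close the curve at t.
        e = np.concatenate((errors[:last], [t]))
        r = np.concatenate((recall[:last], [recall[last - 1]]))
        # Trapezoidal integration, normalized by the threshold.
        area = np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2.0)
        aucs.append(100.0 * area / t)
    return aucs
```

With this definition a reconstruction whose pairwise errors are all zero scores 100 at every threshold, and a pair with infinite error lowers the score at every threshold.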
