SJTU-ViSYS / FeatureBooster

FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network (CVPR 2023)
Apache License 2.0

Unable to reproduce the results mentioned in your paper. #15

Open Vincentqyw opened 8 months ago

Vincentqyw commented 8 months ago

Hi, thank you for open-sourcing this fantastic algorithm. I have been using your open-source code to train ORB+Boost-B and then evaluating it on the HPatches dataset. However, I found that the MMA and match inliers are much lower than those of your publicly released ORB+Boost-B model.

Here are some results when benchmarking HPatches:

ORB+Boost-B (your provided model):

```
# Features: 2955.918210 - [1608, 3013]
# Matches: Overall 1107.498148, Illumination 1151.550000, Viewpoint 1066.592857
# MMA@1: Overall 0.223414, Illumination 0.290804, Viewpoint 0.160838
# inliers@1: Overall 281.898148, Illumination 386.300000, Viewpoint 184.953571
```

ORB+Boost-B (my reproduced model):

```
# Features: 2955.927469 - [1608, 3014]
# Matches: Overall 681.624074, Illumination 679.226923, Viewpoint 683.850000
# MMA@1: Overall 0.051507, Illumination 0.076662, Viewpoint 0.028148
# inliers@1: Overall 41.296296, Illumination 62.273077, Viewpoint 21.817857
```
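
For context, MMA@1 and inliers@1 here follow the usual HPatches matching protocol: matched keypoints are reprojected through the ground-truth homography and counted as correct when the reprojection error is below 1 px. A minimal sketch of that check (array layouts and names are illustrative, not the notebook's exact code):

```python
import numpy as np

def mma_at_threshold(kpts1, kpts2, matches, H, thresh=1.0):
    """MMA@thresh and #inliers@thresh for one image pair.

    kpts1, kpts2: (N, 2) keypoint coordinates; matches: (M, 2) index pairs;
    H: 3x3 ground-truth homography mapping image 1 into image 2.
    """
    pts1 = kpts1[matches[:, 0], :2]
    pts2 = kpts2[matches[:, 1], :2]
    ones = np.ones((len(pts1), 1))
    proj = (H @ np.hstack([pts1, ones]).T).T   # project image-1 points into image 2
    proj = proj[:, :2] / proj[:, 2:]           # dehomogenize
    err = np.linalg.norm(proj - pts2, axis=1)  # reprojection error in pixels
    correct = err < thresh
    return correct.mean(), correct.sum()       # MMA@thresh, inliers@thresh
```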

And here are some logs from training ORB+Boost-B:

```
iteration boost loss is always 0
matching loss: 0.9405 -> 0.2208
training loss: 0.9405 -> 0.2208
train mean loss: 0.8339 -> 0.329
```

I confirmed that my experimental configuration matches the parameters you provided in the open-source code. Could you please explain what might cause this? Thank you for your response!

Antu3heng commented 8 months ago

@Vincentqyw The iteration boost loss should not be 0. The plot of the iteration boost loss should look like the attached figure: [plot of the iteration boost loss over training]

As for the matching loss of the ORB-based boosted descriptor on the MegaDepth dataset, the training loss should not be that small; there may be something wrong with your ground truth. Could you please show me your entire training process? Which training set did you use, MegaDepth or COCO? And which training mode did you choose?

To be clear: the models in this repo were trained on the MegaDepth dataset, using the mode that runs feature extraction during training (rather than pre-extracted features).

Vincentqyw commented 8 months ago

Here are the training settings:

  1. I followed the steps in 2training-with-pre-extracted-local-feature: I first extracted ORB features and then trained on these pre-extracted features.
  2. Dataset: MegaDepth
  3. Network configs:

```yaml
ORB+Boost-B:
  keypoint_dim: 4
  keypoint_encoder: [32, 64, 128, 256]
  descriptor_dim: 256
  descriptor_encoder: [512, 256]
  Attentional_layers: 4
  last_activation: 'tanh'
  l2_normalization: false
  output_dim: 256
```

  4. Training params are the default values in train_pre.py.

Vincentqyw commented 8 months ago

Another question concerns generate_read_function in HPatches-Sequences-Matching-Benchmark.ipynb:

```python
import os
import numpy as np

# dataset_path is assumed to be defined elsewhere in the notebook,
# pointing at the root of the HPatches sequences.
def generate_read_function(method, extension='ppm', type='float'):
    def read_function(seq_name, im_idx):
        # Load the cached keypoints/descriptors for this image of the sequence.
        aux = np.load(os.path.join(dataset_path, seq_name, '%d.%s.%s' % (im_idx, extension, method)))
        if type == 'float':
            return aux['keypoints'], aux['descriptors']
        else:
            # Binary descriptors: unpack the packed bytes into one bit per column.
            descriptors = np.unpackbits(aux['descriptors'], axis=1, bitorder='little')
            descriptors = descriptors * 2.0 - 1.0    # <---- THIS LINE: maps {0,1} to {-1,1}
            return aux['keypoints'], descriptors
    return read_function
```

If the descriptor type is binary, the function above maps the loaded descriptors to {-1, 1} instead of {0, 1}. A descriptor with values in {-1, 1} is no longer a typical binary descriptor, so my question is: why did you add this operation?

Antu3heng commented 8 months ago

Binary descriptors are typically matched using the Hamming distance, i.e., the number of differing bits between two binary sequences, which can be computed cheaply with an XOR operation.

Both ORB and the binary descriptors in our paper are stored as 32 bytes, which is equivalent to 256 bits. To better utilize the GPU for parallel computation of the Hamming distance (and to make it differentiable for training), we unpack these 256 bits and store them as floats during training and evaluation, mapping them from {0, 1} to {-1, 1}. The Hamming distance can then be computed simply as $d_{hamming} = \frac{1}{2}\,(256 - \mathrm{dot}(d_i, d_j))$. Besides, just as the squared Euclidean distance between L2-normalized descriptors is $d_{euclidean}^2 = 2 - 2\,\mathrm{dot}(d_i, d_j)$, the term $\mathrm{dot}(d_i, d_j)$ can likewise serve as a similarity measure between two binary descriptors.

To be clear, the operation you mentioned is only used for training and evaluation on the GPU with PyTorch. For other tests, such as the SLAM application, we still use the packed 32 bytes and match them with the XOR operation.
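
To illustrate, here is a minimal NumPy stand-in for that unpack-and-dot computation (the PyTorch version works the same way on batched tensors; the variable names are illustrative, not taken from the repo):

```python
import numpy as np

# Two packed 256-bit descriptors, stored as 32 uint8 bytes each (as with ORB).
rng = np.random.default_rng(0)
d1_packed = rng.integers(0, 256, size=32, dtype=np.uint8)
d2_packed = rng.integers(0, 256, size=32, dtype=np.uint8)

# Reference Hamming distance: XOR the packed bytes and count the set bits.
ham_xor = np.unpackbits(d1_packed ^ d2_packed).sum()

# GPU-friendly form: unpack to bits, map {0,1} -> {-1,+1}, then take a dot product.
d1 = np.unpackbits(d1_packed, bitorder='little') * 2.0 - 1.0
d2 = np.unpackbits(d2_packed, bitorder='little') * 2.0 - 1.0
ham_dot = 0.5 * (256 - np.dot(d1, d2))

assert ham_xor == ham_dot  # identical distances, but the dot form is differentiable
```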

Vincentqyw commented 8 months ago

I got it! Indeed you are correct. I tried to prove the Hamming distance formula you mentioned.

Given two binary vectors $d_1$ and $d_2$ of length $L$, denote their Hamming distance by $dist_1$. Apply the mapping $\hat{d}_1 = 2 d_1 - 1$ and $\hat{d}_2 = 2 d_2 - 1$, and let $d_3$ be the dot product of $\hat{d}_1$ and $\hat{d}_2$.

The Hamming distance $dist_1$ between $d_1$ and $d_2$ is defined as the number of positions at which the corresponding bits differ. For instance, if $d_1 = 1010$ and $d_2 = 1001$, then $dist_1 = 2$, because the last two bits differ.

  1. For any pair of corresponding bits of $d_1$ and $d_2$ that are the same, i.e., $d_1[i] = d_2[i]$, the product after mapping is $1$, because $1 \cdot 1 = 1$ and $(-1) \cdot (-1) = 1$.

  2. For any pair of corresponding bits that differ, i.e., $d_1[i] \neq d_2[i]$, the product after mapping is $-1$, because $1 \cdot (-1) = -1$.

  3. Therefore, the dot product $d_3$ of $\hat{d}_1$ and $\hat{d}_2$ equals the number of equal bits minus the number of differing bits.

  4. Since the number of equal bits plus the number of differing bits equals the total number of bits $L$, the number of equal bits is $L - dist_1$.

  5. Substituting this into the expression for $d_3$ gives $d_3 = (L - dist_1) - dist_1 = L - 2\,dist_1$.

  6. Solving for $dist_1$ yields $dist_1 = \frac{1}{2}(L - d_3)$, that is, $dist_1 = \frac{1}{2}\left(L - \hat{d}_1 \cdot \hat{d}_2\right)$.
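
And a quick numerical check of steps 5 and 6 (a throwaway NumPy snippet; the names mirror the derivation above):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 256
d1 = rng.integers(0, 2, size=L)    # random binary vector
d2 = rng.integers(0, 2, size=L)

dist1 = int(np.sum(d1 != d2))      # Hamming distance by definition

d1_hat = 2 * d1 - 1                # map {0,1} -> {-1,+1}
d2_hat = 2 * d2 - 1
d3 = int(np.dot(d1_hat, d2_hat))   # dot product of the mapped vectors

assert d3 == L - 2 * dist1         # step 5: d3 = L - 2*dist1
assert dist1 == (L - d3) // 2      # step 6: dist1 = (L - d3)/2
```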