@Vincentqyw The iteration boost loss should not be 0. The plot of the iteration boost loss should look like this figure:
And for the matching loss of the ORB-based boosted descriptor on the MegaDepth dataset, the training loss should not be that small. There may be something wrong with your ground truth. Could you please show me your entire training process? Which training set did you use, MegaDepth or COCO? And which training mode did you choose?
To be clear: the models in this repo were trained on the MegaDepth dataset, running feature extraction on the fly during training.
Here are the training settings:
```yaml
ORB+Boost-B:
  keypoint_dim: 4
  keypoint_encoder: [32, 64, 128, 256]
  descriptor_dim: 256
  descriptor_encoder: [512, 256]
  Attentional_layers: 4
  last_activation: 'tanh'
  l2_normalization: false
  output_dim: 256
```
Training script: `train_pre.py`
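For reference, here is a minimal sketch of how such a settings block could be read with PyYAML; the file name `config.yaml` and the exact nesting are assumptions based on the listing above, not the repo's actual loading code:

```python
import yaml

# Load the training settings; the file name and nesting are assumed
# to mirror the listing above.
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)

settings = config['ORB+Boost-B']
print(settings['descriptor_dim'])      # 256
print(settings['Attentional_layers'])  # 4
```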
Another question, about `generate_read_function` in `HPatches-Sequences-Matching-Benchmark.ipynb`:
```python
import os
import numpy as np

def generate_read_function(method, extension='ppm', type='float'):
    def read_function(seq_name, im_idx):
        # dataset_path is defined elsewhere in the notebook
        aux = np.load(os.path.join(dataset_path, seq_name, '%d.%s.%s' % (im_idx, extension, method)))
        if type == 'float':
            return aux['keypoints'], aux['descriptors']
        else:
            descriptors = np.unpackbits(aux['descriptors'], axis=1, bitorder='little')
            descriptors = descriptors * 2.0 - 1.0  # <---- THIS LINE
            return aux['keypoints'], descriptors
    return read_function
```
If the descriptor type is binary, the loaded descriptors are mapped to {-1, 1} instead of {0, 1} by the function above. After this mapping, however, they can no longer be treated as typical binary descriptors. My question is: why did you add this operation?
Binary descriptors are typically matched using the Hamming distance: the number of differing bits between two binary sequences, which can be computed cheaply with an XOR operation followed by a bit count.
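For example, for descriptors packed as 32 `uint8` bytes (256 bits), the XOR-based distance can be sketched in NumPy like this (names and shapes are illustrative, not taken from the repo):

```python
import numpy as np

def hamming_packed(a, b):
    # a, b: uint8 arrays of shape (32,), i.e. 256 bits each.
    # XOR marks the differing bits; unpackbits + sum counts them.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = rng.integers(0, 256, size=32, dtype=np.uint8)
print(hamming_packed(d1, d2))  # differing-bit count, in [0, 256]
```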
Both ORB and the binary descriptors in our paper are stored as 32 bytes, which is equivalent to 256 bits. To better utilize the GPU for parallel computation of the Hamming distance (and to make it differentiable for training), we unpack these 256 bits and store them as floats during training and evaluation, mapping them from $\{0,1\}$ to $\{-1,1\}$. The Hamming distance can then be computed simply as $d_{hamming} = \frac{1}{2}(256 - d_i \cdot d_j)$. Besides, just as the squared Euclidean distance after L2 normalization satisfies $d_{euclidean}^2 = 2 - 2\, d_i \cdot d_j$, the dot product $d_i \cdot d_j$ can also be used to represent the similarity between two binary descriptors through that formula.
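As a minimal sketch of this unpack-and-map trick (assumed shapes and variable names, not the repo's actual training code), all-pairs Hamming distances then reduce to a single matrix multiplication:

```python
import numpy as np
import torch

def hamming_by_dot(desc_a, desc_b):
    # desc_a: (N, 256), desc_b: (M, 256) float tensors in {-1, +1}.
    # Returns (N, M) distances: d_hamming = 0.5 * (256 - <d_i, d_j>).
    L = desc_a.shape[1]
    return 0.5 * (L - desc_a @ desc_b.t())

# Unpack 32-byte descriptors to 256 bits and map {0,1} -> {-1,+1},
# mirroring the notebook line in question.
packed = np.random.randint(0, 256, size=(5, 32), dtype=np.uint8)
bits = np.unpackbits(packed, axis=1, bitorder='little').astype(np.float32)
d = torch.from_numpy(bits * 2.0 - 1.0)

print(hamming_by_dot(d, d))  # zeros on the diagonal
```

Because the distance is an affine function of a dot product, it is differentiable and maps directly onto GPU matrix-multiply kernels.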
To be clear, the operation you mentioned is only used for training and evaluation on the GPU with PyTorch. For other tests, such as the SLAM application, we still use the packed 32 bytes and match them with the XOR operation.
I got it! Indeed you are correct. I tried to prove the Hamming distance formula you mentioned.
Given two binary vectors $d_1$ and $d_2$ of length $L$, denote their Hamming distance by $dist_1$. Map each vector via $\hat{d}_1 = 2 d_1 - 1$ and $\hat{d}_2 = 2 d_2 - 1$, and let $d_3 = \hat{d}_1 \cdot \hat{d}_2$ be the dot product of the mapped vectors.

The Hamming distance $dist_1$ between $d_1$ and $d_2$ is defined as the number of positions at which the corresponding bits differ. For instance, if $d_1 = 1010$ and $d_2 = 1001$, then $dist_1 = 2$, because the third and fourth bits differ.

For any pair of corresponding bits that are the same, i.e. $d_1[i] = d_2[i]$, the product after mapping is $1$, because $1 \cdot 1 = 1$ and $(-1) \cdot (-1) = 1$. For any pair of corresponding bits that differ, i.e. $d_1[i] \neq d_2[i]$, the product after mapping is $-1$, because $1 \cdot (-1) = -1$.

Therefore, the dot product $d_3$ of $\hat{d}_1$ and $\hat{d}_2$ equals the number of matching bits minus the number of differing bits. Since the number of matching bits plus the number of differing bits equals the total number of bits $L$, the number of matching bits equals $L - dist_1$. Substituting this into the expression for $d_3$ gives $d_3 = (L - dist_1) - dist_1 = L - 2\, dist_1$. Solving for $dist_1$, we obtain $dist_1 = \frac{1}{2}(L - d_3)$, that is, $dist_1 = \frac{1}{2}(L - \hat{d}_1 \cdot \hat{d}_2)$.
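The identity is also easy to check numerically; here is a tiny sketch with random bit vectors (names are illustrative):

```python
import numpy as np

L = 256
rng = np.random.default_rng(1)
d1 = rng.integers(0, 2, size=L)            # bits in {0, 1}
d2 = rng.integers(0, 2, size=L)

dist1 = int(np.sum(d1 != d2))              # Hamming distance
d3 = int(np.dot(2 * d1 - 1, 2 * d2 - 1))   # dot product after mapping

assert dist1 == (L - d3) // 2              # dist1 = 1/2 * (L - d3)
print(dist1, (L - d3) // 2)
```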
Hi, thank you for open-sourcing this fantastic algorithm. I have been using your open-source code to train `ORB+Boost-B` and then evaluating it on the HPatches dataset. However, I found that the MMA and match inliers are much lower than those of your publicly available `ORB+Boost-B` model. Here are some results from benchmarking on HPatches:

and here are some logs from training `ORB+Boost-B`:

I confirmed that the experimental configuration was kept the same as the parameters you provided in the open-source code. Could you please explain the reason for this? Thank you for your response!