cvlab-epfl / LIFT

Code release for the ECCV 2016 paper

Details about the pre-training process for detector and data extraction #13

Closed: 13331151 closed this 7 years ago

13331151 commented 7 years ago

Hi, happy new year! Sorry to bother you again. I trained the descriptor model using hard sample mining, and it does work much better than before. A few questions:

1. I noticed that in your paper you say you add 4.8% random perturbations to the locations of the keypoints. Do you add the perturbation when training the descriptor network or when training the detector network?
2. How do you extract the non-feature points from images? I pick positions away from the feature points that survive the Visual SfM reconstruction, rather than away from all of the SIFT feature points. Does that make sense?
3. How should I set the balance factor (gamma) based on the data? I use (pseudocode):

    # pseudocode: adapt gamma online from the running ratio of the two losses
    accumulate_pair_loss = 0
    accumulate_class_loss = 0
    while not converged:
        pair_loss = ...
        class_loss = ...
        accumulate_pair_loss += pair_loss
        accumulate_class_loss += class_loss
        # rescale the classification term so the two losses have similar magnitude
        gamma = accumulate_pair_loss / accumulate_class_loss
        loss = gamma * class_loss + pair_loss
        update
        ...

4. Could you give me some hints about how to compute the second term (I guess it's the IoU?) of the overlap loss in the pre-training phase? Thanks very much!! @kmyid

kmyi commented 7 years ago

Happy new year to you too :-)

1. I noticed that in your paper you say you add 4.8% random perturbations to the locations of the keypoints. Do you add the perturbation when training the descriptor network or when training the detector network?

Well, it's 4.8 sigma, so in fact it moves about 20% in terms of the support region. It's applied both when the whole pipeline is trained and when the detector is trained, as we assume the keypoint detector should be able to compensate for the random perturbations.
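
For concreteness, a minimal sketch of such a perturbation, assuming a uniform shift of up to 4.8 times the keypoint's SIFT scale sigma (the uniform distribution and the helper name are my assumptions; only the 4.8 sigma, roughly 20% of the support region, comes from the comment above):

    import numpy as np

    def perturb_keypoint(x, y, sigma, max_shift=4.8, rng=None):
        # shift the keypoint by up to max_shift * sigma in each direction,
        # i.e. roughly 20% of the descriptor support region, so the detector
        # has to learn to recover the true location
        rng = rng or np.random.default_rng()
        dx, dy = rng.uniform(-max_shift * sigma, max_shift * sigma, size=2)
        return x + dx, y + dy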

2. How do you extract the non-feature points from images? I pick positions away from the feature points that survive the Visual SfM reconstruction, rather than away from all of the SIFT feature points. Does that make sense?

We actually include the SIFT points among the points you should avoid when taking non-feature points, since these points may be similar to the selected keypoints and would make the problem harder for the detector.
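
If it helps, here is a minimal sketch of that sampling, assuming simple rejection sampling with a minimum pixel distance (the function name, the min_dist value, and the rejection-sampling strategy are illustrative assumptions, not code from the repository):

    import numpy as np

    def sample_non_feature_points(image_shape, keypoints_to_avoid, n_samples,
                                  min_dist=16.0, rng=None):
        # keypoints_to_avoid should contain the SfM-surviving keypoints plus
        # all SIFT detections, as described above
        rng = rng or np.random.default_rng()
        h, w = image_shape[:2]
        avoid = np.asarray(keypoints_to_avoid, dtype=float)  # shape (N, 2), (x, y)
        samples = []
        while len(samples) < n_samples:
            p = rng.uniform([0.0, 0.0], [w, h])  # random (x, y) inside the image
            if np.min(np.linalg.norm(avoid - p, axis=1)) >= min_dist:
                samples.append(p)
        return np.array(samples)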

3. How should I set the balance factor (gamma) based on the data?

We just use a simple hyperparameter when mixing them. We tried a few values with a validation set during our earlier experiments and stuck with one ever since (we did not have time to test many :-( ). We simply do (pair_loss + 1e-8 * class_loss), but your mileage may vary on the choice of the parameter.

Also, it is important that you balance the non-feature points and the feature points within the class_loss. A trick we used was to multiply the cost from the non-feature-point branch by 0.75, while the other three branches were multiplied by 0.25 each. Note that these weights do not sum to one, which is a mistake I made, but it should not affect the overall result; you just end up with a small offset in the hyperparameter (it's the ratio that matters!).
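
Put together, a minimal sketch of that mixing (the function and argument names are placeholders; only the 1e-8 weight and the 0.75/0.25 branch weights come from the description above):

    def total_loss(pair_loss, class_loss_kp1, class_loss_kp2, class_loss_kp3,
                   class_loss_nonfeat, gamma=1e-8):
        # weight the non-feature-point branch by 0.75 and the three
        # feature-point branches by 0.25 each (deliberately mirroring the
        # comment above: the weights do not sum to one, only the ratio
        # matters, it just offsets gamma)
        class_loss = (0.25 * (class_loss_kp1 + class_loss_kp2 + class_loss_kp3)
                      + 0.75 * class_loss_nonfeat)
        # mix with the descriptor pair loss using a fixed hyperparameter
        return pair_loss + gamma * class_loss

The same arithmetic works whether the losses are plain floats or symbolic Theano expressions.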

4. Could you give me some hints about how to compute the second term (I guess it's the IoU?) of the overlap loss in the pre-training phase?

We do something along the lines of

    # compute intersection and union of the two square support regions
    intersection = getIntersectionOfRectangles(x1, y1, r1, x2, y2, r2)
    # each square has side 2 * r, so its area is (2 * r)**2
    union = (2.0 * r1)**2.0 + (2.0 * r2)**2.0 - intersection

    return intersection / union

where

import theano.tensor as T

def getIntersectionOfRectangles(x1, y1, r1, x2, y2, r2):
    # intersection area of two axis-aligned squares centered at (x, y)
    # with half-width r; negative extents are clamped to zero so that
    # non-overlapping squares give zero intersection
    inter_w = T.minimum(x1 + r1, x2 + r2) - T.maximum(x1 - r1, x2 - r2)
    inter_w = T.maximum(0, inter_w)
    inter_h = T.minimum(y1 + r1, y2 + r2) - T.maximum(y1 - r1, y2 - r2)
    inter_h = T.maximum(0, inter_h)

    return inter_w * inter_h
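
As a quick sanity check (my own usage example, not from the repository), the expression above can be compiled and evaluated with Theano like this:

    import theano
    import theano.tensor as T

    # symbolic centers and half-widths of the two square support regions
    x1, y1, r1, x2, y2, r2 = [T.dscalar(n) for n in ('x1', 'y1', 'r1',
                                                     'x2', 'y2', 'r2')]

    intersection = getIntersectionOfRectangles(x1, y1, r1, x2, y2, r2)
    union = (2.0 * r1)**2.0 + (2.0 * r2)**2.0 - intersection
    iou = theano.function([x1, y1, r1, x2, y2, r2], intersection / union)

    print(iou(0.0, 0.0, 0.5, 0.0, 0.0, 0.5))  # identical squares -> 1.0
    print(iou(0.0, 0.0, 0.5, 1.0, 1.0, 0.5))  # disjoint squares  -> 0.0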

Hope it helps!

Kwang

13331151 commented 7 years ago

Yes, it really does help a lot! Thanks very much. A couple of things are still a little unclear:

  1. When you say "the whole thing", do you mean that you added the perturbation everywhere, or only during the process of fine-tuning the detector?
  2. In the code

    intersection = getIntersectionOfRectangles(x1, y1, r1, x2, y2, r2)

    What are x1, y1, r1, x2, y2, r2 supposed to be? My understanding is that the center of the keypoint patch (size 128*128) is o = (0, 0), and the ground-truth locations of the two keypoints are p1 and p2, where

    p1 = (0, 0)+perturbation1
    p2 = (0, 0)+perturbation2

    and the locations generated by the detector component are p1_out and p2_out, so that

    (x1, y1) = p1_out - p1
    (x2, y2) = p2_out - p2

    Is that close to what you meant? What's more, since we extract the keypoint patches based on the SIFT scale, the content of two corresponding keypoints should share nearly the same scale, so why would r1 and r2 ever differ? Can I simply set both r1 and r2 to 32 (64/2)?

kmyi commented 7 years ago

When you say "the whole thing", do you mean that you added the perturbation everywhere, or only during the process of fine-tuning the detector?

Pre-training the detector and the fine-tuning. Basically every time except the pre-training of the orientation & descriptor networks.

2

This part is a bit tricky. x1, y1 would be the center, and r1 should be the radius of the support region for the descriptor. It really depends on which coordinate system you are working in. In our case, the patch spans -1 to 1 with zero at the center, and r is 0.5. We also perturb the scale, so in that case you need to adjust r as well to compensate for the scale difference between the patches.
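
To make that convention concrete, here is a small plain-Python re-sketch of the IoU above under those assumptions (the variable names and the example values are mine, not code from the repository, and how exactly r is adjusted for the scale perturbation depends on your extraction pipeline):

    def overlap_iou(p1_out, p2_out, r1=0.5, r2=0.5):
        # each patch spans [-1, 1] with 0 at the center, so the descriptor
        # support radius is 0.5 by default; if the patch scale is perturbed,
        # pass r1/r2 adjusted for the scale difference between the patches
        (x1, y1), (x2, y2) = p1_out, p2_out  # detector outputs, in [-1, 1]
        inter_w = max(0.0, min(x1 + r1, x2 + r2) - max(x1 - r1, x2 - r2))
        inter_h = max(0.0, min(y1 + r1, y2 + r2) - max(y1 - r1, y2 - r2))
        intersection = inter_w * inter_h
        union = (2.0 * r1)**2 + (2.0 * r2)**2 - intersection
        return intersection / union

    # e.g. two detections that are slightly off-center in their patches
    print(overlap_iou((0.1, 0.0), (-0.1, 0.05)))  # ~0.61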

13331151 commented 7 years ago

Thanks for your patience. It's very nice of you to help me out. :+1: