ducha-aiki / affnet

Code and weights for the local feature affine shape estimation paper "Repeatability Is Not Enough: Learning Affine Regions via Discriminability"
MIT License
265 stars 47 forks

Matching code #2

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

Hello! I wonder if there is a script or guide for using the extracted features for matching, like the results shown in https://github.com/ducha-aiki/affnet

ducha-aiki commented 6 years ago

Matching can be taken from any OpenCV tutorial: you need some kd-tree or brute-force matcher + RANSAC for geometry. This repo is for paper reproducibility. Alternatively, you can use https://github.com/ducha-aiki/mods for matching with "CLI detector" and "CLI descriptor". It is not fast, because mods generates a synthetic image, saves it to the hard drive, runs AffNet, AffNet saves to txt, and mods loads from txt, but integrating the C++ RANSAC code with the Python GPU detector code is beyond my time and coding abilities right now.
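For reference, a minimal sketch of that kind of pipeline, assuming the AffNet/HardNet keypoints and descriptors are already available as numpy arrays (the variable and function names are illustrative, not from this repo):

    # Sketch: OpenCV brute-force matching with Lowe's ratio test, followed by
    # RANSAC homography verification. desc1/desc2 are assumed (N, 128) float32
    # descriptor arrays, kpts1/kpts2 the corresponding (N, 2) keypoint centers.
    import cv2
    import numpy as np

    def match_and_verify(kpts1, desc1, kpts2, desc2, ratio=0.8):
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        knn = matcher.knnMatch(desc1, desc2, k=2)
        good = [m for m, n in knn if m.distance < ratio * n.distance]
        if len(good) < 4:
            return None, []
        src = np.float32([kpts1[m.queryIdx] for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kpts2[m.trainIdx] for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if mask is None:
            return H, []
        inliers = [m for m, ok in zip(good, mask.ravel()) if ok]
        return H, inliers

A FLANN kd-tree matcher (cv2.FlannBasedMatcher) can be swapped in for the brute-force matcher when the number of descriptors gets large.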

ghost commented 6 years ago

I'll try the OpenCV tutorial first. Maybe later I can try the CLI detector and descriptor. I'm just testing a bunch of detectors and descriptors, so it can be helpful.

Thank you!

willard-yuan commented 6 years ago

@ducha-aiki Hi ducha-aiki, I also face the problem of extracting descriptors after getting the regions with AffNet.

I follow the instructions:

cd examples/hesaffnet
python hesaffnet.py img/cat.png ells-affnet.txt 2000

Then I get the regions. I read the code of HardNet:

    # (excerpt from the HardNet example: `image` is a tall strip of vertically
    # stacked w x w patches, so each crop below is one patch)
    patches = np.ndarray((n_patches, 1, 32, 32), dtype=np.float32)
    for i in range(n_patches):
        # crop the i-th w x w patch, resize to 32x32 and scale to [0, 1]
        patch = image[i*(w): (i+1)*(w), 0:w]
        patches[i,0,:,:] = cv2.resize(patch,(32,32)) / 255.
    # normalize with the mean and std the HardNet model was trained with
    patches -= 0.443728476019
    patches /= 0.20197947209
    bs = 128
    outs = []
    n_batches = int(n_patches / bs) + 1
    t = time.time()
    descriptors_for_net = np.zeros((len(patches), 128))
    # run the patches through the network in batches of `bs`
    for i in range(0, len(patches), bs):
        data_a = patches[i: i + bs, :, :, :].astype(np.float32)
        data_a = torch.from_numpy(data_a)
        if DO_CUDA:
            data_a = data_a.cuda()
        data_a = Variable(data_a)
        # compute output
        with torch.no_grad():
            out_a = model(data_a)
        descriptors_for_net[i: i + bs,:] = out_a.data.cpu().numpy().reshape(-1, 128)
    print(descriptors_for_net.shape)
    assert n_patches == descriptors_for_net.shape[0]

So after I get the regions (x y a b c), I just need to use (x, y) as the centers to crop the 32x32 patches and then feed them to HardNet. Do I understand it right?

By the way, I think providing a descriptor-extraction part would be good; it would be convenient for someone whose input is an image and whose output is the descriptors.

ducha-aiki commented 6 years ago

@willard-yuan no, it is not right :) To get affine-normalized patches, you need to use a, b, c, which are the real output of AffNet. To do this, you need the function https://github.com/ducha-aiki/affnet/blob/master/LAF.py#L195. I will do an example this week, but probably not today.
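For illustration only (this is a simplified CPU sketch, not the LAF.py code linked above; the function name and the mr_size measurement-region scale are assumptions): the (a, b, c) values define an ellipse around (x, y), and its inverse matrix square root gives the affine frame that warps the region into a normalized square patch.

    # Sketch: warp an Oxford-format ellipse region (x, y, a, b, c) into a
    # normalized patch with OpenCV. The repo does this on GPU tensors via
    # LAF.py; mr_size and the function name are illustrative assumptions.
    import cv2
    import numpy as np

    def ellipse_to_patch(image, x, y, a, b, c, patch_size=32, mr_size=3.0):
        # The region satisfies [u v] [[a b],[b c]] [u v]^T = 1 around (x, y);
        # C^(-1/2) maps the unit circle onto that ellipse.
        C = np.array([[a, b], [b, c]], dtype=np.float64)
        eigval, eigvec = np.linalg.eigh(C)
        A = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
        # Affine that maps patch pixel coordinates to image coordinates.
        M = np.zeros((2, 3))
        M[:, :2] = A * (2.0 * mr_size / patch_size)
        M[:, 2] = np.array([x, y]) - M[:, :2] @ np.array([patch_size / 2.0,
                                                          patch_size / 2.0])
        return cv2.warpAffine(image, M, (patch_size, patch_size),
                              flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)

The resulting patch can then be normalized and fed to HardNet exactly as in the excerpt above.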

willard-yuan commented 6 years ago

Thanks for your help. I will use it to test the performance of the Fisher vector.

ducha-aiki commented 6 years ago

@willard-yuan you are welcome. BTW, I like your effort to do open-source image retrieval and benchmark it in https://github.com/willard-yuan/cnn-cbir-benchmark, but it is probably worth noting that your results are far from the state of the art, which is now 91-95%.

ducha-aiki commented 6 years ago

@willard-yuan https://github.com/ducha-aiki/affnet/blob/master/examples/hesaffnet/WBS%20demo.ipynb - here is a simple demo. Please read the warnings and comments there carefully. Unfortunately, I don't have time to incorporate important but still missing parts like:

1) a patch orientation estimator (probably next week);
2) a proper matcher like FAISS https://github.com/facebookresearch/faiss or at least a FLANN kd-tree;
3) RANSAC geometric verification.

For 2) and 3) you can start from https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html, although its quality is not that good.
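As a rough illustration of point 2), exact nearest-neighbour matching of HardNet descriptors with FAISS could look like the sketch below (the descriptor arrays and the ratio threshold are assumptions, not part of the demo):

    # Sketch: exact L2 nearest-neighbour search over 128-D descriptors with
    # FAISS plus Lowe's ratio test. desc1/desc2 are assumed to be contiguous
    # (N, 128) float32 numpy arrays.
    import faiss
    import numpy as np

    def faiss_match(desc1, desc2, ratio=0.8):
        index = faiss.IndexFlatL2(desc2.shape[1])   # exact L2 index over desc2
        index.add(desc2)
        dists, idxs = index.search(desc1, 2)        # two nearest neighbours each
        matches = []
        for qi, (d, nn) in enumerate(zip(dists, idxs)):
            # IndexFlatL2 returns squared distances, hence ratio**2
            if d[0] < (ratio ** 2) * d[1]:
                matches.append((qi, int(nn[0])))
        return matches

The tentative matches from such a step can then go into the RANSAC verification from the OpenCV tutorial linked above.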

willard-yuan commented 6 years ago

@ducha-aiki Hi ducha-aiki, you are right. The mAP of some methods, such as Deep Image Retrieval and DEep Local Features, is really high, more than 90%. Even the BoW model followed by reranking with a large vocabulary can reach 80%+ mAP. But I find Deep Image Retrieval is very hard to train when the dataset contains objects of various classes, not just the buildings of Oxford. Second, directly using the BoW model with an inverted index is very challenging on very large-scale datasets, such as billions of images, or even videos.

So I want to focus on methods with low-dimensional descriptors whose training process is not very hard.

Many thanks for your help. I'll test it on the Oxford Buildings dataset soon.

willard-yuan commented 6 years ago

@ducha-aiki Would you tell me the version of PyTorch you use? I find there is no 'torch.no_grad()' in PyTorch 0.3.0.post4.

ducha-aiki commented 6 years ago

@willard-yuan the current version on GitHub. You can comment this out.
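On PyTorch 0.3.x, a possible workaround (a sketch, untested here) is to mark the input Variable as volatile, since torch.no_grad() only appeared in 0.4:

    # PyTorch 0.3.x: volatile=True disables autograd tracking for this forward
    # pass, playing the role of the torch.no_grad() context manager.
    data_a = Variable(data_a, volatile=True)
    out_a = model(data_a)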

ducha-aiki commented 6 years ago

@willard-yuan @emsantano I have added the patch canonical orientation estimation part to the demo. To activate orientation estimation, pass do_ori = True to the detection function.

willard-yuan commented 6 years ago

@ducha-aiki Great, Thanks.

ducha-aiki commented 6 years ago

@emsantano @willard-yuan you might be interested in this: https://github.com/ducha-aiki/mods-light-zmq. It integrates the PyTorch AffNet and HardNet with state-of-the-art C++ matching code.