andrefaraujo / videosearch

Large-scale video retrieval using image queries.

Result of SIFT is different from VLFeat's SIFT in Matlab #3

Closed willard-yuan closed 9 years ago

willard-yuan commented 9 years ago

Hi andrefaraujo,

I found that the SIFT result of your sift_extractor.cc is slightly different from that of vl_sift.c in VLFeat. The following is from vl_sift.c:

          /* Save back with MATLAB conventions. Notice that the input
           * image was the transpose of the actual image. */
          frames [4 * nframes + 0] = k -> y + 1 ;
          frames [4 * nframes + 1] = k -> x + 1 ;
          frames [4 * nframes + 2] = k -> sigma ;
          frames [4 * nframes + 3] = VL_PI / 2 - angles [q] ;

          if (nout > 1) {
            if (! floatDescriptors) {
              for (j = 0 ; j < 128 ; ++j) {
                float x = 512.0F * rbuf [j] ;
                x = (x < 255.0F) ? x : 255.0F ;
                ((vl_uint8*)descr) [128 * nframes + j] = (vl_uint8) x ;
              }
            } else {
              for (j = 0 ; j < 128 ; ++j) {
                float x = 512.0F * rbuf [j] ;
                ((float*)descr) [128 * nframes + j] = x ;
              }
            }
          }

But your sift_extractor.cc reads as follows:

                /* compute descriptor */
                vl_sift_calc_keypoint_descriptor (filt, rbuf, k, angles [q]) ;

                this_frame [0] = k -> x ;
                this_frame [1] = k -> y ;
                this_frame [2] = k -> sigma ;
                this_frame [3] = angles [q];

                frames.push_back(this_frame);

                for (j = 0 ; j < 128 ; ++j) {
                    float x;
                    if (divide_512) {
                        x = rbuf [j] ;
                    } else {
                        x = 512.0F * rbuf [j] ;
                    }
                    this_descr [j] = x ;
                }
                descr.push_back(this_descr);

I think your SIFT result would be the same as VLFeat's SIFT in Matlab if sift_extractor.cc were changed to follow vl_sift.c.

andrefaraujo commented 9 years ago

Hi willard-yuan,

Thanks for your comment. The VLFeat code you mention is a mex function meant to be used from MATLAB. It therefore returns MATLAB-style (1-based) coordinates, so 1 is added to x and y. It also processes a transposed version of the image, which is why x and y are swapped and theta is changed to pi/2 - angle.

Other than that, there might be slight differences due to opencv's loading of the grayscale image (which might be slightly different than MATLAB's). I am pretty sure that this will not give any noticeable difference in performance for the usage of the SIFT descriptor, but do let me know if you find results that provide evidence otherwise.

I'd be happy to help further if you have problems setting up the code; just let me know.

Best,

Andre

willard-yuan commented 9 years ago

@andrefaraujo Thanks for your reply. I had already set it up successfully, and I now understand the reason for the frame difference. However, I find that the descriptors are different. For example:

[image: https://cloud.githubusercontent.com/assets/5379711/9751088/ea3fb128-56d2-11e5-915d-677b5544ced1.png]

The result is the same at frame 485. But the number of SIFT keypoints is 486 in Matlab and 485 in C++. Another problem is as follows:

[image: https://cloud.githubusercontent.com/assets/5379711/9751208/83f7bd50-56d4-11e5-90f3-719aeda6bbd0.png]

The descriptor at frame 483 (in C++) is very different from the descriptor at frame 484 (in Matlab).

BTW, The result in Matlab is obtained by the command:

[f,d] = vl_sift(I, 'FloatDescriptors') ;

andrefaraujo commented 9 years ago

Happy to help.

Just a couple of questions to make sure I understand what you are saying:

It seems strange: if the frames are exactly the same, I see no reason why the descriptors would be very different (since VLFEAT is simply used to extract a descriptor from a given keypoint).


willard-yuan commented 9 years ago

@andrefaraujo I find that the number of descriptors in C++ is one less than in Matlab. That is, (number of descriptors in Matlab) = (number of descriptors in C++) + 1. I have tested this on different images.

I'm sure that when I compare descriptors, the frame indices correspond. That is, frame i in Matlab should match frame (i-1) in C++. My test code is as follows:

// OpenCV can be used to read images.
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>    // printf
#include <cstdlib>   // system
#include <iostream>
#include <string>
#include <vector>

// The VLFeat header files need to be declared external.
extern "C"{
#include <vl/generic.h>
#include <vl/stringop.h>
#include <vl/pgm.h>
#include <vl/sift.h>
#include <vl/getopt_long.h>
#include <vl/covdet.h>
};

using namespace std;
using namespace cv;

int main()
{
    //VL_PRINT ("Hello world!\n") ;
    string ImagePath = "C:\\Users\\Administrator\\Desktop\\img1.jpg";
    Mat image = imread(ImagePath, CV_LOAD_IMAGE_GRAYSCALE);   // Read the file
    int im_width = image.cols;
    int im_height = image.rows;

    // Transferring image to vlfeat structure
    unsigned int number_pixels = im_width*im_height;
    vl_sift_pix* data = new vl_sift_pix[number_pixels];  // element count, not byte count
    for (unsigned int ind = 0; ind < number_pixels; ind++) {
        data[ind] = static_cast<vl_sift_pix>(image.data[ind]);
    }

    int verbose = 1;
    bool divide_512 = 0;
    vector<float*> frames;
    vector<float*> descr;

    // VLSIFT parameters
    int                O     = - 1 ;
    int                S     =   3 ;
    int                o_min =   0 ;
    double             edge_thresh = -1;
    double             peak_thresh = -1 ;
    double             norm_thresh = -1 ;
    double             magnif      = -1 ;
    double             window_size = -1 ;

    bool            force_orientations = false ;

    VlSiftFilt* filt = vl_sift_new(im_width, im_height, O, S, o_min);

    int                nframes = 0, i,j,q ;

    if (peak_thresh >= 0) vl_sift_set_peak_thresh (filt, peak_thresh) ;
    if (edge_thresh >= 0) vl_sift_set_edge_thresh (filt, edge_thresh) ;
    if (norm_thresh >= 0) vl_sift_set_norm_thresh (filt, norm_thresh) ;
    if (magnif      >= 0) vl_sift_set_magnif      (filt, magnif) ;
    if (window_size >= 0) vl_sift_set_window_size (filt, window_size) ;

    if (verbose) {
      printf("vl_sift: filter settings:\n") ;
      printf("vl_sift:   image width           = %d\n",
                im_width) ;
      printf("vl_sift:   image height          = %d\n",
                im_height) ;
      printf("vl_sift:   octaves      (O)      = %d\n",
                vl_sift_get_noctaves      (filt)) ;
      printf("vl_sift:   levels       (S)      = %d\n",
                vl_sift_get_nlevels       (filt)) ;
      printf("vl_sift:   first octave (o_min)  = %d\n",
                vl_sift_get_octave_first  (filt)) ;
      printf("vl_sift:   edge thresh           = %g\n",
                vl_sift_get_edge_thresh   (filt)) ;
      printf("vl_sift:   peak thresh           = %g\n",
                vl_sift_get_peak_thresh   (filt)) ;
      printf("vl_sift:   norm thresh           = %g\n",
                vl_sift_get_norm_thresh   (filt)) ;
      printf("vl_sift:   window size           = %g\n",
                vl_sift_get_window_size   (filt)) ;

      printf("vl_sift: will force orientations? %s\n",
                force_orientations ? "yes" : "no") ;
    }

    /* ...............................................................
     *                                             Process each octave
     * ............................................................ */

    i     = 0 ;
    bool first = true;
    while (true) {
        int                   err ;
        VlSiftKeypoint const *keys  = 0 ;
        int                   nkeys = 0 ;

        if (verbose) {
            printf ("vl_sift: processing octave %d\n",
                       vl_sift_get_octave_index (filt)) ;
        }

        /* Calculate the GSS for the next octave .................... */
        if (first) {
            err   = vl_sift_process_first_octave (filt, data) ;
            first = false;
        } else {
            err   = vl_sift_process_next_octave  (filt) ;
        }

        if (err) break ;

        if (verbose > 1) {
            printf("vl_sift: GSS octave %d computed\n",
                      vl_sift_get_octave_index (filt));
        }

        /* Run detector ............................................. */

        vl_sift_detect (filt) ;

        keys  = vl_sift_get_keypoints  (filt) ;
        nkeys = vl_sift_get_nkeypoints (filt) ;
        i     = 0 ;

        if (verbose > 1) {
          printf ("vl_sift: detected %d (unoriented) keypoints\n", nkeys) ;
        }

        /* For each keypoint ........................................ */
        for (; i < nkeys ; ++i) {
            double                angles [4] ;
            int                   nangles ;
            VlSiftKeypoint const *k ;

            /* Obtain keypoint orientations ........................... */
            k = keys + i ;
            nangles = vl_sift_calc_keypoint_orientations(filt, angles, k) ;

            /* For each orientation ................................... */
            for (q = 0 ; q < nangles ; ++q) {
                vl_sift_pix rbuf [128] ;
                float* this_frame = new float[4];     // x, y, sigma, angle
                float* this_descr = new float[128];   // 128-D descriptor

                /* compute descriptor */
                vl_sift_calc_keypoint_descriptor (filt, rbuf, k, angles [q]) ;

                this_frame [0] = k -> x ;
                this_frame [1] = k -> y ;
                this_frame [2] = k -> sigma ;
                this_frame [3] = angles [q];

                frames.push_back(this_frame);

                for (j = 0 ; j < 128 ; ++j) {
                    float x;
                    if (divide_512) {
                        x = rbuf [j] ;
                    } else {
                        x = 512.0F * rbuf [j] ;
                    }
                    this_descr [j] = x ;
                }
                descr.push_back(this_descr);
                ++ nframes ;
            } /* next orientation */
        } /* next keypoint */
    } /* next octave */

    int number_desc = nframes;
    cout << "sift detect points numbers: " << number_desc <<  endl;
    int tframeNum = 483;
    cout << "frame at "  << tframeNum << endl;
    for (int i = 0; i <  4; ++i)
        cout << frames[tframeNum][i] << "\t";
    cout << endl;
    cout << "descr at " << tframeNum << endl;
    for (int i = 0; i <  128; ++i)
        cout << descr[tframeNum][i] << "\t";
    cout << endl;

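    // Clean up
    /* release the per-keypoint frame and descriptor buffers */
    for (size_t n = 0; n < frames.size(); ++n) {
        delete[] frames[n];
        delete[] descr[n];
    }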
    /* release filter */
    if (filt) {
        vl_sift_delete(filt);
        filt = 0;
    }
    /* release image data */
    if (data) {
      delete[] data;
      data = 0 ;
    }
    system("pause");
    return 0;
}

The above is my C++ test code; the Matlab code is as follows:

I = imread('img1.jpg');
I = single(rgb2gray(I)) ;
[f,d] = vl_sift(I, 'FloatDescriptors') ;

I hope I can find the reason why they are different with your help.

Best.

andrefaraujo commented 9 years ago

I think the problem is that since they do not have the same number of descriptors, you might be looking at two descriptors that are actually not the same? For example, if you look at their (x,y) values, are they the same?

It could be that, for example, due to differences in the keypoint detectors (maybe different edge and peak thresholds, and slight differences in grayscale conversions), the C++ program detected some keypoints that were not detected by the MATLAB program and vice-versa. Is that the case?
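One way to check this is to pair frames by position rather than by index. The following is a minimal sketch under stated assumptions: the Frame struct and match_by_position are hypothetical names (not from VLFeat or this repository), and the offset parameter accounts for the 1-based coordinates returned by the MATLAB mex function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one SIFT frame.
struct Frame { float x, y, sigma, angle; };

// For each frame in `a`, return the index of the nearest frame in `b` by (x, y),
// or -1 if no frame in `b` lies within `tol` pixels. `offset` is added to the
// coordinates of `a` first (use 1 when `a` is 0-based and `b` is 1-based).
std::vector<int> match_by_position(const std::vector<Frame>& a,
                                   const std::vector<Frame>& b,
                                   float offset, float tol) {
    std::vector<int> match(a.size(), -1);
    for (std::size_t i = 0; i < a.size(); ++i) {
        float best = tol;
        for (std::size_t j = 0; j < b.size(); ++j) {
            const float dx = a[i].x + offset - b[j].x;
            const float dy = a[i].y + offset - b[j].y;
            const float d = std::sqrt(dx * dx + dy * dy);
            if (d <= best) { best = d; match[i] = static_cast<int>(j); }
        }
    }
    return match;
}
```

Entries equal to -1 are keypoints found by one program but not the other. Keypoints with several orientations share the same (x, y), so sigma and angle would also need to be compared before comparing descriptors.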


willard-yuan commented 9 years ago

It's very strange that the ordering does not satisfy the relationship that frame i in Matlab matches frame (i-1) in C++. Frame 483 in C++ is the same as frame 486 in Matlab; see the following pictures:

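[image: https://cloud.githubusercontent.com/assets/5379711/9752259/96167b28-56e0-11e5-91e2-d2243739663e.png]

[image: https://cloud.githubusercontent.com/assets/5379711/9752386/0e878164-56e2-11e5-8873-fdfeb4bf8df1.png]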

All the parameters for the SIFT detector are set the same in C++ and Matlab. It's really hard to understand.

andrefaraujo commented 9 years ago

So then the descriptor of 483 in c++ is the same as the descriptor of 486 in Matlab?

Ordering is not a problem as long as the descriptors are the same; they do not necessarily follow any particular ordering.
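If an index-by-index comparison is still wanted, both outputs can first be put into a common order. The sketch below is an assumption-laden illustration: canonical_order is a hypothetical helper, and each frame is assumed to be a float[4] holding (x, y, sigma, angle) as in the test program above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <tuple>
#include <vector>

// Hypothetical helper: returns the permutation that sorts frames
// lexicographically by (x, y, sigma, angle). Each frame is a float[4].
std::vector<std::size_t> canonical_order(const std::vector<float*>& frames) {
    std::vector<std::size_t> order(frames.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&frames](std::size_t a, std::size_t b) {
        const float* fa = frames[a];
        const float* fb = frames[b];
        return std::tie(fa[0], fa[1], fa[2], fa[3]) <
               std::tie(fb[0], fb[1], fb[2], fb[3]);
    });
    return order;
}
```

Applying the returned permutation to both the frames and the descriptors keeps them aligned. This only lines the two outputs up when the coordinates agree closely; an extra or missing keypoint on one side still shifts every later index.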


willard-yuan commented 9 years ago

Yes, that's true. The only problem is that the number of SIFT keypoints in C++ is not equal to the number in Matlab. I'm still reading the code to try to find the reason.

andrefaraujo commented 9 years ago

As I mentioned before, I had found small differences in the RGB --> grayscale conversion between MATLAB and OpenCV. This seems to me to be the reason for the difference, but I believe it should not matter much.
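To illustrate how a tiny difference in conversion weights can change a pixel by one gray level, here is a small sketch. The weights used in the usage note are illustrative assumptions only; this is not a claim about what MATLAB or OpenCV compute internally, and gray_from_weights is a hypothetical function.

```cpp
// Hypothetical illustration: rounded weighted sum of R, G, B, with integer
// weights expressed in units of 1/10000. Integer arithmetic keeps it exact.
int gray_from_weights(int r, int g, int b, int wr, int wg, int wb) {
    const int sum = wr * r + wg * g + wb * b;  // scaled by 10000
    return (sum + 5000) / 10000;               // round to the nearest gray level
}
```

With weights (2990, 5870, 1140) the pixel (R, G, B) = (2, 0, 43) maps to 6, while with (2989, 5870, 1140) it maps to 5. A one-level change like this near a detection threshold could plausibly add or remove a keypoint.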
