Hi andrefaraujo,

I found that the SIFT result of your sift_extractor.cc is a little different from that of vl_sift.c in VLFeat. The following is from vl_sift.c:

          /* Save back with MATLAB conventions. Notice tha the input
           * image was the transpose of the actual image. */
          frames [4 * nframes + 0] = k -> y + 1 ;
          frames [4 * nframes + 1] = k -> x + 1 ;
          frames [4 * nframes + 2] = k -> sigma ;
          frames [4 * nframes + 3] = VL_PI / 2 - angles [q] ;

          if (nout > 1) {
            if (! floatDescriptors) {
              for (j = 0 ; j < 128 ; ++j) {
                float x = 512.0F * rbuf [j] ;
                x = (x < 255.0F) ? x : 255.0F ;
                ((vl_uint8*)descr) [128 * nframes + j] = (vl_uint8) x ;
              }
            } else {
              for (j = 0 ; j < 128 ; ++j) {
                float x = 512.0F * rbuf [j] ;
                ((float*)descr) [128 * nframes + j] = x ;
              }
            }
          }

But your sift_extractor.cc is as follows:

                /* compute descriptor */
                vl_sift_calc_keypoint_descriptor (filt, rbuf, k, angles [q]) ;

                this_frame [0] = k -> x ;
                this_frame [1] = k -> y ;
                this_frame [2] = k -> sigma ;
                this_frame [3] = angles [q];

                frames.push_back(this_frame);

                for (j = 0 ; j < 128 ; ++j) {
                    float x;
                    if (divide_512) {
                        x = rbuf [j] ;
                    } else {
                        x = 512.0F * rbuf [j] ;
                    }
                    this_descr [j] = x ;
                }
                descr.push_back(this_descr);

I think your SIFT result will be the same as VLFeat's SIFT in MATLAB if sift_extractor.cc is changed to follow vl_sift.c.
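For reference, here is a minimal sketch of the two descriptor output modes that appear in the quoted vl_sift.c snippet: the float mode (scale by 512) and the default mode (scale by 512, clamp at 255, cast to an 8-bit value). The function names `to_float_descriptor` and `to_uint8_descriptor` are hypothetical and not part of VLFeat; `raw` stands for the 128 values written to `rbuf`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Float mode (as with 'FloatDescriptors'): every raw value is scaled by 512.
std::vector<float> to_float_descriptor(const std::vector<float>& raw) {
    assert(raw.size() == 128);
    std::vector<float> out(raw.size());
    for (std::size_t j = 0; j < raw.size(); ++j) {
        out[j] = 512.0f * raw[j];
    }
    return out;
}

// Default mode: scale by 512, clamp at 255, then truncate to an 8-bit value.
std::vector<std::uint8_t> to_uint8_descriptor(const std::vector<float>& raw) {
    assert(raw.size() == 128);
    std::vector<std::uint8_t> out(raw.size());
    for (std::size_t j = 0; j < raw.size(); ++j) {
        float x = 512.0f * raw[j];
        x = (x < 255.0f) ? x : 255.0f;
        out[j] = static_cast<std::uint8_t>(x);
    }
    return out;
}
```

When comparing against MATLAB output, both sides need to use the same mode; otherwise the clamping and truncation alone will make the values differ.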

Hi willard-yuan,

Thanks for your comment. The VLFeat code you mention is a mex function to be used with MATLAB. Thus, it returns MATLAB-like coordinates (1 is added to x and y). Also, it processes a transposed version of the image, so it changes theta to pi/2 - angle (since the mex function is working with a transposed version).
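To make the convention difference concrete, here is a small sketch of the two conversions. `transposed_to_matlab` mirrors the quoted mex lines (swap x and y, add 1, reflect the angle). `to_one_based` is what I would expect to apply to a frame detected on the non-transposed image, assuming the pi/2 - angle step only undoes the transposition as described above. The `Frame` layout and all function names are hypothetical, chosen just for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Frame layout used in this thread: {x, y, sigma, angle}.
using Frame = std::array<double, 4>;

const double kPi = 3.14159265358979323846;

// Mirrors the quoted mex code: the keypoint was found on the transposed
// image, so x and y are swapped, 1 is added, and the angle is reflected.
Frame transposed_to_matlab(const Frame& k) {
    return Frame{k[1] + 1.0, k[0] + 1.0, k[2], kPi / 2.0 - k[3]};
}

// Assumption: for a keypoint found on the non-transposed image, only the
// 0-based to 1-based shift is needed and the angle is left unchanged.
Frame to_one_based(const Frame& k) {
    assert(k[2] > 0.0);  // sigma should be positive for a valid keypoint
    return Frame{k[0] + 1.0, k[1] + 1.0, k[2], k[3]};
}

// Signed difference between two angles, wrapped into [-pi, pi], so that
// angles that differ by a multiple of 2*pi compare as equal.
double angle_diff(double a, double b) {
    return std::remainder(a - b, 2.0 * kPi);
}
```

Comparing angles through `angle_diff` avoids false mismatches when one implementation reports an angle outside the range used by the other.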

Other than that, there might be slight differences due to opencv's loading of the grayscale image (which might be slightly different than MATLAB's). I am pretty sure that this will not give any noticeable difference in performance for the usage of the SIFT descriptor, but do let me know if you find results that provide evidence otherwise.

I'd be happy to help you further if you have problems with setting up the code; just let me know.

Best,

Andre

@andrefaraujo Thanks for your reply. I set it up successfully earlier, and I now understand the reason for the frame difference. However, I find that the descriptors are different. For example: [image: image] https://cloud.githubusercontent.com/assets/5379711/9751088/ea3fb128-56d2-11e5-915d-677b5544ced1.png The result is the same at frame 485, but MATLAB reports 486 SIFT keypoints whereas C++ reports 485. Another problem is as follows: [image: image] https://cloud.githubusercontent.com/assets/5379711/9751208/83f7bd50-56d4-11e5-90f3-719aeda6bbd0.png The descriptor at frame 483 (in C++) is very different from the descriptor at frame 484 (in MATLAB).

BTW, the result in MATLAB is obtained with the command:

[f,d] = vl_sift(I, 'FloatDescriptors') ;

Happy to help.

Just a couple of questions to make sure I understand what you are saying:

In both cases (c++ and matlab), do you find the same number of descriptors?
Does the frame (ie: x, y, scale, orientation) of descriptor 483 (c++) match the frame of descriptor 484 (in matlab)?

It seems strange: if the frames are exactly the same, I see no reason why the descriptors would be very different (since VLFEAT is simply used to extract a descriptor from a given keypoint).

@andrefaraujo I find that the number of descriptors in C++ is one fewer than in MATLAB. That is, the number of descriptors in MATLAB = the number of descriptors in C++ + 1. I have tested this on different images.

When I check whether the descriptors match, I make sure the frame numbers correspond. That is, frame i in MATLAB should match frame (i-1) in C++. My test code is as follows:

// OpenCV can be used to read images.
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <string>
#include <vector>
#include <iostream>
#include <cstdio>    // printf
#include <cstdlib>   // system

// The VLFeat header files need to be declared external.
extern "C"{
#include <vl/generic.h>
#include <vl/stringop.h>
#include <vl/pgm.h>
#include <vl/sift.h>
#include <vl/getopt_long.h>
#include <vl/covdet.h>
};

using namespace std;
using namespace cv;

int main()
{
    //VL_PRINT ("Hello world!\n") ;
    string ImagePath = "C:\\Users\\Administrator\\Desktop\\img1.jpg";
    Mat image = imread(ImagePath, CV_LOAD_IMAGE_GRAYSCALE);   // Read the file as 8-bit grayscale
    if (image.empty()) {
        cout << "could not read " << ImagePath << endl;
        return 1;
    }
    int im_width = image.cols;
    int im_height = image.rows;

    // Transferring image to vlfeat structure
    // (new[] takes an element count, not a byte count)
    unsigned int number_pixels = im_width*im_height;
    vl_sift_pix* data = new vl_sift_pix[number_pixels];
    for (unsigned int ind = 0; ind < number_pixels; ind++) {
        data[ind] = static_cast<vl_sift_pix>(image.data[ind]);
    }

    int verbose = 1;
    bool divide_512 = 0;
    vector<float*> frames;
    vector<float*> descr;

    // VLSIFT parameters
    int                O     = - 1 ;
    int                S     =   3 ;
    int                o_min =   0 ;
    double             edge_thresh = -1;
    double             peak_thresh = -1 ;
    double             norm_thresh = -1 ;
    double             magnif      = -1 ;
    double             window_size = -1 ;

    bool            force_orientations = false ;

    VlSiftFilt* filt = vl_sift_new(im_width, im_height, O, S, o_min);

    int                nframes = 0, i,j,q ;

    if (peak_thresh >= 0) vl_sift_set_peak_thresh (filt, peak_thresh) ;
    if (edge_thresh >= 0) vl_sift_set_edge_thresh (filt, edge_thresh) ;
    if (norm_thresh >= 0) vl_sift_set_norm_thresh (filt, norm_thresh) ;
    if (magnif      >= 0) vl_sift_set_magnif      (filt, magnif) ;
    if (window_size >= 0) vl_sift_set_window_size (filt, window_size) ;

    if (verbose) {
      printf("vl_sift: filter settings:\n") ;
      printf("vl_sift:   image width           = %d\n",
                im_width) ;
      printf("vl_sift:   image height          = %d\n",
                im_height) ;
      printf("vl_sift:   octaves      (O)      = %d\n",
                vl_sift_get_noctaves      (filt)) ;
      printf("vl_sift:   levels       (S)      = %d\n",
                vl_sift_get_nlevels       (filt)) ;
      printf("vl_sift:   first octave (o_min)  = %d\n",
                vl_sift_get_octave_first  (filt)) ;
      printf("vl_sift:   edge thresh           = %g\n",
                vl_sift_get_edge_thresh   (filt)) ;
      printf("vl_sift:   peak thresh           = %g\n",
                vl_sift_get_peak_thresh   (filt)) ;
      printf("vl_sift:   norm thresh           = %g\n",
                vl_sift_get_norm_thresh   (filt)) ;
      printf("vl_sift:   window size           = %g\n",
                vl_sift_get_window_size   (filt)) ;

      printf("vl_sift: will force orientations? %s\n",
                force_orientations ? "yes" : "no") ;
    }

    /* ...............................................................
     *                                             Process each octave
     * ............................................................ */

    i     = 0 ;
    bool first = true;
    while (true) {
        int                   err ;
        VlSiftKeypoint const *keys  = 0 ;
        int                   nkeys = 0 ;

        if (verbose) {
            printf ("vl_sift: processing octave %d\n",
                       vl_sift_get_octave_index (filt)) ;
        }

        /* Calculate the GSS for the next octave .................... */
        if (first) {
            err   = vl_sift_process_first_octave (filt, data) ;
            first = false;
        } else {
            err   = vl_sift_process_next_octave  (filt) ;
        }

        if (err) break ;

        if (verbose > 1) {
            printf("vl_sift: GSS octave %d computed\n",
                      vl_sift_get_octave_index (filt));
        }

        /* Run detector ............................................. */

        vl_sift_detect (filt) ;

        keys  = vl_sift_get_keypoints  (filt) ;
        nkeys = vl_sift_get_nkeypoints (filt) ;
        i     = 0 ;

        if (verbose > 1) {
          printf ("vl_sift: detected %d (unoriented) keypoints\n", nkeys) ;
        }

        /* For each keypoint ........................................ */
        for (; i < nkeys ; ++i) {
            double                angles [4] ;
            int                   nangles ;
            VlSiftKeypoint const *k ;

            /* Obtain keypoint orientations ........................... */
            k = keys + i ;
            nangles = vl_sift_calc_keypoint_orientations(filt, angles, k) ;

            /* For each orientation ................................... */
            for (q = 0 ; q < nangles ; ++q) {
                vl_sift_pix rbuf [128] ;
                float* this_frame = new float[4];    // x, y, sigma, angle
                float* this_descr = new float[128];  // 128-dimensional descriptor

                /* compute descriptor */
                vl_sift_calc_keypoint_descriptor (filt, rbuf, k, angles [q]) ;

                this_frame [0] = k -> x ;
                this_frame [1] = k -> y ;
                this_frame [2] = k -> sigma ;
                this_frame [3] = angles [q];

                frames.push_back(this_frame);

                for (j = 0 ; j < 128 ; ++j) {
                    float x;
                    if (divide_512) {
                        x = rbuf [j] ;
                    } else {
                        x = 512.0F * rbuf [j] ;
                    }
                    this_descr [j] = x ;
                }
                descr.push_back(this_descr);
                ++ nframes ;
            } /* next orientation */
        } /* next keypoint */
    } /* next octave */

    int number_desc = nframes;
    cout << "sift detect points numbers: " << number_desc <<  endl;
    int tframeNum = 483;
    cout << "frame at "  << tframeNum << endl;
    for (int i = 0; i <  4; ++i)
        cout << frames[tframeNum][i] << "\t";
    cout << endl;
    cout << "descr at " << tframeNum << endl;
    for (int i = 0; i <  128; ++i)
        cout << descr[tframeNum][i] << "\t";
    cout << endl;

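    // Clean up
    /* release frames and descriptors */
    for (size_t n = 0; n < frames.size(); ++n) {
        delete[] frames[n];
        delete[] descr[n];
    }
    /* release filter */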
    if (filt) {
        vl_sift_delete(filt);
        filt = 0;
    }
    /* release image data */
    if (data) {
      delete[] data;
      data = 0 ;
    }
    system("pause");
    return 0;
}

The above is my test code in C++, and the MATLAB code is as follows:

I = imread('img1.jpg');
I = single(rgb2gray(I)) ;
[f,d] = vl_sift(I, 'FloatDescriptors') ;

I hope I can find the reason why they are different with your help.

Best.

I think the problem is that since they do not have the same number of descriptors, you might be looking at two descriptors that are actually not the same? For example, if you look at their (x,y) values, are they the same?

It could be that, for example, due to differences in the keypoint detectors (maybe different edge and peak thresholds, and slight differences in grayscale conversions), the C++ program detected some keypoints that were not detected by the MATLAB program and vice-versa. Is that the case?
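One way to check this is to match frames by value instead of by index. The sketch below finds, for each frame in one set, the closest frame in the other set and reports -1 when nothing lies within a tolerance. It assumes both sets have already been put into the same coordinate convention; the `Frame` layout and the name `match_frames` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Frame layout used in this thread: {x, y, sigma, angle}.
using Frame = std::array<double, 4>;

// For each frame in a, return the index of the closest frame in b, or -1 if
// no frame in b is within tol in every component (angles compared modulo 2*pi).
std::vector<int> match_frames(const std::vector<Frame>& a,
                              const std::vector<Frame>& b,
                              double tol) {
    assert(tol >= 0.0);
    const double two_pi = 2.0 * 3.14159265358979323846;
    std::vector<int> match(a.size(), -1);
    for (std::size_t i = 0; i < a.size(); ++i) {
        double best = tol;
        for (std::size_t j = 0; j < b.size(); ++j) {
            const double dx = std::fabs(a[i][0] - b[j][0]);
            const double dy = std::fabs(a[i][1] - b[j][1]);
            const double ds = std::fabs(a[i][2] - b[j][2]);
            const double dt = std::fabs(std::remainder(a[i][3] - b[j][3], two_pi));
            const double d = std::max({dx, dy, ds, dt});
            if (d <= best) {
                best = d;
                match[i] = static_cast<int>(j);
            }
        }
    }
    return match;
}
```

Running it in both directions (C++ against MATLAB, then MATLAB against C++) lists the keypoints that exist on only one side, which is where a count difference would show up.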

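It's very strange that the ordering does not follow the relationship I expected, where frame i in MATLAB matches frame (i-1) in C++. The frame at 483 in C++ is the same as the frame at 486 in MATLAB; see the following pictures:

[image: image] https://cloud.githubusercontent.com/assets/5379711/9752259/96167b28-56e0-11e5-91e2-d2243739663e.png

[image: image] https://cloud.githubusercontent.com/assets/5379711/9752386/0e878164-56e2-11e5-8873-fdfeb4bf8df1.png

All the parameters for the SIFT detector are set the same in C++ and MATLAB. It's really hard to understand.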

So then the descriptor of 483 in c++ is the same as the descriptor of 486 in Matlab?

Ordering is not a problem, as long as descriptors are the same -- they should not necessarily follow a special ordering.
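Since the ordering can differ, an order-independent check is to look up, for each descriptor from one implementation, its nearest descriptor in the other and see how small the distance is. This is only a sketch: `l2_distance` and `nearest_descriptor` are hypothetical helper names, and the brute-force search is fine for a few hundred descriptors but not tuned for large sets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two descriptors of equal length.
double l2_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double diff = static_cast<double>(a[j]) - static_cast<double>(b[j]);
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Index of the descriptor in set that is closest to query (brute force).
std::size_t nearest_descriptor(const std::vector<float>& query,
                               const std::vector<std::vector<float> >& set) {
    assert(!set.empty());
    std::size_t best_index = 0;
    double best_dist = l2_distance(query, set[0]);
    for (std::size_t n = 1; n < set.size(); ++n) {
        const double d = l2_distance(query, set[n]);
        if (d < best_dist) {
            best_dist = d;
            best_index = n;
        }
    }
    return best_index;
}
```

If descriptor 483 from C++ has a near-zero distance to some MATLAB descriptor (486 in the case above), the two implementations agree on that keypoint even though the indices do not line up.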

Yes, it's true. The only problem is that the number of SIFT keypoints in C++ is not equal to the number in MATLAB. I'm still reading the code to try to find the reason.

As I mentioned before: I had found small differences in the RGB --> grayscale conversion between MATLAB and OpenCV. This seems to me to be the reason for the difference, but I believe these should not matter much.
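To give a feel for how small such differences can be, here is a floating-point sketch comparing the weights documented for MATLAB's rgb2gray (0.2989, 0.5870, 0.1140) with those documented for OpenCV's RGB to gray conversion (0.299, 0.587, 0.114). This is an illustration only, not either library's actual implementation: internal arithmetic and rounding may differ, and a JPEG decoder may produce the grayscale image through a different path altogether. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Weighted sum with the coefficients documented for MATLAB's rgb2gray.
std::uint8_t gray_matlab_weights(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    return static_cast<std::uint8_t>(std::lround(0.2989 * r + 0.5870 * g + 0.1140 * b));
}

// Weighted sum with the coefficients documented for OpenCV's RGB to gray.
std::uint8_t gray_opencv_weights(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    return static_cast<std::uint8_t>(std::lround(0.299 * r + 0.587 * g + 0.114 * b));
}

// Largest absolute gray-level difference over an RGB grid sampled every
// `step` levels per channel.
int max_gray_difference(int step) {
    assert(step > 0);
    int worst = 0;
    for (int r = 0; r <= 255; r += step) {
        for (int g = 0; g <= 255; g += step) {
            for (int b = 0; b <= 255; b += step) {
                const int d = std::abs(static_cast<int>(gray_matlab_weights(r, g, b)) -
                                       static_cast<int>(gray_opencv_weights(r, g, b)));
                if (d > worst) worst = d;
            }
        }
    }
    return worst;
}
```

With these weights the two results can differ by at most one gray level per pixel, which is small but could still move a keypoint that sits right at a detection threshold to one side or the other.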

andrefaraujo / videosearch

Result of SIFT is different from the VLFeat' SIFT in Matlab #3
