Closed willard-yuan closed 9 years ago
Hi willard-yuan,
Thanks for your comment. The VLFeat code you mention is a mex function intended for use with MATLAB. It therefore returns MATLAB-style coordinates (1 is added to x and y). It also processes a transposed version of the image, so it changes theta to pi/2 - angle.
Other than that, there might be slight differences due to OpenCV's loading of the grayscale image (which might be slightly different from MATLAB's). I am pretty sure that this will not make any noticeable difference in performance when using the SIFT descriptor, but do let me know if you find results that provide evidence otherwise.
I'd be happy to help you further if you have problems setting up the code. Just let me know.
Best,
Andre
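The coordinate conventions described in the reply above can be sketched in C++ as follows. This is only an illustration: the `Frame` struct and the `to_matlab_frame` function are hypothetical names, and the swap of x and y (to undo the transposition) is an assumption, since the reply only states the +1 offset and the pi/2 - angle change.

```cpp
#include <cmath>

// Hypothetical container for one SIFT frame: position, scale, orientation.
struct Frame {
    double x;
    double y;
    double sigma;
    double theta;
};

// Convert a raw keypoint, computed by the C library on the transposed image,
// into the 1-based frame that a MATLAB-style interface would report.
// Assumption: x and y are swapped to undo the transposition; the reply above
// only mentions adding 1 and replacing the angle with pi/2 - angle.
Frame to_matlab_frame(const Frame& raw) {
    const double kPi = std::acos(-1.0);
    Frame out;
    out.x = raw.y + 1.0;                // 0-based to 1-based, axes swapped
    out.y = raw.x + 1.0;
    out.sigma = raw.sigma;              // scale is unaffected
    out.theta = kPi / 2.0 - raw.theta;  // orientation in the original image
    return out;
}
```

For a program that runs the C library directly on the non-transposed image, as in the test code later in this thread, only the +1 offset on x and y would be expected to apply.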
@andrefaraujo Thanks for your reply. I had already set it up successfully, and I now understand the reason for the frame difference. But I find that the descriptors are different. For example, the result is the same at frame 485. However, the number of SIFT keypoints is 486 in MATLAB but 485 in C++. Another problem is that the descriptor at frame 483 (in C++) is very different from the descriptor at frame 484 (in MATLAB).
BTW, the result in MATLAB is obtained with the command:
[f,d] = vl_sift(I, 'FloatDescriptors') ;
Happy to help.
Just a couple of questions to make sure I understand what you are saying:
It seems strange: if the frames are exactly the same, I see no reason why the descriptors would be very different (since VLFEAT is simply used to extract a descriptor from a given keypoint).
@andrefaraujo I find that C++ produces one descriptor fewer than MATLAB. That is, the number of descriptors in MATLAB = the number of descriptors in C++ + 1. I have tested this on different images.
When I compare descriptors, I make sure that the frame numbers correspond: frame i in MATLAB should match frame (i-1) in C++. My test code is as follows:
// OpenCV can be used to read images.
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>
#include <iostream>
// The VLFeat header files need to be declared external.
extern "C" {
#include <vl/generic.h>
#include <vl/stringop.h>
#include <vl/pgm.h>
#include <vl/sift.h>
#include <vl/getopt_long.h>
#include <vl/covdet.h>
}
using namespace std;
using namespace cv;

int main()
{
    //VL_PRINT ("Hello world!\n") ;
    string ImagePath = "C:\\Users\\Administrator\\Desktop\\img1.jpg";
    Mat image = imread(ImagePath, CV_LOAD_IMAGE_GRAYSCALE); // Read the file
    if (image.empty()) {
        cerr << "could not read " << ImagePath << endl;
        return 1;
    }
    int im_width = image.cols;
    int im_height = image.rows;

    // Transferring image to vlfeat structure (row-major, one float per pixel)
    unsigned int number_pixels = im_width*im_height;
    vl_sift_pix* data = new vl_sift_pix[number_pixels];
    for (unsigned int ind = 0; ind < number_pixels; ind++) {
        data[ind] = static_cast<vl_sift_pix>(image.data[ind]);
    }

    int verbose = 1;
    // Note the naming: when this is false, descriptor values are scaled by 512.
    bool divide_512 = 0;
    vector<float*> frames;
    vector<float*> descr;

    // VLSIFT parameters (negative values keep the library defaults)
    int O = - 1 ;
    int S = 3 ;
    int o_min = 0 ;
    double edge_thresh = -1;
    double peak_thresh = -1 ;
    double norm_thresh = -1 ;
    double magnif = -1 ;
    double window_size = -1 ;
    bool force_orientations = false ;

    VlSiftFilt* filt = vl_sift_new(im_width, im_height, O, S, o_min);
    int nframes = 0, i,j,q ;

    if (peak_thresh >= 0) vl_sift_set_peak_thresh (filt, peak_thresh) ;
    if (edge_thresh >= 0) vl_sift_set_edge_thresh (filt, edge_thresh) ;
    if (norm_thresh >= 0) vl_sift_set_norm_thresh (filt, norm_thresh) ;
    if (magnif >= 0) vl_sift_set_magnif (filt, magnif) ;
    if (window_size >= 0) vl_sift_set_window_size (filt, window_size) ;

    if (verbose) {
        printf("vl_sift: filter settings:\n") ;
        printf("vl_sift: image width = %d\n",
               im_width) ;
        printf("vl_sift: image height = %d\n",
               im_height) ;
        printf("vl_sift: octaves (O) = %d\n",
               vl_sift_get_noctaves (filt)) ;
        printf("vl_sift: levels (S) = %d\n",
               vl_sift_get_nlevels (filt)) ;
        printf("vl_sift: first octave (o_min) = %d\n",
               vl_sift_get_octave_first (filt)) ;
        printf("vl_sift: edge thresh = %g\n",
               vl_sift_get_edge_thresh (filt)) ;
        printf("vl_sift: peak thresh = %g\n",
               vl_sift_get_peak_thresh (filt)) ;
        printf("vl_sift: norm thresh = %g\n",
               vl_sift_get_norm_thresh (filt)) ;
        printf("vl_sift: window size = %g\n",
               vl_sift_get_window_size (filt)) ;
        printf("vl_sift: will force orientations? %s\n",
               force_orientations ? "yes" : "no") ;
    }

    /* ...............................................................
     * Process each octave
     * ............................................................ */
    i = 0 ;
    bool first = true;
    while (true) {
        int err ;
        VlSiftKeypoint const *keys = 0 ;
        int nkeys = 0 ;

        if (verbose) {
            printf ("vl_sift: processing octave %d\n",
                    vl_sift_get_octave_index (filt)) ;
        }

        /* Calculate the GSS for the next octave .................... */
        if (first) {
            err = vl_sift_process_first_octave (filt, data) ;
            first = false;
        } else {
            err = vl_sift_process_next_octave (filt) ;
        }
        if (err) break ;

        if (verbose > 1) {
            printf("vl_sift: GSS octave %d computed\n",
                   vl_sift_get_octave_index (filt));
        }

        /* Run detector ............................................. */
        vl_sift_detect (filt) ;
        keys = vl_sift_get_keypoints (filt) ;
        nkeys = vl_sift_get_nkeypoints (filt) ;
        i = 0 ;

        if (verbose > 1) {
            printf ("vl_sift: detected %d (unoriented) keypoints\n", nkeys) ;
        }

        /* For each keypoint ........................................ */
        for (; i < nkeys ; ++i) {
            double angles [4] ;
            int nangles ;
            VlSiftKeypoint const *k ;

            /* Obtain keypoint orientations ........................... */
            k = keys + i ;
            nangles = vl_sift_calc_keypoint_orientations(filt, angles, k) ;

            /* For each orientation ................................... */
            for (q = 0 ; q < nangles ; ++q) {
                vl_sift_pix rbuf [128] ;
                float* this_frame = new float[4];
                float* this_descr = new float[128];

                /* compute descriptor */
                vl_sift_calc_keypoint_descriptor (filt, rbuf, k, angles [q]) ;

                this_frame [0] = k -> x ;
                this_frame [1] = k -> y ;
                this_frame [2] = k -> sigma ;
                this_frame [3] = angles [q];
                frames.push_back(this_frame);

                for (j = 0 ; j < 128 ; ++j) {
                    float x;
                    if (divide_512) {
                        x = rbuf [j] ;
                    } else {
                        x = 512.0F * rbuf [j] ;
                    }
                    this_descr [j] = x ;
                }
                descr.push_back(this_descr);
                ++ nframes ;
            } /* next orientation */
        } /* next keypoint */
    } /* next octave */

    int number_desc = nframes;
    cout << "sift detect points numbers: " << number_desc << endl;

    // Print one frame and its descriptor, if that index exists
    int tframeNum = 483;
    if (tframeNum < number_desc) {
        cout << "frame at " << tframeNum << endl;
        for (int i = 0; i < 4; ++i)
            cout << frames[tframeNum][i] << "\t";
        cout << endl;
        cout << "descr at " << tframeNum << endl;
        for (int i = 0; i < 128; ++i)
            cout << descr[tframeNum][i] << "\t";
        cout << endl;
    }

    // Clean up
    /* release frames and descriptors */
    for (size_t n = 0; n < frames.size(); ++n) delete[] frames[n];
    for (size_t n = 0; n < descr.size(); ++n) delete[] descr[n];
    /* release filter */
    if (filt) {
        vl_sift_delete(filt);
        filt = 0;
    }
    /* release image data */
    if (data) {
        delete[] data;
        data = 0 ;
    }
    system("pause");
    return 0;
}
The above is my test code in C++; the MATLAB code is as follows:
I = imread('img1.jpg');
I = single(rgb2gray(I)) ;
[f,d] = vl_sift(I, 'FloatDescriptors') ;
I hope I can find the reason why they are different with your help.
Best.
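As noted earlier in the thread, the MATLAB mex function works on a transposed version of the image, whereas the C++ test above passes OpenCV's row-major buffer directly. A minimal sketch of how that layout might be reproduced in C++ is given below; `to_column_major` is a hypothetical helper, not part of VLFeat or OpenCV.

```cpp
#include <cstddef>
#include <vector>

// Copy a row-major image (OpenCV's layout) into column-major order
// (MATLAB's layout). Read back as a row-major image with width and height
// swapped, the result is the transposed image.
std::vector<float> to_column_major(const std::vector<float>& row_major,
                                   std::size_t width, std::size_t height) {
    std::vector<float> out(row_major.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            out[x * height + y] = row_major[y * width + x];
        }
    }
    return out;
}
```

With this layout, the filter would presumably be created with the width and height arguments swapped, and the resulting frames would need to be mapped back to the original image's coordinates before comparing them with the MATLAB output.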
I think the problem is that since they do not have the same number of descriptors, you might be looking at two descriptors that are actually not the same? For example, if you look at their (x,y) values, are they the same?
It could be that, for example, due to differences in the keypoint detectors (maybe different edge and peak thresholds, and slight differences in grayscale conversions), the C++ program detected some keypoints that were not detected by the MATLAB program and vice-versa. Is that the case?
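The check suggested above, comparing (x,y) values instead of relying on frame indices, could be sketched like this. The `Keypoint` struct, the function name, and the tolerance parameter are hypothetical; the sketch assumes the C++ frames are 0-based and the MATLAB frames are 1-based.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal keypoint: position and scale only.
struct Keypoint {
    double x;
    double y;
    double sigma;
};

// For each 0-based C++ keypoint, return the index of the closest 1-based
// MATLAB keypoint, or -1 if none lies within `tol` in x, y and sigma.
std::vector<int> match_by_position(const std::vector<Keypoint>& cpp_frames,
                                   const std::vector<Keypoint>& matlab_frames,
                                   double tol) {
    std::vector<int> match(cpp_frames.size(), -1);
    for (std::size_t i = 0; i < cpp_frames.size(); ++i) {
        double best = tol;
        for (std::size_t j = 0; j < matlab_frames.size(); ++j) {
            const double dx = std::fabs(matlab_frames[j].x - (cpp_frames[i].x + 1.0));
            const double dy = std::fabs(matlab_frames[j].y - (cpp_frames[i].y + 1.0));
            const double ds = std::fabs(matlab_frames[j].sigma - cpp_frames[i].sigma);
            const double d = std::max(dx, std::max(dy, ds));
            if (d <= best) {
                best = d;
                match[i] = static_cast<int>(j);
            }
        }
    }
    return match;
}
```

Entries left at -1 are keypoints that one program detected and the other did not. Keypoints with several orientations share the same position and scale, so the orientation would also have to be compared to tell those apart.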
It's very strange that the ordering does not satisfy the relationship that frame i in MATLAB should match frame (i-1) in C++. Frame 483 in C++ is the same as frame 486 in MATLAB; see the following picture:
All the parameters of the SIFT detector are set the same in C++ and MATLAB. It's really hard to understand.
So then the descriptor of 483 in c++ is the same as the descriptor of 486 in Matlab?
Ordering is not a problem, as long as descriptors are the same -- they should not necessarily follow a special ordering.
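Since ordering need not agree, one way to compare the two outputs is to ignore indices and look for each descriptor's nearest neighbour in the other set. The following is a rough sketch under that assumption; `Descriptor`, `squared_distance`, and `count_unmatched` are hypothetical names, and the distance threshold is something the caller would have to choose.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;

// Squared Euclidean distance over the common length of two descriptors.
double squared_distance(const Descriptor& a, const Descriptor& b) {
    const std::size_t n = std::min(a.size(), b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        const double diff = static_cast<double>(a[k]) - static_cast<double>(b[k]);
        sum += diff * diff;
    }
    return sum;
}

// Count the descriptors in `a` whose nearest neighbour in `b` is farther
// than `max_sq_dist`, regardless of the order in which they are stored.
std::size_t count_unmatched(const std::vector<Descriptor>& a,
                            const std::vector<Descriptor>& b,
                            double max_sq_dist) {
    std::size_t unmatched = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < b.size(); ++j) {
            best = std::min(best, squared_distance(a[i], b[j]));
        }
        if (best > max_sq_dist) {
            ++unmatched;
        }
    }
    return unmatched;
}
```

Running the count in both directions would show which side holds the extra descriptor.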
Yes, that's true. The only problem is that the number of SIFT keypoints in C++ is not equal to the number in MATLAB. I'm still reading the code to try to find the reason.
As I mentioned before: I had found small differences in the RGB --> grayscale conversion between MATLAB and OpenCV. This seems to me to be the reason for the difference, but I believe these should not matter much.
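One way to test this hypothesis would be to compare the two grayscale images pixel by pixel. The sketch below assumes both images have already been loaded into 8-bit buffers of the same size (for example, with the MATLAB image exported to a file first); `DiffStats` and `compare_gray` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Summary of how two 8-bit grayscale buffers differ.
struct DiffStats {
    std::size_t differing;  // number of pixels with different values
    int max_abs_diff;       // largest absolute difference found
};

// Compare two grayscale buffers over their common length.
DiffStats compare_gray(const std::vector<unsigned char>& a,
                       const std::vector<unsigned char>& b) {
    DiffStats stats = {0, 0};
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int diff = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (diff != 0) {
            ++stats.differing;
        }
        stats.max_abs_diff = std::max(stats.max_abs_diff, diff);
    }
    return stats;
}
```

If no pixels differ, the grayscale conversion can be ruled out as the cause of the extra keypoint.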
Hi andrefaraujo,
I found that the SIFT result of your sift_extractor.cc is a little different from that of vl_sift.c in VLFeat. The following is from vl_sift.c:
But your sift_extractor.cc is as follows:
I think your SIFT result will be the same as VLFeat's SIFT in MATLAB if it is changed to follow vl_sift.c.