bikz05 / bag-of-words

Python Implementation of Bag of Words for Image Recognition using OpenCV and sklearn

Something about the idf #2

Closed: willard-yuan closed this issue 9 years ago

willard-yuan commented 9 years ago

Hi bikz05, it seems you don't make use of the computed idf:

```python
# Perform Tf-Idf vectorization
nbr_occurences = np.sum( (im_features > 0) * 1, axis = 0)
idf = np.array(np.log((1.0*len(image_paths)+1) / (1.0*nbr_occurences + 1)), 'float32')

# Scaling the words
stdSlr = StandardScaler().fit(im_features)
im_features = stdSlr.transform(im_features)
```
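To make the idf formula concrete, here is a minimal sketch that runs it on a made-up matrix of 4 images by 3 visual words. The toy counts are invented for illustration, and `n_images` is a stand-in for `len(image_paths)`. A word that occurs in every image gets idf = log((N+1)/(N+1)) = 0, while rarer words get larger weights.

```python
import numpy as np

# Toy bag-of-visual-words histograms: 4 images (rows) x 3 visual words (columns).
# The counts are made up for illustration only.
im_features = np.array([
    [3, 0, 1],
    [2, 1, 0],
    [5, 0, 0],
    [1, 0, 2],
], dtype=np.float32)
n_images = im_features.shape[0]  # stands in for len(image_paths)

# Document frequency: in how many images each visual word occurs at least once.
nbr_occurences = np.sum((im_features > 0) * 1, axis=0)

# Smoothed idf, using the same formula as the repository snippet.
idf = np.array(np.log((1.0 * n_images + 1) / (1.0 * nbr_occurences + 1)), 'float32')

print(nbr_occurences)  # [4 1 2]
print(idf)
```

Visual word 0 appears in all four images and ends up with a weight of exactly zero. That is the down-weighting of very common words that idf is meant to provide.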

I think each row im_features[i] should be multiplied by idf to make full use of it, that is:

```python
im_features = im_features*idf
```
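Where the multiplication goes relative to the scaling step matters. The following NumPy sketch compares the two placements, reusing the same made-up toy histograms. `standardize` is a hypothetical stand-in for `StandardScaler`. It is written in plain NumPy (column-wise z-score with the population std, leaving zero-variance columns unscaled), so the example runs without sklearn.

```python
import numpy as np

def standardize(x):
    # Column-wise z-score using the population std (ddof=0), mirroring what
    # sklearn's StandardScaler computes by default. Zero-variance columns are
    # divided by 1 instead of 0, as StandardScaler does.
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - mean) / std

# Made-up toy histograms: 4 images x 3 visual words.
im_features = np.array([
    [3, 0, 1],
    [2, 1, 0],
    [5, 0, 0],
    [1, 0, 2],
], dtype=np.float64)
n_images = im_features.shape[0]
nbr_occurences = np.sum((im_features > 0) * 1, axis=0)
idf = np.log((n_images + 1.0) / (nbr_occurences + 1.0))

plain = standardize(im_features)

# Weight after scaling (the placement proposed in this issue). Broadcasting
# multiplies column j (visual word j) of every row by idf[j].
after = standardize(im_features) * idf

# Weight before scaling. Standardizing divides each column by its own std,
# which cancels any positive per-column factor such as idf[j].
before = standardize(im_features * idf)

print(np.allclose(before[:, 1:], plain[:, 1:]))  # True: idf weighting is undone
print(np.allclose(after, plain))                  # False: idf weighting is kept
```

When the weighting comes before standardization, it has no effect on any column with idf > 0. It only zeroes columns whose idf is exactly 0. Applying it after `stdSlr.transform`, as suggested above, keeps the relative idf weights.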

Don't you think so?

bikz05 commented 9 years ago

Hi willard-yuan,

I had initially planned to use idf to remove the stop words, but strangely the results were better without it, so I decided not to use idf. I kept the code that calculates idf in case I decide to use it later on.
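As a rough illustration of what "using idf to remove the stop words" could look like, here is one possible approach. Visual words whose idf falls below a threshold are dropped instead of re-weighted. The helper `drop_stop_words` and the threshold `min_idf` are hypothetical, and the toy counts are invented. This thread does not show how the repository would have done it.

```python
import numpy as np

def drop_stop_words(im_features, idf, min_idf):
    # Hypothetical helper: keep only visual words whose idf exceeds min_idf.
    # Words that occur in (almost) every image have idf close to 0 and are
    # treated as "stop words".
    keep = idf > min_idf
    return im_features[:, keep], keep

# Made-up toy histograms: 4 images x 3 visual words.
im_features = np.array([
    [3, 0, 1],
    [2, 1, 0],
    [5, 0, 0],
    [1, 0, 2],
], dtype=np.float32)
n_images = im_features.shape[0]
nbr_occurences = np.sum((im_features > 0) * 1, axis=0)
idf = np.log((n_images + 1.0) / (nbr_occurences + 1.0))

reduced, keep = drop_stop_words(im_features, idf, min_idf=0.1)  # threshold is arbitrary
print(keep)           # [False  True  True]
print(reduced.shape)  # (4, 2)
```

Dropping columns changes the vocabulary. The same `keep` mask would therefore also have to be applied to the histograms of any test image before classification.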

Thank you