Bisht9887 / Wordcloud-using-Python


tf_idf Vectorizer giving different shape of an element before and after saving it #1

Open Bisht9887 opened 5 years ago

Bisht9887 commented 5 years ago

I am using TfidfVectorizer on dummy data, as shown in the code below. If I print the shape of one element after vectorization, I get (14,). When I save the vectorizer, load it again, and transform with the loaded vectorizer, the shape is (1,14). Am I doing something wrong, or is this a bug? The two lines below give different outputs, and I would like to know why.

print(file_data[0].shape)
output: (14,)

print(transformed_file.shape)
output: (1,14)

Bisht9887 commented 5 years ago

from sklearn.feature_extraction.text import TfidfVectorizer
import pickle

file_list = ["Google headquarters is in California", "Steve Jobs was a great  man",
    "Steve Jobs has done great technology innovations"]

transformer = TfidfVectorizer().fit(file_list)
file_data = transformer.transform(file_list)
with open('transformer.pickle', 'wb') as handle:
    pickle.dump(transformer, handle, protocol=pickle.HIGHEST_PROTOCOL)
file_data = file_data.toarray()
with open('transformer.pickle', 'rb') as handle:
    transformer = pickle.load(handle)
transformed_file = transformer.transform([file_list[0]]).toarray()

print(file_data[0].shape)
# output: (14,)

print(transformed_file.shape)
# output: (1,14)

# why different output?
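
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the two print statements do not compare like with like. `transform` returns a 2-D result with one row per input document, so `file_data` has shape (3, 14) and `transformed_file` has shape (1, 14). `file_data[0]` then picks out a single row, which drops the first axis and leaves (14,), while `transformed_file` is printed without indexing. The check below assumes the same dummy data as above. The names `vectorizer`, `restored`, `before` and `after` are illustrative, and the pickle round trip is done in memory instead of through `transformer.pickle`.

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

file_list = [
    "Google headquarters is in California",
    "Steve Jobs was a great  man",
    "Steve Jobs has done great technology innovations",
]

vectorizer = TfidfVectorizer().fit(file_list)

# Round-trip the fitted vectorizer through pickle in memory (no file needed here).
restored = pickle.loads(pickle.dumps(vectorizer))

before = vectorizer.transform(file_list).toarray()     # three documents
after = restored.transform([file_list[0]]).toarray()   # one document

print(before.shape)     # (3, 14): 2-D, (n_documents, n_features)
print(before[0].shape)  # (14,): integer indexing drops the row axis
print(after.shape)      # (1, 14): still 2-D, with a single row
print(after[0].shape)   # (14,): same indexing, same 1-D shape

# The values match; only the indexing in the two print statements differed.
print(np.allclose(before[0], after[0]))  # True
```

If this reading is right, saving and loading the vectorizer has no effect on the shape. Comparing `file_data[0].shape` with `transformed_file[0].shape`, or `file_data[0:1].shape` with `transformed_file.shape`, should give matching results.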