StatguyUser / TextFeatureSelection

Python library for feature selection on text features. It provides filter methods, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
MIT License

'CountVectorizer' object has no attribute 'get_feature_names' #30

Open primadermawan opened 1 year ago

primadermawan commented 1 year ago

`from TextFeatureSelection import TextFeatureSelection

# Binary classification
input_doc_list=new_df_4['txt'].values.tolist()
target=new_df_4['target'].values.tolist()
fsOBJ=TextFeatureSelection(target=target,input_doc_list=input_doc_list)
result_df=fsOBJ.getScore()
print(result_df)`

That's my code and the error:
`---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell 102 in 7
      5 target=new_df_4['target'].values.tolist()
      6 fsOBJ=TextFeatureSelection(target=target,input_doc_list=input_doc_list)
----> 7 result_df=fsOBJ.getScore()
      8 print(result_df)

File /opt/homebrew/lib/python3.10/site-packages/TextFeatureSelection.py:409, in TextFeatureSelection.getScore(self)
    407 else:
    408 if len(set(self.target))==2:
--> 409 values_df=self._getvalues_singleclass()
    410 return values_df
    411 elif len(set(self.target))>2:

File /opt/homebrew/lib/python3.10/site-packages/TextFeatureSelection.py:268, in TextFeatureSelection._getvalues_singleclass(self)
    265 label_array=self._get_binary_label(self.target)
    267 #get word, count, binary matrix
--> 268 word_list,count_list,word_binary_matrix=self._get_term_binary_matrix(self.input_doc_list)
    270 #get ABCDN
    271 A,B,C,D,N=self._get_ABCD(word_binary_matrix,label_array)

File /opt/homebrew/lib/python3.10/site-packages/TextFeatureSelection.py:231, in TextFeatureSelection._get_term_binary_matrix(self, input_doc_list)
    229 vectorizer = CountVectorizer()
    230 X = vectorizer.fit_transform(input_doc_list)
--> 231 word_list = vectorizer.get_feature_names()
    233 #binary word document matrix
    234 vectorizer = CountVectorizer(binary=True)

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'`

StatguyUser commented 1 year ago

Which version of the scikit-learn library is installed on your computer?

primadermawan commented 1 year ago

I have scikit-learn 1.2.2.

Some people suggest changing `get_feature_names` to `get_feature_names_out`.
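For reference, scikit-learn added `get_feature_names_out` in 1.0 and removed `get_feature_names` in 1.2, which matches the error on 1.2.2. Below is a minimal sketch of a compatibility helper that works with either API generation. The helper name `feature_names` and the two stand-in classes are hypothetical and exist only to exercise both branches without installing two scikit-learn versions; this is not the library's own code.

```python
def feature_names(vectorizer):
    """Return the fitted vocabulary terms as a plain list.

    Prefers get_feature_names_out (scikit-learn >= 1.0) and falls back
    to get_feature_names (removed in scikit-learn 1.2).
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())


# Hypothetical stand-ins that mimic the two API generations.
class NewStyleVectorizer:
    def get_feature_names_out(self):
        return ("cat", "dog")


class OldStyleVectorizer:
    def get_feature_names(self):
        return ["cat", "dog"]


new_terms = feature_names(NewStyleVectorizer())
old_terms = feature_names(OldStyleVectorizer())
print(new_terms, old_terms)
```

Under that assumption, the failing line in the traceback, `word_list = vectorizer.get_feature_names()`, could be replaced by a call such as `word_list = feature_names(vectorizer)` in a local copy of the package.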

StatguyUser commented 1 year ago

If you can downgrade scikit-learn to an older version, it will work.
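To make "older version" concrete: `get_feature_names` was removed in scikit-learn 1.2, so any release below 1.2 should still provide it. The sketch below checks the installed version against that cutoff. The function name `lacks_get_feature_names` is hypothetical, and the check only looks at the major and minor version numbers.

```python
import re
from importlib import metadata


def lacks_get_feature_names(version: str) -> bool:
    """Return True if this scikit-learn version has removed get_feature_names."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # The method was removed in release 1.2.
    return (major, minor) >= (1, 2)


try:
    installed = metadata.version("scikit-learn")
except metadata.PackageNotFoundError:
    installed = None  # scikit-learn is not installed in this environment

if installed is not None:
    print(installed, lacks_get_feature_names(installed))
```

If the check returns `True`, pinning the dependency below 1.2 (for example with the requirement `scikit-learn<1.2`) is one way to follow the downgrade suggestion.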

nauval123 commented 9 months ago

I tried calculating information gain myself and with the library, but the values are very different. Can you tell me how you implement the information gain equation in your library?