MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License
187 stars, 16 forks

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names' #19

Open shelbywhite opened 1 year ago

shelbywhite commented 1 year ago

I'm trying to run this code on Google Colab and am now seeing an error. I'm simply using the demo provided in this repo, but it now throws the following error:


AttributeError                            Traceback (most recent call last)
in
      3 # Fit the Concept model to the images and vocabulary
      4 concept_model = ConceptModel()
----> 5 concepts = concept_model.fit_transform(img_names, docs=selected_nouns)
      6
      7 # Get the predicted probabilities for each concept cluster for each image

1 frames
/usr/local/lib/python3.9/dist-packages/concept/_model.py in _extract_textual_representation(self, docs)
    400         # Extract vocabulary from the documents
    401         self.vectorizer_model.fit(docs)
--> 402         words = self.vectorizer_model.get_feature_names()
    403
    404         # Embed the documents and extract similarity between concept clusters and words

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

MaartenGr commented 1 year ago

Ah, I believe that is an issue with the scikit-learn version. If you install a scikit-learn version older than 1.0, it should work.
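
To see which side of the 1.0 boundary an environment is on, a small check like the one below may help. It is only a sketch: the helper names (parse_major_minor, installed_sklearn_version, predates_sklearn_1_0) are made up for illustration and are not part of Concept or scikit-learn. It reads the installed version through the standard-library importlib.metadata module.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Return (major, minor) as integers from a version string like '1.2.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


def predates_sklearn_1_0(version):
    """True when the version is older than 1.0, the range suggested above."""
    return parse_major_minor(version) < (1, 0)


version = installed_sklearn_version()
if version is None:
    print("scikit-learn is not installed")
elif predates_sklearn_1_0(version):
    print(f"scikit-learn {version}: older than 1.0, get_feature_names() is available")
else:
    print(f"scikit-learn {version}: 1.0 or newer, prefer get_feature_names_out()")
```

If the check reports 1.0 or newer, pinning an older release (for example with a pip requirement such as "scikit-learn<1.0") is one way to follow the suggestion above, assuming a compatible build exists for the Python version in use.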

renswilderom commented 10 months ago

Hello Maarten, I had the same issue. Installing a sklearn version older than 1.0 will probably work indeed.

From what I understand from this SO post, get_feature_names is deprecated and replaced by get_feature_names_out() in sklearn version 1.0 and higher.
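
As an alternative to downgrading, the call shown in the traceback could be routed through a small compatibility helper. The sketch below is an assumption about how one might patch this locally, not the fix adopted by the Concept maintainers. The function name extract_feature_names is hypothetical. It prefers get_feature_names_out() and falls back to get_feature_names() only when the newer method is missing.

```python
def extract_feature_names(vectorizer):
    """Return the fitted vocabulary terms of a vectorizer as a plain list.

    Prefers get_feature_names_out() (available from scikit-learn 1.0) and
    falls back to the older get_feature_names() when only that one exists.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())


# Hypothetical usage with scikit-learn installed:
#   from sklearn.feature_extraction.text import CountVectorizer
#   vectorizer = CountVectorizer().fit(["a red car", "a blue car"])
#   words = extract_feature_names(vectorizer)
```

Note that get_feature_names_out() returns a NumPy array where the old method returned a list, which is why the helper converts both results to a list.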

MaartenGr commented 10 months ago

Also, I would advise using BERTopic instead as that has more options for multi-modal topic modeling.

renswilderom commented 10 months ago

OK - thanks for the tip. I was already using BERTopic for text, but didn't know it had this multimodal feature. Great!

BingBing20230401 commented 2 months ago

Thanks! It solved my problem too!
