JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

'Word2Vec' object has no attribute 'scan_vocab' #32

Closed didopop3 closed 6 years ago

didopop3 commented 6 years ago
html = st.produce_projection_explorer(corpus,
                                      word2vec_model=Word2Vec(size=100, window=5, min_count=10, workers=4),
                                      projection_model=umap.UMAP(min_dist=0.5, metric='cosine'),
                                      category='priority',
                                      category_name='urgent',
                                      not_category_name='normal',
                                      metadata=convention_df.subject
                                     )                                                                            

I got an error: 'Word2Vec' object has no attribute 'scan_vocab'

I checked gensim.models.word2vec: the Word2Vec class has no 'scan_vocab' attribute, but there is a 'scan_vocab' under the class Word2VecVocab.

Full Python traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-14-944faad0524e> in <module>()
      5                                       category_name='urgent',
      6                                       not_category_name='normal',
----> 7                                       metadata=convention_df.subject
      8                                      )                                                                            

/Users/kaibochen/anaconda/lib/python3.6/site-packages/scattertext/__init__.py in produce_projection_explorer(corpus, category, word2vec_model, projection_model, embeddings, term_acceptance_re, **kwargs)
   1111                 acceptable_terms = set([t for t in corpus.get_terms() if term_acceptance_re.match(t)])
   1112                 corpus = corpus.remove_terms(set(corpus.get_terms()) - acceptable_terms)
-> 1113                 model = Word2VecFromParsedCorpus(corpus, word2vec_model).train()
   1114                 weights = [model[word] for word in model.wv.vocab]
   1115                 weights = np.stack(weights)

/Users/kaibochen/anaconda/lib/python3.6/site-packages/scattertext/representations/Word2VecFromParsedCorpus.py in train(self, epochs, training_iterations)
    126         '''
    127 
--> 128                 self._scan_and_build_vocab()
    129                 for _ in range(training_iterations):
    130             self.model.train(CorpusAdapterForGensim.get_sentences(self.corpus),

/Users/kaibochen/anaconda/lib/python3.6/site-packages/scattertext/representations/Word2VecFromParsedCorpus.py in _scan_and_build_vocab(self)
    170 
    171         def _scan_and_build_vocab(self):
--> 172                 self.model.scan_vocab(CorpusAdapterForGensim.get_sentences(self.corpus))
    173                 self.model.build_vocab(CorpusAdapterForGensim.get_sentences(self.corpus))
    174 

AttributeError: 'Word2Vec' object has no attribute 'scan_vocab'
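
The failing call in `_scan_and_build_vocab` could be guarded so that it tolerates both model layouts. What follows is a minimal sketch, not scattertext's actual fix: `build_vocab_compat` is a hypothetical helper, and it assumes that `build_vocab` remains available on the model itself and performs the scan on its own when `scan_vocab` is absent.

```python
def build_vocab_compat(model, sentences):
    """Build a Word2Vec-style model's vocabulary without assuming scan_vocab exists.

    `model` is any object exposing build_vocab(sentences); scan_vocab is optional.
    Hypothetical helper for illustration only.
    """
    # Materialize the sentences so they can be iterated more than once.
    sentences = list(sentences)

    scan = getattr(model, "scan_vocab", None)
    if callable(scan):
        # Layout where the model exposes scan_vocab directly
        # (the two-step call shown in the traceback).
        scan(sentences)

    # Assumption: build_vocab lives on the model in both layouts.
    model.build_vocab(sentences)
    return model
```

With this shape, a model that lacks `scan_vocab` skips the first step instead of raising `AttributeError`.
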

didopop3 commented 6 years ago

I reinstalled gensim 3.2.0 and it works now, but I will leave this issue open for now.

JasonKessler commented 6 years ago

Thanks for the bug report.

I tried to replicate this with gensim 3.2.0 and wasn't able to. Maybe your old Gensim installation was corrupted?

didopop3 commented 6 years ago

gensim 3.2.0 works fine for me; it's the new gensim 3.4.0 that has the problem.
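
For anyone wanting to check their own installation against the two versions reported here, a small sketch follows. The names `parse_version` and `scan_vocab_status` are hypothetical. Treating everything at or below 3.2.0 as working and everything at or above 3.4.0 as failing is an assumption extrapolated from this thread, and versions in between are reported as unknown.

```python
import re

KNOWN_GOOD = (3, 2, 0)  # reported working in this thread
KNOWN_BAD = (3, 4, 0)   # reported failing in this thread


def parse_version(text):
    """Turn a string like '3.4.0' into (3, 4, 0), padding missing parts with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts) + (0,) * (3 - len(parts))


def scan_vocab_status(version_text):
    """Classify a gensim version string as 'ok', 'broken', or 'unknown'."""
    version = parse_version(version_text)
    if version <= KNOWN_GOOD:
        return "ok"       # assumption: older releases behave like 3.2.0
    if version >= KNOWN_BAD:
        return "broken"   # assumption: newer releases behave like 3.4.0
    return "unknown"      # nothing in this thread covers the versions in between
```

Passing `gensim.__version__` to `scan_vocab_status` would give the classification for the installed release.
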