MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

TypeError: load_word2vec_format() got an unexpected keyword argument 'no_header' #44

Closed: cayaluke closed this issue 2 years ago

cayaluke commented 2 years ago

Description

Hi @lffloyd and @silviatti

I tried to run the ETM with pre-trained embeddings after the recent upgrade, and it returned this error.

TypeError: load_word2vec_format() got an unexpected keyword argument 'no_header'.

Please advise if I made an error on my end.

My commands and traceback are provided below.

Thank you so much! Luke

What I Did

from octis.models.ETM import ETM

# `dataset` is an OCTIS Dataset loaded beforehand (not shown in the original report)
model = ETM(num_topics=40, num_epochs=1, use_partitions=False, train_embeddings=False,
            embeddings_type='word2vec', embeddings_path=r'my/path/to/embedding/skipgram_emb_300d.txt',
            binary_embeddings=False, headerless_embeddings=True)

output = model.train_model(dataset)
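For context on the `headerless_embeddings=True` flag in the call above: a word2vec-format text file normally starts with a header line giving the vocabulary size and the vector dimension (for example `2 3`), followed by one `word v1 ... vN` entry per line. A "headerless" file omits that first line. The sketch below makes the difference concrete by writing a copy of a headerless file with the header added. `add_word2vec_header` is a hypothetical helper (not part of OCTIS or gensim), and it assumes whitespace-separated entries whose words contain no spaces.

```python
import os
import tempfile


def add_word2vec_header(src, dst):
    """Hypothetical helper: copy a headerless word2vec text file to `dst`,
    prepending the '<vocab_size> <dimension>' header line.
    Returns (vocab_size, dimension)."""
    count, dim = 0, None
    # First pass: count entries and check every vector has the same dimension.
    with open(src, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            if dim is None:
                dim = len(tokens) - 1  # first token is the word itself
            elif len(tokens) - 1 != dim:
                raise ValueError("inconsistent dimension in line: %r" % line[:40])
            count += 1
    if dim is None:
        raise ValueError("embedding file is empty")
    # Second pass: write the header, then the original non-blank lines.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.write(f"{count} {dim}\n")
        for line in fin:
            if line.strip():
                fout.write(line if line.endswith("\n") else line + "\n")
    return count, dim


# Usage demo on a tiny, made-up two-word file (values are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "headerless.txt")
    dst = os.path.join(tmp, "with_header.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
    shape = add_word2vec_header(src, dst)
    with open(dst, encoding="utf-8") as f:
        first_line = f.readline().strip()

print(shape, first_line)  # (2, 3) 2 3
```

The two-pass design streams the file instead of loading it into memory, which matters for large pre-trained embeddings such as a 300-dimensional skip-gram file.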

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
d:\01MRes_ubuntu\OCTIS\fomcNoPartitionsPreTrained\EtmRunModelPreTrained300.py in <module>
----> 26 output_fomc_etm = model.train_model(dataset)

~\anaconda3\envs\lda_env36\lib\site-packages\octis\models\ETM.py in train_model(self, dataset, hyperparameters, top_words)
     74         if hyperparameters is None:
     75             hyperparameters = {}
---> 76         self.set_model(dataset, hyperparameters)
     77         self.top_word = top_words
     78         self.early_stopping = EarlyStopping(patience=5, verbose=True)

~\anaconda3\envs\lda_env36\lib\site-packages\octis\models\ETM.py in set_model(self, dataset, hyperparameters)
    119 
    120         self.set_default_hyperparameters(hyperparameters)
--> 121         self.load_embeddings()
    122         ## define model and optimizer
    123         self.model = etm.ETM(num_topics=self.hyperparameters['num_topics'], vocab_size=len(self.vocab.keys()),

~\anaconda3\envs\lda_env36\lib\site-packages\octis\models\base_etm.py in load_embeddings(self)
     52                                         self.hyperparameters['embeddings_type'],
     53                                         self.hyperparameters['binary_embeddings'],
---> 54                                         self.hyperparameters['headerless_embeddings'])
     55         embeddings = np.zeros((len(self.vocab.keys()), self.hyperparameters['embedding_size']))
     56         for i, word in enumerate(self.vocab.values()):

~\anaconda3\envs\lda_env36\lib\site-packages\octis\models\base_etm.py in _load_word_vectors(self, embeddings_path, embeddings_type, binary_embeddings, headerless_embeddings)
     85                 embeddings_path,
     86                 binary=binary_embeddings,
---> 87                 no_header=headerless_embeddings)
     88 
     89         vectors = {}

TypeError: load_word2vec_format() got an unexpected keyword argument 'no_header'
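The TypeError is raised by Python itself: the installed `load_word2vec_format` has no `no_header` parameter, so the call is rejected before any file is read. A caller could detect this up front by inspecting the loader's signature. The sketch below uses two hypothetical stand-ins, `loader_v3` and `loader_v4`, that mirror only the relevant parameters of the old and new loaders rather than calling gensim itself; with gensim installed, the same `accepts_kwarg` check could be applied to `gensim.models.KeyedVectors.load_word2vec_format`.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all would also accept the name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins: old-style loader without no_header, new-style with it.
def loader_v3(fname, fvocab=None, binary=False, encoding="utf8"):
    return "loaded"


def loader_v4(fname, fvocab=None, binary=False, encoding="utf8", no_header=False):
    return "loaded"


def load_headerless(loader, path):
    """Fail with an actionable message instead of a bare TypeError."""
    if not accepts_kwarg(loader, "no_header"):
        raise RuntimeError("no_header is not supported; upgrade to gensim>=4.0.0")
    return loader(path, binary=False, no_header=True)


print(accepts_kwarg(loader_v3, "no_header"))  # False
print(accepts_kwarg(loader_v4, "no_header"))  # True

# Calling the old-style loader with no_header reproduces the error in this issue.
try:
    loader_v3("emb.txt", binary=False, no_header=True)
except TypeError as err:
    reproduced = str(err)
print(reproduced)
```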

lfmatosm commented 2 years ago

Hi @cayaluke!

First, how are you using OCTIS: did you clone the repository or install the library? Also, can you confirm which version of gensim your environment is using?

This seems to be caused by a gensim version mismatch. The package needs gensim>=4.0.0, whose load_word2vec_format supports the no_header argument, as stated in the gensim documentation. Earlier versions may not support it: I glanced at the gensim 3.8.3 docs, and the argument isn't available there.
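The same requirement can also be checked from the version number. This is a minimal sketch, assuming (as stated above) that `no_header` first appeared in gensim 4.0.0; `parse_version` and `supports_no_header` are hypothetical helpers, and in a real environment the version string would come from `gensim.__version__`.

```python
import re


def parse_version(version):
    """Turn '4.1.2' or '4.0.0b0' into a (major, minor, patch) tuple, padding with 0."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits, e.g. '0b0' -> 0
        if match is None:
            break
        nums.append(int(match.group()))
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def supports_no_header(gensim_version):
    # Assumption taken from this thread: no_header exists from gensim 4.0.0 on.
    return parse_version(gensim_version) >= (4, 0, 0)


# Possible usage: supports_no_header(gensim.__version__)
for v in ("3.8.3", "4.0.0", "4.1.2"):
    print(v, supports_no_header(v))
# 3.8.3 False
# 4.0.0 True
# 4.1.2 True
```

Comparing integer tuples avoids the string-comparison trap where `"10.0"` would sort before `"4.0"`.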

cayaluke commented 2 years ago

Hello @lffloyd, thank you for your response.

  1. I can confirm that my version of gensim is 3.8.3.

I have since upgraded gensim to 4.1.2, and it worked perfectly!

@lffloyd and @silviatti thank you again for your hard work.

Best, Luke