I'm trying to test the optimizer described in your docs, following the steps exactly (except for changing dataset.load to dataset.fetch_dataset), but I get the following error: TypeError: doc2bow expects an array of unicode tokens on input, not a single string
What I Did
from skopt.space.space import Real
from octis.dataset.dataset import Dataset  # import missing from the snippet as posted
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.models.LDA import LDA
from octis.optimization.optimizer import Optimizer

optimizer = Optimizer()
model = LDA()
model.hyperparameters.update({"num_topics": 20})

dataset = Dataset()
dataset.fetch_dataset("M10")

metric_parameters = {
    'texts': dataset.get_corpus(),
    'topk': 10,
    'measure': 'c_npmi'
}
npmi = Coherence(metric_parameters)  # the TypeError below is raised on this line

search_space = {
    "alpha": Real(low=0.001, high=5.0),
    "eta": Real(low=0.001, high=5.0)
}
optimization_result = optimizer.optimize(model,
                                         dataset,
                                         npmi,
                                         search_space,
                                         number_of_call=10,
                                         n_random_starts=3,
                                         model_runs=3,
                                         save_name="result",
                                         surrogate_model="RF",
                                         acq_func="LCB"
                                         )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 16
      9 model.hyperparameters.update({"num_topics": 20})
     11 metric_parameters = {
     12     'texts': dataset.get_corpus(),
     13     'topk': 10,
     14     'measure': 'c_npmi'
     15 }
---> 16 npmi = Coherence(metric_parameters)
     18 search_space = {
     19     "alpha": Real(low=0.001, high=5.0),
     20     "eta": Real(low=0.001, high=5.0)
     21 }
     23 optimization_result = optimizer.optimize(model,
     24     dataset,
     25     npmi,
   (...)
     32     acq_func="LCB"
     33 )

File /opt/homebrew/Caskroom/miniconda/base/envs/octis/lib/python3.10/site-packages/octis/evaluation_metrics/coherence_metrics.py:34, in Coherence.__init__(self, texts, topk, processes, measure)
     32 else:
     33     self._texts = texts
---> 34 self._dictionary = Dictionary(self._texts)
     35 self.topk = topk
     36 self.processes = processes

File /opt/homebrew/Caskroom/miniconda/base/envs/octis/lib/python3.10/site-packages/gensim/corpora/dictionary.py:78, in Dictionary.__init__(self, documents, prune_at)
     75 self.num_nnz = 0
     77 if documents is not None:
---> 78     self.add_documents(documents, prune_at=prune_at)
     79     self.add_lifecycle_event(
     80         "created",
     81         msg=f"built {self} from {self.num_docs} documents (total {self.num_pos} corpus positions)",
     82     )

File /opt/homebrew/Caskroom/miniconda/base/envs/octis/lib/python3.10/site-packages/gensim/corpora/dictionary.py:204, in Dictionary.add_documents(self, documents, prune_at)
    201     logger.info("adding document #%i to %s", docno, self)
    203 # update Dictionary with the document
--> 204 self.doc2bow(document, allow_update=True)  # ignore the result, here we only care about updating token ids
    206 logger.info("built %s from %i documents (total %i corpus positions)", self, self.num_docs, self.num_pos)

File /opt/homebrew/Caskroom/miniconda/base/envs/octis/lib/python3.10/site-packages/gensim/corpora/dictionary.py:241, in Dictionary.doc2bow(self, document, allow_update, return_missing)
    209 """Convert `document` into the bag-of-words (BoW) format = list of `(token_id, token_count)` tuples.
    210
    211 Parameters
   (...)
    238
    239 """
    240 if isinstance(document, str):
--> 241     raise TypeError("doc2bow expects an array of unicode tokens on input, not a single string")
    243 # Construct (word, frequency) mapping.
    244 counter = defaultdict(int)

TypeError: doc2bow expects an array of unicode tokens on input, not a single string
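A possible cause, judging only from the traceback: the signature shown is `Coherence.__init__(self, texts, topk, processes, measure)`, so passing `metric_parameters` positionally would bind the whole dict to `texts`. Iterating over a dict yields its string keys (`'texts'`, `'topk'`, `'measure'`), which would explain why `doc2bow` complains about receiving a single string. The sketch below illustrates that mechanism with a hypothetical stand-in class, `CoherenceLike`, which is not part of OCTIS; its default values and its simplified string check are assumptions made for illustration.

```python
class CoherenceLike:
    """Hypothetical stand-in, NOT the OCTIS class.

    Parameter names follow the traceback; the default values are assumptions.
    """

    def __init__(self, texts=None, topk=10, processes=1, measure="c_npmi"):
        # Simplified version of the check seen in the traceback:
        # every document must be a list of tokens, not a plain string.
        for document in texts:
            if isinstance(document, str):
                raise TypeError("expected a list of tokens, got a single string")
        self.texts = texts
        self.topk = topk
        self.processes = processes
        self.measure = measure


# Toy corpus in the list-of-token-lists shape.
corpus = [["topic", "model"], ["latent", "dirichlet", "allocation"]]
metric_parameters = {"texts": corpus, "topk": 10, "measure": "c_npmi"}

# Positional dict: `texts` becomes the dict itself, and iterating it
# yields the string keys, so the string check fires.
try:
    CoherenceLike(metric_parameters)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Unpacking the dict binds each key to the matching keyword argument.
metric = CoherenceLike(**metric_parameters)

print(positional_error)
print(metric.topk, metric.measure)
```

If the real class behaves the same way, the corresponding change in the snippet above would be `Coherence(**metric_parameters)` or `Coherence(texts=dataset.get_corpus(), topk=10, measure='c_npmi')`. That has not been verified against OCTIS here.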