MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

'OptimizerEvaluation' object has no attribute 'dict_model_runs' error while using 'extra_metrics' in optimization #47

Closed: Akshayextreme closed this issue 2 years ago

Akshayextreme commented 2 years ago

Description

I am trying to optimize an LDA model on custom data. The metric I optimize is NPMI, and I also pass topic_diversity as an extra metric during optimization.

What I Did

Code:

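# Imports (`dataset` and `output_path` are defined elsewhere and not shown here)
from octis.models.LDA import LDA
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.evaluation_metrics.diversity_metrics import TopicDiversity
from octis.optimization.optimizer import Optimizer
from skopt.space.space import Real, Integer

# Create Model
model = LDA(num_topics=20, alpha=0.1)
model.partitioning(False)

# Metric to optimize: NPMI coherence
npmi = Coherence(texts=dataset.get_corpus(), topk=10, measure='c_npmi')

# Extra metric: tracked during optimization but not optimized
topic_diversity = TopicDiversity(topk=10)

optimization_runs = 30  # number of optimization iterations
model_runs = 5  # number of runs of the topic model

# Define the search space. To see which hyperparameters to optimize, see the topic model's initialization signature
search_space = {"alpha": Real(low=0.001, high=5.0),
                "eta": Real(low=0.001, high=5.0),
                "num_topics": Integer(low=1, high=10, prior='uniform')}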

# Initialize an optimizer object and start the optimization.
optimizer = Optimizer()
optResult = optimizer.optimize(model, dataset,
                               search_space=search_space,
                               save_path=output_path,  # path to store the results
                               metric=npmi,
                               number_of_call=optimization_runs,
                               model_runs=model_runs,
                               extra_metrics=[topic_diversity])

# Save the results of the optimization in a CSV file
optResult.save_to_csv("results.csv")

Error traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/envs/topic_modeling/lib/python3.8/site-packages/octis/optimization/optimizer_evaluation.py in save_to_csv(self, name_file)
    150             try:
--> 151                 df[metric.info()["name"] + '(not optimized)'] = [np.median(
    152                     self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]

~/envs/topic_modeling/lib/python3.8/site-packages/octis/optimization/optimizer_evaluation.py in <listcomp>(.0)
    151                 df[metric.info()["name"] + '(not optimized)'] = [np.median(
--> 152                     self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]
    153             except:

AttributeError: 'OptimizerEvaluation' object has no attribute 'dict_model_runs'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_6863/802420513.py in <module>
     10 
     11 #save the results of th optimization in a csv file
---> 12 optResult.save_to_csv("results.csv")

~/envs/topic_modeling/lib/python3.8/site-packages/octis/optimization/optimizer_evaluation.py in save_to_csv(self, name_file)
    152                     self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]
    153             except:
--> 154                 df[metric.__class__.__name__ + '(not optimized)'] = [np.median(
    155                     self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]
    156 

~/envs/topic_modeling/lib/python3.8/site-packages/octis/optimization/optimizer_evaluation.py in <listcomp>(.0)
    153             except:
    154                 df[metric.__class__.__name__ + '(not optimized)'] = [np.median(
--> 155                     self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]
    156 
    157         if not name_file.endswith(".csv"):

AttributeError: 'OptimizerEvaluation' object has no attribute 'dict_model_runs'
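
The traceback shows two AttributeErrors because the except branch repeats the same failing attribute access as the try branch. The following is a minimal sketch of that pattern. EvaluationSketch is a hypothetical stand-in written for illustration, not the OCTIS class, and it assumes the per-run values are stored under an info dictionary rather than as an attribute.

```python
class EvaluationSketch:
    """Hypothetical stand-in for OptimizerEvaluation; not the OCTIS class."""

    def __init__(self):
        # Assumption: per-run metric values live inside the `info` dict,
        # so no attribute named `dict_model_runs` exists on the object.
        self.info = {
            "dict_model_runs": {
                "TopicDiversity": {"iteration_0": [0.6, 0.7, 0.8]}
            }
        }

    def values_broken(self):
        try:
            return self.dict_model_runs["TopicDiversity"]["iteration_0"]
        except:  # bare except, mirroring the code in the traceback
            # The fallback performs the same attribute access, so it fails too.
            return self.dict_model_runs["TopicDiversity"]["iteration_0"]


def capture_error():
    """Run the broken method and return the exception it raises."""
    try:
        EvaluationSketch().values_broken()
    except AttributeError as exc:
        return exc
    return None


err = capture_error()
# The error raised in the except branch keeps the first one as its context,
# which is why Python prints "During handling of the above exception...".
first_error = err.__context__
print(type(err).__name__, "/", type(first_error).__name__)
```

In this sketch, renaming the metric column in the fallback cannot help: both branches depend on the same missing attribute, so the fallback can never succeed when the first attempt fails for that reason.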

Possible solution

Modify save_to_csv in octis/optimization/optimizer_evaluation.py (lines 151-155 in the traceback above):

Old

try:
    df[metric.info()["name"] + '(not optimized)'] = [np.median(
        self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]
except:
    df[metric.__class__.__name__ + '(not optimized)'] = [np.median(
        self.dict_model_runs[metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]

New

try:
    df[metric.info()["name"] + '(not optimized)'] = [np.median(
        self.info['dict_model_runs'][metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]
except:
    df[metric.__class__.__name__ + '(not optimized)'] = [np.median(
        self.info['dict_model_runs'][metric.__class__.__name__]['iteration_' + str(i)]) for i in range(n_row)]

silviatti commented 2 years ago

Hello, thank you for reporting this issue. Your solution is indeed correct :) I'm going to fix it and publish a new release by tomorrow. Thanks,

Silvia
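
For anyone still on an affected version, the per-iteration medians of an extra metric can be computed by hand from the dictionary the proposed fix reads. This is only a sketch under assumptions: the helper extra_metric_medians and the example values are hypothetical, and the nested layout (metric class name, then 'iteration_<i>', then a list with one value per model run) is inferred from the code in this thread rather than from the OCTIS documentation.

```python
from statistics import median


def extra_metric_medians(dict_model_runs, metric_name, n_iterations):
    """Return the median over model runs for each optimization iteration.

    Assumed layout (inferred from this thread, not from the OCTIS docs):
    dict_model_runs[metric_name]['iteration_<i>'] is a list of values,
    one per model run.
    """
    runs = dict_model_runs[metric_name]
    return [median(runs["iteration_" + str(i)]) for i in range(n_iterations)]


# Hypothetical values, for illustration only
example_runs = {
    "TopicDiversity": {
        "iteration_0": [0.6, 0.7, 0.8],
        "iteration_1": [0.5, 0.9, 0.6],
    }
}

medians = extra_metric_medians(example_runs, "TopicDiversity", 2)
print(medians)  # [0.7, 0.6]
```

If the layout assumed above holds, the dictionary to pass in would be optResult.info['dict_model_runs'], as in the fix proposed earlier in this thread.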