File /usr/local/lib/python3.9/dist-packages/octis/models/pytorchavitm/AVITM.py:77, in AVITM.train_model(self, dataset, hyperparameters, top_words)
74 self.set_params(hyperparameters)
76 if self.use_partitions:
---> 77 train, validation, test = dataset.get_partitioned_corpus(use_validation=True)
79 data_corpus_train = [' '.join(i) for i in train]
80 data_corpus_test = [' '.join(i) for i in test]
TypeError: cannot unpack non-iterable NoneType object
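For context, Python emits this message whenever a call returns None and the result is unpacked into several names. The sketch below reproduces it with a hypothetical stand-in function (get_partitions and unpack_or_explain are illustrative names, not OCTIS code) and shows a guard that fails with a clearer message instead.

```python
def get_partitions(has_partition_info):
    """Hypothetical stand-in: returns three lists, or None implicitly."""
    if has_partition_info:
        return ["a b"], ["c d"], ["e f"]
    # No explicit return on this path, so the caller receives None.


def unpack_or_explain(has_partition_info):
    """Unpack the partitions, raising a descriptive error when there are none."""
    result = get_partitions(has_partition_info)
    if result is None:
        raise ValueError("dataset has no partition information")
    train, validation, test = result
    return train, validation, test
```

Calling `train, validation, test = get_partitions(False)` directly raises the same TypeError as in the traceback, which suggests checking what the dataset method returned before it reached the unpacking line.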
What I Did
Here's the code for creating the custom dataset from a list of strings...
import pandas as pd
from nltk.tokenize import word_tokenize
from tqdm import tqdm

# docs is a list of strings
# collect tokens
tokens = []
for d in tqdm(docs):
    tokens += word_tokenize(d.lower())
# write vocab file
with open("octis_dataset/vocabulary.txt", "w+") as f:
    for s in tqdm(set(tokens)):
        f.write(s + "\n")
# create corpus tsv
df = pd.DataFrame(docs)
# partition
tr_data = df.sample(48500, random_state=420)
te_data = df.query("index not in @tr_data.index").sample(12900, random_state=420)
val_data = df.query("index not in @tr_data.index and index not in @te_data.index")
df = pd.concat([tr_data, te_data, val_data])
# write tsv (columns: DataFrame index, document text; no header row)
df.to_csv("octis_dataset/corpus.tsv", sep="\t", header=None)
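One possible cause worth checking, offered as an assumption rather than a confirmed diagnosis: the OCTIS README describes corpus.tsv as holding the document in the first column and the partition in the second (with an optional label in the third), whereas the file written above holds the DataFrame index and the document text. If the loader finds no partition information, get_partitioned_corpus may return None, which would match the traceback. The sketch below writes a file in the documented layout using only the standard library. The helper name write_partitioned_corpus, the partition labels "train", "val" and "test", and the sizes in the usage note are illustrative assumptions.

```python
import csv
import random
from pathlib import Path


def write_partitioned_corpus(docs, path, n_train, n_test, seed=420):
    """Write one `text<TAB>partition` row per document (hypothetical helper).

    The first n_train shuffled documents are labelled "train", the next
    n_test "test", and the remainder "val". Returns the index-to-label map.
    """
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)  # reproducible shuffle

    labels = {}
    for rank, idx in enumerate(order):
        if rank < n_train:
            labels[idx] = "train"
        elif rank < n_train + n_test:
            labels[idx] = "test"
        else:
            labels[idx] = "val"

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for idx, doc in enumerate(docs):
            # Collapse tabs and newlines so each document stays on one row.
            writer.writerow([" ".join(doc.split()), labels[idx]])
    return labels
```

With the data from this report, the call would look like `write_partitioned_corpus(docs, "octis_dataset/corpus.tsv", n_train=48500, n_test=12900)`. Whether the documents also need to be pre-tokenized to match vocabulary.txt is not verified here.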
And here is the code to optimize the model...
import time
from octis.optimization.optimizer import Optimizer

# model, dataset, coherence, search_space, optimization_runs and model_runs
# are assumed to be defined beforehand
optimizer = Optimizer()
start = time.time()
optimization_result = optimizer.optimize(
    model, dataset, coherence, search_space, number_of_call=optimization_runs,
    model_runs=model_runs, save_models=True,
    extra_metrics=None,  # to keep track of other metrics
    save_path='results/test_neuralLDA/'
)
end = time.time()
duration = end - start
optimization_result.save_to_csv("results_neuralLDA.csv")
print('Optimizing model took: ' + str(round(duration)) + ' seconds.')
Description
Running the optimization on a custom dataset, following this tutorial, raises the TypeError shown in the traceback at the top of this report.