Closed rjsu26 closed 1 year ago
Hello,
when you train a topic model, you initialize the dataset first. This dataset has a vocabulary (its indices correspond to the columns of the topic-words-matrix). You can get it in the following way:
from octis.dataset.dataset import Dataset

dataset = Dataset()
# or use your preferred way to initialize the dataset
dataset.load_custom_dataset_from_folder("dataset_folder")
vocabulary = dataset.get_vocabulary()
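To illustrate how that vocabulary order can be used for the query matching described in this issue, here is a minimal sketch, not an official recipe. It uses toy data in place of a trained model, and the helper names (build_query_vector, cosine, top_topics) are hypothetical. It also assumes the query can be split on whitespace and that cosine similarity is an acceptable way to compare a query with a topic row.

```python
import math

def build_query_vector(query, vocabulary):
    """Binary bag-of-words vector whose positions follow the vocabulary order."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    # Assumption: whitespace tokenization; words outside the vocabulary are skipped.
    for token in query.lower().split():
        if token in index:
            vec[index[token]] = 1.0
    return vec

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_topics(query, vocabulary, topic_word_matrix, k=5):
    """Indices of the k topics whose word distribution is closest to the query."""
    q = build_query_vector(query, vocabulary)
    scores = [(cosine(q, row), t) for t, row in enumerate(topic_word_matrix)]
    # Highest similarity first; ties are broken by topic index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, t in scores[:k]]

# Toy stand-ins: one row per topic, one column per vocabulary word.
vocabulary = ["cat", "dog", "stock", "market"]
topic_word_matrix = [
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
    [0.25, 0.25, 0.25, 0.25],
]

print(top_topics("stock market", vocabulary, topic_word_matrix, k=2))
```

With a real model, vocabulary would come from dataset.get_vocabulary() and the matrix from the trained model's output. The query should be preprocessed the same way as the training corpus, otherwise its words may not be found in the vocabulary.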
Hope this helped. Thanks for your patience,
Silvia
Description
I want to take a search query from a user and, based on this query, return a list of the top 5 topics (out of the 50 generated by running the LDA model) that match the query.
What I Did
For this task, I created an all-zero list of length len(vocabulary.txt) and set the entries at the indices of the search-query words to 1.
I later ran some nearest-neighbor functions using topic-words-matrix as the original data and search_vec as my query vector. The problem, as I figured out, is that the ordering of words in the vocabulary list and the ordering used to create the topic-word-matrix are not the same. How do I get that ordering? Is there any method that gives me the vocabulary index of the word used for each column of the topic-word-matrix?