Mapping between vocabulary and columns in topic-word-matrix

OCTIS version: 1.10.3
Python version: 3.8.10
Operating System: Ubuntu 20.04.3 LTS

Description

I want to take a search query from a user and, based on that query, return the top 5 topics (out of the 50 generated by running the LDA model) that best match it.

What I Did

For this task, I created an all-zero list of length len(vocabulary), where vocabulary is the word list from vocabulary.txt, and set the entries at the indices of the query words to 1, i.e.:

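# vocabulary: list of words read from vocabulary.txt
# query: list of tokens from the user's search query
search_vec = [0] * len(vocabulary)
for word in query:
    if word in vocabulary:
        idx = vocabulary.index(word)  # position of the word in vocabulary.txt
        search_vec[idx] = 1
# N-hot encoding complete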

I then ran a nearest-neighbor search, using the topic-word-matrix as the data and search_vec as the query vector. The problem, as I found out, is that the order of words in the vocabulary list is not the same as the order of the columns in the topic-word-matrix.
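To make the search step concrete, here is a minimal sketch of what I am trying to do, assuming the column order were known. The names `top_topics` and `column_words` are hypothetical and not part of OCTIS; `column_words[j]` stands for the word behind column j of the topic-word-matrix, which is exactly the mapping I am missing. Cosine similarity is used here as one possible nearest-neighbor measure, and the data is a toy example.

```python
import numpy as np

def top_topics(query_tokens, column_words, topic_word_matrix, k=5):
    """Rank topics by cosine similarity to an N-hot query vector.

    column_words[j] is ASSUMED to be the word for column j of
    topic_word_matrix (hypothetical input, not an OCTIS API).
    """
    # Build the N-hot vector in the matrix's own column order
    word_to_col = {w: j for j, w in enumerate(column_words)}
    query_vec = np.zeros(len(column_words))
    for word in query_tokens:
        col = word_to_col.get(word)
        if col is not None:  # skip out-of-vocabulary words
            query_vec[col] = 1.0
    if not query_vec.any():
        return []  # no query word is in the vocabulary

    matrix = np.asarray(topic_word_matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    # Guard against all-zero topic rows to avoid division by zero
    scores = (matrix @ query_vec) / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-scores)[:k]
    return [(int(t), float(scores[t])) for t in order]

# Toy example: 3 topics over a 4-word vocabulary
column_words = ["cat", "dog", "stock", "market"]
topic_word_matrix = [
    [0.60, 0.40, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.50],
    [0.25, 0.25, 0.25, 0.25],
]
result = top_topics(["stock", "market", "unknown"], column_words,
                    topic_word_matrix, k=2)
print(result)
```

With the correct column order this returns (topic index, score) pairs, best match first. With the order from vocabulary.txt the scores are meaningless, which is the issue described above.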

How do I get that ordering? Is there a method that gives me, for each column of the topic-word-matrix, the index of the corresponding word in the vocabulary?

MIND-Lab / OCTIS

Mapping between vocabulary and columns in topic-word-matrix #73
