I have a dataset of about 2 million sentences. Each sentence has some metadata (for example "authority":1 or "subject": "liability of directors").
My goal is to perform a hybrid search: first the corpus is filtered on metadata (for example, the retrieved documents must have subject "liability of directors"), and then a similarity search based on cosine distance is run on the remaining items. What would be the simplest or best approach?
I also tried a for loop that scans the whole corpus on every search and runs the similarity search only on the items that pass the filter, but I don't know if this is a good approach.
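To make the question concrete, here is a minimal sketch of the filter-then-rank idea in plain Python. It is not my real code. The toy 3-dimensional embeddings and the names `build_index` and `hybrid_search` are made up for illustration, and it assumes the sentence embeddings are already computed. Instead of scanning the whole corpus on each query, it builds a lookup from each metadata value to the matching row ids once, then computes cosine similarity only for those rows.

```python
import math
from collections import defaultdict

# Toy corpus of (embedding, metadata) pairs. The vectors are invented
# 3-d examples; real sentence embeddings would be much longer.
corpus = [
    ([1.0, 0.0, 0.0], {"authority": 1, "subject": "liability of directors"}),
    ([0.0, 1.0, 0.0], {"authority": 0, "subject": "liability of directors"}),
    ([1.0, 0.1, 0.0], {"authority": 1, "subject": "shareholder rights"}),
    ([0.7, 0.7, 0.0], {"authority": 1, "subject": "liability of directors"}),
]


def build_index(corpus):
    """Map each (field, value) pair to the row ids that carry it (built once)."""
    index = defaultdict(list)
    for row_id, (_, meta) in enumerate(corpus):
        for field, value in meta.items():
            index[(field, value)].append(row_id)
    return index


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, field, value, corpus, index, top_k=2):
    """Filter by one metadata field, then rank the survivors by cosine similarity."""
    candidates = index.get((field, value), [])
    scored = [(cosine(query_vec, corpus[i][0]), i) for i in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


index = build_index(corpus)
results = hybrid_search(
    [1.0, 0.0, 0.0], "subject", "liability of directors", corpus, index
)
for score, row_id in results:
    print(row_id, round(score, 4))
```

In this toy run, row 2 is never scored even though its vector is close to the query, because its subject does not match the filter. With 2 million sentences the pure-Python scoring loop would presumably be too slow, so the sketch only shows the order of operations.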