ai-cfia / llamaindex-db

Semantic search operations using llamaindex.
MIT License

Searching with exact matches like page titles doesn't always return the right result in `llamaindex-db` #26

Open k-allagbe opened 4 months ago

k-allagbe commented 4 months ago

Description

If a page is indexed and we search for an exact match of a portion of its content, we expect to receive that page in the results. We observe that this is not always the case.

image

Notebook: link

This is worth investigating.
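One way to quantify the behaviour described above is to measure, for each indexed page, whether a query made of its exact title returns that page in the top k results. The following is a minimal sketch, not code from the linked notebook: `exact_title_hit_rate`, the `retrieve` callable, and the toy substring retriever are hypothetical stand-ins for whatever search function is under test.

```python
from typing import Callable, Dict, List


def exact_title_hit_rate(
    pages: List[Dict[str, str]],
    retrieve: Callable[[str, int], List[str]],
    k: int = 5,
) -> float:
    """Fraction of pages whose exact title, used as the query, returns the page id in the top k."""
    if not pages:
        return 0.0
    hits = 0
    for page in pages:
        result_ids = retrieve(page["title"], k)
        if page["id"] in result_ids[:k]:
            hits += 1
    return hits / len(pages)


# Toy data and retriever (assumptions for illustration only).
PAGES = [
    {"id": "a", "title": "Food safety audit", "content": "Food safety audit of the plan"},
    {"id": "b", "title": "Plant health", "content": "Report on crops"},
]


def toy_retrieve(query: str, k: int) -> List[str]:
    # Returns ids of pages whose content contains the query, ignoring case.
    matches = [p["id"] for p in PAGES if query.lower() in p["content"].lower()]
    return matches[:k]


rate = exact_title_hit_rate(PAGES, toy_retrieve, k=5)
```

Swapping `toy_retrieve` for a wrapper around the real search call would give a single number to track while trying different retrievers.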

ibrahim-kabir commented 4 months ago

@leejaeka any updates on this?

leejaeka commented 4 months ago

Work in progress. Writing a custom retriever for a hybrid search solution.

leejaeka commented 4 months ago

Built a new keyword index and a custom retriever for hybrid search. Hybrid search did not solve issue #26: it also failed to retrieve a page when queried with its exact title. Going to look into this a little more, but issue #26 may require a cs solution to match the title with the query. Will discuss further with Guy once he is back.

MicrosoftTeams-image (7)
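For readers following along, here is a rough sketch of the two ideas in this comment: fusing a vector ranking with a keyword ranking, then matching the title against the query directly. It is not the custom retriever mentioned above. Reciprocal rank fusion is swapped in as one common way to combine rankings, and `fuse_rankings`, `promote_exact_title`, and the sample ids are hypothetical.

```python
from typing import Dict, List


def fuse_rankings(vector_ids: List[str], keyword_ids: List[str], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(id) is the sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def promote_exact_title(query: str, fused_ids: List[str], titles: Dict[str, str]) -> List[str]:
    """Put documents whose title equals the query (ignoring case and outer whitespace) first."""
    wanted = query.strip().casefold()
    exact = [d for d, t in titles.items() if t.strip().casefold() == wanted]
    rest = [d for d in fused_ids if d not in exact]
    return exact + rest


# Hypothetical example: neither retriever returned the page whose title is the query.
titles = {"p1": "Audit of X", "p2": "Plant health", "p3": "Food recalls"}
fused = fuse_rankings(vector_ids=["p2", "p3"], keyword_ids=["p2"])
final = promote_exact_title(" audit of x ", fused, titles)
```

The title lookup runs independently of both retrievers, so an exact-title page is returned even when the fused ranking misses it, which is the failure described in this issue.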

leejaeka commented 3 months ago

2024-06-03 update

from llama_index.core.vector_stores import MetadataFilters
from llama_index.core.vector_stores import ExactMatchFilter

filters = MetadataFilters(
    filters=[
        ExactMatchFilter(
            key="title",
            value='Audit of the Project Management of the Food Safety Action Plan - Canadian Food Inspection Agency',
        ),
    ]
)

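and

from llama_index.core import Document

# Store the title and subtitle as metadata so they can be used in filters.
node = Document(
    text=curr['content'],
    metadata={'id_': curr['id'], 'title': curr['title'], 'subtitle': curr['subtitle']},
)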
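As a plain-Python illustration of what an exact-match filter on the `title` metadata implies, the sketch below models the filter as strict string equality over metadata dictionaries shaped like the one above. It does not call llama_index, and the actual matching behaviour depends on the vector store in use. `to_metadata`, `exact_match`, and the sample records are hypothetical.

```python
from typing import Dict, List

# Hypothetical records shaped like `curr` in the snippet above.
records = [
    {"id": "1", "title": "Food safety audit", "subtitle": "2013", "content": "Audit text"},
    {"id": "2", "title": "Plant health", "subtitle": "", "content": "Crop report"},
]


def to_metadata(curr: Dict[str, str]) -> Dict[str, str]:
    """Build the same metadata keys as the Document above."""
    return {"id_": curr["id"], "title": curr["title"], "subtitle": curr["subtitle"]}


def exact_match(metadata_list: List[Dict[str, str]], key: str, value: str) -> List[Dict[str, str]]:
    """Keep only entries whose metadata[key] equals value exactly (assumed equality semantics)."""
    return [m for m in metadata_list if m.get(key) == value]


metadata_list = [to_metadata(r) for r in records]
same_case = exact_match(metadata_list, "title", "Plant health")
other_case = exact_match(metadata_list, "title", "plant health")
```

Under this assumption, a query that differs from the stored title in case or whitespace matches nothing, so the user's query may need normalizing before it is used as the filter value.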