OpenPecha / rag_prep_tool

MIT License

RAG0004: Query and retrieval of the embedded chunks (3) #6

Closed tenzin3 closed 5 months ago

tenzin3 commented 6 months ago

Description:

Querying the vector database with a given user input and retrieving the top-k chunks. The number of chunks to be retrieved depends on the chunking method and is subject to change. If the majority of the retrieved chunks are not relevant, an additional reranker model needs to be added.
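The top-k retrieval step described above can be sketched as follows. This is a minimal, self-contained illustration using cosine similarity over toy 2-d vectors; the actual implementation would query the project's vector database with real high-dimensional embeddings, and the function and variable names here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_embedding, chunk_embeddings, chunks, k=5):
    # Rank all chunks by similarity to the query and keep the top k.
    scored = sorted(
        zip(chunks, chunk_embeddings),
        key=lambda pair: cosine_similarity(query_embedding, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]

# Toy example (real embeddings are high-dimensional model outputs).
chunks = ["chunk A", "chunk B", "chunk C"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.05]
top = retrieve_top_k(query, embeddings, chunks, k=2)
```

If the retrieved chunks turn out to be mostly irrelevant, a reranker model would re-score this `top` list before it is passed on, as the description notes.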

Expected Output:

A number of relevant text chunks that will be forwarded to the prompt template of the LLM.
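A minimal sketch of how the retrieved chunks might be inserted into an LLM prompt template. The template wording and function name are assumptions for illustration, not the project's actual template.

```python
def build_prompt(question, contexts):
    # Number each retrieved chunk and place the context block above the question.
    context_block = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(contexts)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is chunking?",
    ["Chunking splits text.", "Chunks are embedded."],
)
```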

Implementation Plan

(image: implementation plan diagram)

tenzin3 commented 5 months ago

Model arguments

max number of new tokens = 500
temperature = 0

microsoft/Phi-3-mini-128k-instruct model
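One way the stated arguments could map onto Hugging Face `transformers` generation parameters. This is a hedged sketch: in `transformers`, `temperature = 0` is realized by disabling sampling (`do_sample=False`, i.e. greedy decoding). Loading `microsoft/Phi-3-mini-128k-instruct` itself requires downloading the checkpoint, so only the configuration is shown here.

```python
# Generation settings corresponding to the arguments above.
model_id = "microsoft/Phi-3-mini-128k-instruct"
generation_kwargs = {
    "max_new_tokens": 500,  # "max number of new tokens = 500"
    "do_sample": False,     # temperature = 0 -> greedy decoding
}

# These kwargs would then be passed to model.generate(**generation_kwargs)
# or to a transformers text-generation pipeline.
```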

response_time is in seconds.

(image: response-time results table)

The Phi-3-mini-128k language model was selected, and retrieving 5 contexts has proven to give better faithfulness and relevancy. However, if additional steps such as query and context transformation lead to a significant increase in response time, retrieving a smaller number of contexts could also be considered.