- max number of new tokens = 500
- temperature = 0
- response_time is in seconds
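A minimal sketch of how these settings could be applied, assuming the Hugging Face transformers library and the microsoft/Phi-3-mini-128k-instruct checkpoint; temperature = 0 is expressed as greedy decoding (`do_sample=False`), since transformers does not accept a literal zero temperature when sampling:

```python
# Sketch of the generation settings above; the prompt placeholder is
# hypothetical and would be filled from the prompt template.
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "..."  # filled from the prompt template with the retrieved contexts
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
outputs = model.generate(
    **inputs,
    max_new_tokens=500,  # max number of new tokens = 500
    do_sample=False,     # temperature = 0, i.e. deterministic greedy decoding
)
response_time = time.perf_counter() - start  # response_time in seconds

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"response_time: {response_time:.2f}s")
```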
The Phi-3-mini-128k language model was selected, and retrieving 5 contexts has proven to provide better faithfulness and relevancy. However, if additional steps such as query and context transformation lead to significant increases in response time, retrieving fewer contexts could also be considered.
Description:
Querying the vector database with a given user input and retrieving the top-k chunks. The number of chunks to retrieve depends on the chunking method and is subject to change. If the majority of the retrieved chunks are not relevant, an additional reranker model needs to be added (see the sketch below).
Expected Output:
A set of relevant text chunks to be forwarded to the LLM's prompt template.
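A minimal sketch of this retrieval step with an optional cross-encoder reranker; the vector store (ChromaDB), the collection name, and the reranker checkpoint are illustrative assumptions, not fixed by this issue:

```python
# Sketch of top-k retrieval plus an optional reranker pass.
# ChromaDB, the collection name, and the reranker model are assumptions.
import chromadb
from sentence_transformers import CrossEncoder

client = chromadb.PersistentClient(path="./vector_db")  # hypothetical path
collection = client.get_collection("text_chunks")       # hypothetical name
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_contexts(query: str, top_k: int = 5, rerank: bool = False) -> list[str]:
    """Return the top-k chunks most similar to the query, optionally reranked."""
    results = collection.query(query_texts=[query], n_results=top_k)
    chunks = results["documents"][0]
    if rerank:
        # Score each (query, chunk) pair and sort best-first.
        scores = reranker.predict([(query, c) for c in chunks])
        chunks = [c for _, c in sorted(zip(scores, chunks),
                                       key=lambda p: p[0], reverse=True)]
    return chunks

contexts = retrieve_contexts("example user question", top_k=5)
```

The reranker is left behind a flag so it can be enabled only if the retrieved chunks turn out to be mostly irrelevant, keeping the baseline response time unchanged otherwise.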
Implementation Plan