We might want to look into some newer RAG chunk-retrieval methods and evaluation metrics. This short tutorial covers them well; it uses LlamaIndex for the implementation, but we shouldn't need to depend on that. TruEra looks like it has some good objective metrics for evaluating retrieved chunks and the answers generated from that context. Some notes & links to sample code below.
Retrieval Methods:
Sentence-window retrieval: chunk the document by sentence; return the best semantic match plus the sentences immediately before & after it as context (first sketch below)
Auto-merging retrieval: index small leaf chunks under larger parent chunks; when enough sibling leaves are retrieved, merge them into the larger parent unit (second sketch below)
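To make sentence-window retrieval concrete, here's a minimal sketch (not the LlamaIndex implementation): chunk by sentence, find the best semantic match for the query, and return the match together with its neighboring sentences. The embed() function is a hypothetical placeholder; a real setup would call an actual embedding model.

```python
# Minimal sentence-window retrieval sketch (not the LlamaIndex API).
import re
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder embedding; swap in a real sentence-embedding
    model in practice. This version is deterministic but meaningless."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def sentence_window_retrieve(document: str, query: str, window: int = 1) -> str:
    # Chunk by sentence (naive splitter; a real one handles abbreviations etc.).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # Score each sentence against the query and pick the best semantic match.
    q = embed(query)
    best = max(range(len(sentences)), key=lambda i: cosine(embed(sentences[i]), q))
    # Return the match plus `window` sentences before and after it as context.
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])
```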
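And a hedged sketch of the auto-merging step, assuming a two-level chunk hierarchy where each leaf records its parent: once retrieval returns more than some fraction of a parent's children, the siblings are replaced by the larger parent chunk. The names and the merge_ratio threshold here are assumptions, not the tutorial's code.

```python
# Auto-merging retrieval sketch: merge retrieved sibling leaves into parents.
from collections import defaultdict

def auto_merge(retrieved_leaf_ids, parent_of, children_of, merge_ratio=0.5):
    """Replace groups of sibling leaves with their parent chunk when more
    than merge_ratio of the parent's children were retrieved."""
    hits = defaultdict(set)
    for leaf in retrieved_leaf_ids:
        hits[parent_of[leaf]].add(leaf)
    merged = []
    for parent, leaves in hits.items():
        if len(leaves) / len(children_of[parent]) > merge_ratio:
            merged.append(parent)          # enough siblings hit: return parent
        else:
            merged.extend(sorted(leaves))  # otherwise keep individual leaves
    return merged

# Usage: parent chunk "P0" was split into three leaves; two of three retrieved.
parent_of = {"L0": "P0", "L1": "P0", "L2": "P0"}
children_of = {"P0": ["L0", "L1", "L2"]}
print(auto_merge(["L0", "L2"], parent_of, children_of))  # -> ["P0"]
```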
Triad of metrics (to objectively evaluate hyperparameters & retrieval methods)
Answer Relevance: Is the response relevant to the query?
Context Relevance: Is the retrieved context relevant to the query?
Groundedness: Is the response supported by the context?
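A rough sketch of how the triad could be scored with LLM-judged feedback functions, in the spirit of TruEra's TruLens; the judge() helper and the prompts below are hypothetical placeholders, not the TruLens API.

```python
# Hedged sketch of the RAG triad as LLM-judged feedback functions.
def judge(prompt: str) -> float:
    """Hypothetical LLM judge returning a score in [0, 1].
    Replace with a real model call in practice."""
    raise NotImplementedError("plug in an LLM here")

def answer_relevance(query: str, response: str) -> float:
    # Is the response relevant to the query?
    return judge(f"Rate 0-1 how relevant this response is to the query.\n"
                 f"Query: {query}\nResponse: {response}")

def context_relevance(query: str, contexts: list[str]) -> float:
    # Is each retrieved chunk relevant to the query? Average over chunks.
    scores = [judge(f"Rate 0-1 how relevant this context is to the query.\n"
                    f"Query: {query}\nContext: {c}") for c in contexts]
    return sum(scores) / len(scores)

def groundedness(response: str, contexts: list[str]) -> float:
    # Is every claim in the response supported by the retrieved context?
    evidence = "\n".join(contexts)
    return judge(f"Rate 0-1 how well every claim in the response is "
                 f"supported by the evidence.\nEvidence: {evidence}\n"
                 f"Response: {response}")
```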