Open Gautam-Rajeev opened 8 months ago
Hi @ChakshuGautam. There are two possible approaches to solving this. The first involves LLMs, LangChain, and things like OpenAI call agents, but I feel this might be overkill for now. The second, which I think might be more suitable at the moment, involves first
I would like to discuss this further and get your opinions.
Goal:
Develop a method to generate pairs of questions and relevant content strings from a given dataset, aimed at enhancing retrieval testing. The relevant content should consist of collections of sentences from the source material that are necessary to derive the final answer, rather than being direct answers or chunks themselves. This approach will facilitate better retrieval testing, including the testing of chunking strategies.
Description
For effective retrieval testing, you need question-content pairs where the content is not simply the answer or a chunk of text directly related to the question. Instead, the content should be a curated collection of sentences from various parts of the provided text. These sentences should collectively contain the necessary information to answer the question, making the retrieval challenge more complex and representative of real-world scenarios. The goal is to assess retrieval performance by determining if these critical sentences are included in the chunks retrieved by the system.
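As a rough illustration of the assessment described above, the sketch below scores one question-content pair by the fraction of its critical sentences that appear in the retrieved chunks. The names `QuestionContentPair` and `sentence_recall` are hypothetical, and the case-insensitive, whitespace-normalized substring match is an assumption made for simplicity; this issue does not prescribe a matching rule or a metric.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QuestionContentPair:
    """Hypothetical container: a question plus the source sentences needed to answer it."""
    question: str
    relevant_sentences: list[str]


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so line breaks and spacing do not block a match.
    return " ".join(text.lower().split())


def sentence_recall(pair: QuestionContentPair, retrieved_chunks: list[str]) -> float:
    """Fraction of the pair's relevant sentences found in at least one retrieved chunk.

    Assumption: a sentence counts as retrieved if its normalized text is a
    substring of a normalized chunk. A sentence split across two chunks is
    counted as missed, which is exactly what a chunking test may want to detect.
    """
    if not pair.relevant_sentences:
        return 0.0
    chunks = [_normalize(chunk) for chunk in retrieved_chunks]
    found = sum(
        1
        for sentence in pair.relevant_sentences
        if any(_normalize(sentence) in chunk for chunk in chunks)
    )
    return found / len(pair.relevant_sentences)


# Illustrative data only, not taken from any real dataset.
example_pair = QuestionContentPair(
    question="When should the crop be irrigated after sowing?",
    relevant_sentences=[
        "Sowing is done in early June.",
        "The first irrigation is given three weeks after sowing.",
    ],
)
example_chunks = [
    "Land is prepared in May. Sowing is done in early June.",
    "Fertilizer is applied in two split doses.",
]
example_score = sentence_recall(example_pair, example_chunks)
```

Here only the first of the two relevant sentences is present in the retrieved chunks, so the score is 0.5. Averaging this score over many pairs would give one possible way to compare retrievers or chunking strategies.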
Key requirements include:
Implementation Details
The project will involve:
Open for collaboration: This project is initially unassigned and open to anyone interested. Discussion and solution proposals can be exchanged in comments. Contributors with impactful pull requests may be considered for assignment.
Product Name
retrieval testing
Organization Name
Samagra
Domain
Data Science / Machine Learning
Tech Skills Needed
Category
Feature
Question-Content Pair Generation
Mentor(s)
@ChakshuGautam
Complexity
Medium