Closed: jjmachan closed this issue 8 months ago
Could you also allow it to process in parallel?
yes @babysor that will be there. The idea is that if, say, you need 100 dataset examples, each of those 100 items will be created in parallel - either with async or in threads
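The async fan-out described above can be sketched as follows; `generate_example` is a hypothetical stand-in for whatever produces one dataset example, not a Ragas function:

```python
import asyncio


async def generate_example(i: int) -> dict:
    # Hypothetical stand-in for one LLM-backed example generation;
    # the sleep is a placeholder for the real network call.
    await asyncio.sleep(0)
    return {"id": i, "question": f"question {i}"}


async def generate_dataset(n: int) -> list[dict]:
    # Fan out all n generations concurrently and gather results in order.
    return await asyncio.gather(*(generate_example(i) for i in range(n)))


examples = asyncio.run(generate_dataset(100))
```

A thread-based variant would use `concurrent.futures.ThreadPoolExecutor` with the same fan-out shape.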
Related issues to solve
finished with the release of v0.1 :)
Awesome, thanks so much!
will close the rest of the related issues too - most have been fixed in the new version
What is this about?
We have had Synthetic Test Data generation in beta for a while, and many of you have given us valuable feedback on it. Now we are reworking it to be faster and extensible for wider use.
Ragas takes a novel approach to evaluation data generation. An ideal evaluation dataset should encompass the various types of questions encountered in production, including questions of varying difficulty levels. LLMs by default are not good at creating diverse samples, as they tend to follow common paths. Inspired by works like Evol-Instruct, Ragas achieves this by employing an evolutionary generation paradigm, where questions with different characteristics such as reasoning, conditioning, multi-context, and more are systematically crafted from the provided set of documents. This approach ensures comprehensive coverage of the performance of the various components within your pipeline, resulting in a more robust evaluation process.
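The paradigm above can be sketched as a loop that samples an evolution type according to a target distribution and applies it to a seed (context, question) pair. All names here are illustrative stand-ins, not the Ragas API:

```python
import random


# Illustrative evolution "operators": each rewrites a (context, question) pair
# into a harder variant with a particular characteristic.
def reasoning(ctx, q):
    return ctx, f"{q} Explain the reasoning step by step."


def conditioning(ctx, q):
    return ctx, f"Assuming the context holds, {q.lower()}"


def multi_context(ctx, q):
    return ctx + " [extra adjacent context]", f"{q} (use all provided contexts)"


# Target mix of question characteristics in the final testset.
distribution = {reasoning: 0.5, conditioning: 0.25, multi_context: 0.25}


def evolve(seed_pairs, n, rng=random.Random(42)):
    evolutions = list(distribution)
    weights = list(distribution.values())
    out = []
    for _ in range(n):
        ctx, q = rng.choice(seed_pairs)
        # Sample an evolution according to the target distribution.
        op = rng.choices(evolutions, weights=weights, k=1)[0]
        out.append(op(ctx, q))
    return out


seed = [("Paris is the capital of France.", "What is the capital of France?")]
testset = evolve(seed, 4)
```

The real implementation adds filtering and context retrieval around this loop, but the distribution-driven sampling is the core idea.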
Core Components
- `Evolutions` - this is the core and defines how to evolve a given (context, question) pair into more complex questions, adding more context if needed.
- `TestsetGenerator` - this takes the LLM, evolutions, documents, and other configuration and returns the generated testset. This class is also responsible for scheduling the different runs in parallel for max throughput.
- `DocumentStore` and `Document` - `Document` is an extension of langchain_core's `Document` abstraction. `DocumentStore` is responsible for connecting with the available documents and giving `Evolutions` an interface to fetch documents (adjacent and similar) as needed.
- `Filter` - filters critique the output from the evolutions and decide whether it should be accepted. The `Evolution` decides how to evolve the (context, question) pair and the `Filter` checks if the result is acceptable.
Usage
High Level
Users can use it by importing the evolutions, defining the distribution of the evolutions in the final testset, and configuring the `TestsetGenerator`.
Your own `Evolution`s and `Filter`s
If you want to create a new evolution, you will have to subclass `BaseEvolution`; for a new filter, create a subclass of `BaseFilter`.
Document Storage
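To make the intended shape of the document store concrete, here is a rough sketch of the kind of interface such a store might expose to evolutions. The class and method names below are illustrative assumptions, not the final API:

```python
from abc import ABC, abstractmethod


class Document:
    # Simplified stand-in for the langchain_core-style Document.
    def __init__(self, doc_id: str, text: str):
        self.doc_id = doc_id
        self.text = text


class BaseDocumentStore(ABC):
    # Evolutions fetch adjacent (neighbouring chunks) and similar documents.
    @abstractmethod
    def get_adjacent(self, doc: Document) -> list[Document]: ...

    @abstractmethod
    def get_similar(self, doc: Document, top_k: int = 3) -> list[Document]: ...


class InMemoryDocStore(BaseDocumentStore):
    def __init__(self, docs: list[Document]):
        self.docs = docs

    def get_adjacent(self, doc: Document) -> list[Document]:
        # Neighbouring chunks in document order.
        i = self.docs.index(doc)
        return self.docs[max(i - 1, 0):i] + self.docs[i + 1:i + 2]

    def get_similar(self, doc: Document, top_k: int = 3) -> list[Document]:
        # Toy similarity: shared word count (a real store would use embeddings).
        words = set(doc.text.split())
        ranked = sorted((d for d in self.docs if d is not doc),
                        key=lambda d: len(words & set(d.text.split())),
                        reverse=True)
        return ranked[:top_k]
```

A database-backed store would implement the same two lookups against its own index.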
By default there will be an `InMemoryDocStore`, but you can also connect it with other databases by extending the `BaseDocumentStore` class.
Issues this will fix