explodinggradients / ragas


[RFC] Testset Generation: making it faster and easy to use #380

Closed jjmachan closed 8 months ago

jjmachan commented 10 months ago

What is this about?

We have had Synthetic Test Data generation in beta for a while, and many of you have given us valuable feedback on it. We are now reworking it to be faster and extensible to a wider range of use cases.

Ragas takes a novel approach to evaluation data generation. An ideal evaluation dataset should cover the various types of questions encountered in production, including questions of varying difficulty. LLMs are not good at creating diverse samples by default, as they tend to follow common paths. Inspired by works like Evol-Instruct, Ragas achieves this by employing an evolutionary generation paradigm, where questions with different characteristics such as reasoning, conditioning, multi-context, and more are systematically crafted from the provided set of documents. This approach ensures comprehensive coverage of the various components of your pipeline, resulting in a more robust evaluation process.

Core Components

  1. Evolutions - this is the core: it defines how to evolve a given (context, question) pair into a more complex question, adding more context if needed.
  2. TestsetGenerator - this takes the LLM, evolutions, documents, and other configuration and returns the generated testset. This class is also responsible for scheduling the different runs in parallel for maximum throughput.
  3. DocumentStore and Document - Document is an extension of langchain_core's Document abstraction. DocumentStore is responsible for connecting to the available documents and giving Evolutions an interface to fetch documents (adjacent and similar) as needed.
  4. Filter - filters critique the output from the evolutions and decide whether it should be accepted. The Evolution decides how to evolve the (context, question) pair, and the Filter checks whether the result is acceptable (a rough sketch of how these pieces fit together follows this list).
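To make the flow concrete, here is a minimal sketch of how these components are intended to interact for a single sample. It assumes evolve() returns a (question, context) pair and that each filter is a callable returning a bool; the function name and sampling details are illustrative, not part of the proposed API.

import random
import typing as t

def generate_one_sample(evolutions: t.Dict) -> t.Optional[t.Tuple]:
    # pick an evolution according to the configured probability distribution
    evolution = random.choices(
        population=list(evolutions.keys()),
        weights=list(evolutions.values()),
        k=1,
    )[0]

    # the evolution pulls seed/adjacent/similar documents from its DocumentStore
    # and produces an evolved (question, context) pair
    question, context = evolution.evolve()

    # every attached filter critiques the pair; reject if any filter says no
    if all(flt(question, context) for flt in evolution.filters):
        return question, context
    return None  # rejected; the generator retries with another evolution/seed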

Usage

High Level

Users can use it by importing the evolutions, defining the distribution of evolutions in the final testset, and configuring the TestsetGenerator.

from ragas.testset.evolutions import simple, reasoning, multihop, BaseEvolution
from ragas import TestsetGenerator

# define evolutions you will need
evolutions = {
  simple: 0.4,
  reasoning: 0.4,
  multihop: 0.2
}

generator = TestsetGenerator(
    generator_llm=generator_llm,    # RagasLLM
    critic_llm=critic_llm,          # RagasLLM
    embeddings_model=embeddings,    # Embeddings
)

testset = generator.generate(
    documents=docs,                           # Documents
    # doc_store: t.Optional[DocumentStore]    # not going to do now
    evolutions=evolutions,                    # dict[Evolution, float]
    test_size=100,                            # int
)
testset_df = testset.to_pandas()

# with openai
generator = TestsetGenerator.with_openai(
    generator_llm="gpt3.5",         # str
    critic_llm="gpt4",              # str
    embeddings_model=embeddings,    # Embeddings
)
testset = generator.generate(
    documents=docs,                           # Documents
    # doc_store: t.Optional[DocumentStore]    # not going to do now
    evolutions=evolutions,                    # dict[Evolution, float]
    test_size=100,                            # int
)
testset_df = testset.to_pandas()

Your own Evolutions and Filters

If you want to create a new Evolution, you will have to subclass BaseEvolution and, if a new critique is needed, create a subclass of BaseFilter.

import typing as t
from abc import ABC, abstractmethod
from dataclasses import dataclass

# base filter
@dataclass
class BaseFilter(ABC):
    llm: RagasLLM
    filter_prompt: t.Any  # prompt used to critique the evolved sample

    @abstractmethod
    def __call__(self, *args, **kwargs) -> bool:
        ...

# base evolution
@dataclass
class BaseEvolution(ABC):
    llm: RagasLLM
    filters: t.List[t.Callable]
    docstore: DocumentStore

    @abstractmethod
    def evolve(self):
        ...

    async def aevolve(self):
        raise NotImplementedError
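
As a hypothetical example of what such a pair could look like, the snippet below defines an evolution that builds comparative questions from two similar documents and a simple length-based filter. The llm.generate(prompt) helper and the prompt text are assumptions for illustration, not the finalized RagasLLM interface.

# hypothetical custom evolution: turn a seed document into a comparative question
@dataclass
class ComparisonEvolution(BaseEvolution):
    def evolve(self):
        # take a seed document and a semantically similar one to compare against
        seed = next(iter(self.docstore.documents.values()))
        similar = self.docstore.get_similar(seed, top_k=1)[0]

        prompt = (
            "Write a question that requires comparing the two passages below.\n\n"
            f"Passage A:\n{seed.page_content}\n\nPassage B:\n{similar.page_content}"
        )
        question = self.llm.generate(prompt)  # assumed helper on RagasLLM
        return question, [seed.page_content, similar.page_content]


# hypothetical custom filter: reject questions that are too short to be useful
@dataclass
class LengthFilter(BaseFilter):
    min_words: int = 5

    def __call__(self, question: str, context) -> bool:
        return len(question.split()) >= self.min_words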

Document Storage

By default there will be an InMemoryDocStore, but you can also connect to other databases by extending the DocumentStore base class below.

import uuid
import typing as t

from langchain_core.documents import Document as LCDocument
from pydantic import Field  # or langchain_core.pydantic_v1, depending on the langchain version


class Document(LCDocument):
    doc_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    filename: t.Optional[str] = None
    embedding: t.Optional[t.List[float]] = Field(default=None, repr=False)

class DocumentStore(ABC):
    def __init__(self):
        self.documents = {}

    @abstractmethod
    def add(self, docs: t.List[Document], show_progress: bool = False):
        ...

    @abstractmethod
    def get(self, doc_id: str) -> Document:
        ...

    @abstractmethod
    def get_similar(
        self, doc: Document, threshold: float = 0.7, top_k: int = 3
    ) -> t.List[Document]:
        ...

    @abstractmethod
    def get_adjacent(self, doc: Document, direction: str = "next") -> t.List[Document]:
        ...
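
For reference, an in-memory implementation of this interface might look roughly like the sketch below. It assumes an Embeddings object exposing the usual embed_documents() method and uses cosine similarity for get_similar; none of this is the actual InMemoryDocStore code, just one plausible realization.

import numpy as np

class InMemoryDocStore(DocumentStore):
    """Illustrative in-memory store; the real implementation may differ."""

    def __init__(self, embeddings):
        super().__init__()
        self.embeddings = embeddings  # any Embeddings with an embed_documents() method
        self.order = []               # insertion order, used for adjacency

    def add(self, docs: t.List[Document], show_progress: bool = False):
        vectors = self.embeddings.embed_documents([d.page_content for d in docs])
        for doc, vec in zip(docs, vectors):
            doc.embedding = vec
            self.documents[doc.doc_id] = doc
            self.order.append(doc.doc_id)

    def get(self, doc_id: str) -> Document:
        return self.documents[doc_id]

    def get_similar(self, doc: Document, threshold: float = 0.7, top_k: int = 3) -> t.List[Document]:
        def cosine(a, b):
            a, b = np.asarray(a), np.asarray(b)
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        scored = sorted(
            ((cosine(doc.embedding, other.embedding), other)
             for other in self.documents.values() if other.doc_id != doc.doc_id),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [d for score, d in scored[:top_k] if score >= threshold]

    def get_adjacent(self, doc: Document, direction: str = "next") -> t.List[Document]:
        i = self.order.index(doc.doc_id)
        j = i + 1 if direction == "next" else i - 1
        return [self.documents[self.order[j]]] if 0 <= j < len(self.order) else []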

Issues this will fix

babysor commented 10 months ago

Could you also allow it to process in parallel?

jjmachan commented 10 months ago

yes @babysor that will be there. The idea is that if you need, say, 100 dataset examples, each of those 100 items will be created in parallel, either with async or in threads.
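
For instance, with asyncio the parallel scheduling could look roughly like this; the semaphore cap and batching here are illustrative assumptions rather than the final implementation, and aevolve() is the async entry point sketched in the RFC above.

import asyncio
import random

async def generate_async(evolutions, test_size, max_concurrency=16):
    # choose one evolution per sample according to the configured distribution
    picks = random.choices(
        population=list(evolutions.keys()),
        weights=list(evolutions.values()),
        k=test_size,
    )

    sem = asyncio.Semaphore(max_concurrency)  # cap concurrent LLM calls

    async def run_one(evolution):
        async with sem:
            return await evolution.aevolve()

    # all test_size samples are generated concurrently
    return await asyncio.gather(*(run_one(e) for e in picks))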

jjmachan commented 9 months ago

related issues to solve

jjmachan commented 8 months ago

finished with the release of v0.1 :)

hodgesz commented 8 months ago

Awesome, thanks so much!


jjmachan commented 8 months ago

will close the rest of the related issues too - most have been fixed in the new version