explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

[RFC] Executor: making Ragas faster and more reliable #394

Closed jjmachan closed 3 months ago

jjmachan commented 8 months ago

Problem - ragas is slow and unreliable

  1. Ragas is not exploiting the concurrency options provided by the ThreadPoolExecutor and asyncio modules. This is because ragas took a batching approach to evaluation, i.e. it evaluated metrics in batches (see the sketch after this list).
  2. Not every service has async support - we need options to stay synchronous, with no concurrency at all.
  3. We need these primitives for #380 and potentially others as well.
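
To make the contrast concrete, here is a minimal sketch (not part of the proposal) of per-row concurrency with asyncio, where `ascore` stands in for the per-row coroutine defined later in this RFC:

import asyncio
import typing as t

async def score_all(
    rows: t.List[dict],
    ascore: t.Callable[[dict], t.Awaitable[float]],
) -> t.List[float]:
    # One coroutine per row, all scheduled at once with asyncio.gather,
    # instead of evaluating fixed-size batches sequentially.
    return await asyncio.gather(*(ascore(row) for row in rows))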

Core Components

  1. BaseMetric - a metric that evaluates a single row, with both score() and ascore()
  2. RagasLLM that is based on langchain-core LLMs
    1. Prompt object with provisions for instructions and demonstrations, which converts to the messages or prompts supported by both langchain chat-based and completion-based models
    2. LLMResult object that supports both chat and text-based outputs
  3. Executor that runs BaseMetric. It should also be able to run testset generators, so this should be a common paradigm (see the sketch after this list)
  4. new evaluate() function that makes it easier to
    1. change the llm and embeddings - this is the new method where BaseMetric by default will have llm=None and will take the default llm from the evaluate() function. If metric.llm != None, then the LLM provided on the metric is used
    2. switch between async and threading
    3. support callbacks throughout
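
As a rough illustration of component 3, a minimal Executor sketch; the `submit`/`results` names are assumptions for this RFC discussion, not a final API:

import asyncio
import typing as t
from dataclasses import dataclass, field

@dataclass
class Executor:
    # Collects async jobs (metric scoring, testset generation, ...)
    # and runs them all concurrently in one event loop.
    jobs: t.List[t.Coroutine] = field(default_factory=list)
    raise_exceptions: bool = True

    def submit(self, coro: t.Coroutine) -> None:
        self.jobs.append(coro)

    def results(self) -> t.List[t.Any]:
        async def _run():
            return await asyncio.gather(
                *self.jobs, return_exceptions=not self.raise_exceptions
            )
        return asyncio.run(_run())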

Base classes

Metric

import typing as t
from abc import ABC, abstractmethod

from langchain_core.callbacks import Callbacks

class Metric(ABC):
    @abstractmethod
    def score(
        self,
        row: t.Dict,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...

    @abstractmethod
    async def ascore(
        self,
        row: t.Dict,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...
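
For illustration, a hypothetical concrete metric; the `answer` and `ground_truth` column names are assumptions, not part of this RFC:

class ExactMatch(Metric):
    # Hypothetical metric: 1.0 if the answer exactly matches the ground truth.
    def score(self, row, callbacks=None) -> float:
        return float(row["answer"].strip() == row["ground_truth"].strip())

    async def ascore(self, row, callbacks=None) -> float:
        return self.score(row, callbacks)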

evaluate()

def evaluate(
    dataset: Dataset,
    metrics: list[Metric] | None = None,
    llm: t.Optional[BaseRagasLLM] = None,
    embeddings: t.Optional[RagasEmbeddings] = None,
    callbacks: Callbacks = [],
    is_async: bool = True,
    max_workers: t.Optional[int] = None,
    raise_exceptions: bool = True,
    column_map: t.Dict[str, str] = {},
) -> Result:
    ...
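
A hypothetical call, assuming a HuggingFace Dataset with the expected columns; `faithfulness` and `my_llm` stand in for a metric instance and a BaseRagasLLM implementation:

result = evaluate(
    dataset,
    metrics=[faithfulness],  # faithfulness.llm is None, so...
    llm=my_llm,              # ...it inherits this default LLM
    is_async=True,           # use asyncio; False falls back to threads
    raise_exceptions=False,  # don't propagate per-row failures
)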

BaseRagasLLM

import typing as t
from abc import ABC, abstractmethod
from dataclasses import dataclass

from langchain_core.callbacks import Callbacks
from langchain_core.outputs import LLMResult

# `Prompt` is the Ragas prompt object described in Core Components above.

@dataclass
class BaseRagasLLM(ABC):
    @abstractmethod
    def generate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...

    @abstractmethod
    async def agenerate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...
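
One possible concrete implementation (a sketch, not the final design) wraps a langchain-core chat model; `Prompt.to_messages()` is a hypothetical helper, and temperature is assumed to be configured on the wrapped model:

from langchain_core.language_models import BaseChatModel

@dataclass
class LangchainLLMWrapper(BaseRagasLLM):
    # Delegate to the wrapped chat model's (a)generate methods; both
    # return a langchain LLMResult, matching the signatures above.
    langchain_llm: BaseChatModel

    def generate_text(
        self, prompt, n=1, temperature=1e-8, stop=None, callbacks=None
    ) -> LLMResult:
        return self.langchain_llm.generate(
            [prompt.to_messages()] * n, stop=stop, callbacks=callbacks
        )

    async def agenerate_text(
        self, prompt, n=1, temperature=1e-8, stop=None, callbacks=None
    ) -> LLMResult:
        return await self.langchain_llm.agenerate(
            [prompt.to_messages()] * n, stop=stop, callbacks=callbacks
        )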
jjmachan commented 8 months ago

List of issues this will address:

- Make embeddings faster

iterakhtaras commented 5 months ago

Hey @jjmachan! Thanks for all your work on ragas, I really appreciate it. I am trying to use it to evaluate my chatbot created with llama-index. Have any workarounds been discovered for issue #271?

These are my dependencies:

%pip install ragas==0.0.22
%pip install pypdf
%pip install llama-index==0.8.52
%pip install langchain==0.0.331rc3
%pip install openai==0.28.1