explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Can Ragas be used to evaluate Amazon Bedrock Agents w/ managed Knowledge Base? #1234

Open danielesalvatore opened 2 months ago

danielesalvatore commented 2 months ago

[X] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

I just discovered the metrics-driven development (MDD) approach for RAG development that Ragas implements, and it is fascinating.

I have a conversational chatbot based on an Amazon Bedrock Agent that uses claude-3-haiku-20240307-v1 as the LLM, configured to use a Bedrock Knowledge Base (AWS managed). The KB is built on top of Amazon OpenSearch Serverless with titan-embed-text-v1 as the embedding model (chunking setup: fixed-size chunks, max 300 tokens, 20% overlap).

Can Ragas be used to evaluate this RAG setup? If so, do you have any examples, or can you point me to relevant documentation or approaches?

Thank you!

Code Examples

The chatbot infrastructure is scripted using AWS CDK (Python). I can provide relevant snippets if required.
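As a starting point, here is a simplified sketch of how the agent is queried at runtime (assuming the boto3 bedrock-agent-runtime client; agent and alias IDs are placeholders):

import uuid
import boto3

# Bedrock Agents are invoked through the bedrock-agent-runtime client
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId=str(uuid.uuid4()),    # one session per conversation
    inputText="What is our refund policy?",
)

# The response is an event stream; concatenate the streamed text chunks
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)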

Additional context

I opted for this setup since AWS manages every service.

dosubot[bot] commented 2 months ago

Hey @danielesalvatore! Welcome to the repository! I'm here to help you with any bugs, questions, or contributions while you're waiting for a human maintainer. If you need any assistance, feel free to ask.

Yes, Ragas can be used to evaluate your Amazon Bedrock Agent setup with a managed Knowledge Base. The provided documentation includes a detailed example of how to use Ragas for evaluation with Amazon Bedrock.

Here is a relevant example from the documentation:

# Import necessary libraries
from datasets import load_dataset
from ragas.metrics import (
    context_precision,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
from ragas import evaluate
import nest_asyncio

# Load sample dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

# Define metrics
metrics = [
    faithfulness,
    context_recall,
    context_precision,
    harmfulness,
]

# Configure Bedrock model
config = {
    "credentials_profile_name": "your-profile-name",  # E.g "default"
    "region_name": "your-region-name",  # E.g. "us-east-1"
    "model_id": "your-model-id",  # E.g "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)

# Initialize embeddings
bedrock_embeddings = BedrockEmbeddings(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
)

# Apply nest_asyncio if running in a Jupyter notebook
nest_asyncio.apply()

# Evaluate the model
result = evaluate(
    amnesty_qa["eval"].select(range(3)),
    metrics=metrics,
    llm=bedrock_model,
    embeddings=bedrock_embeddings,
)

# Convert results to pandas DataFrame
df = result.to_pandas()
df.head()

This example demonstrates how to load a dataset, configure the Bedrock model, define evaluation metrics, and run the evaluation using Ragas [1].

Additionally, Ragas provides tools and techniques for continual learning in RAG applications: you can synthetically generate diverse test datasets, use LLM-assisted evaluation metrics, monitor the quality of your apps in production, and use those insights to iterate on and improve your application. For more detailed information, refer to the Core Concepts section of the Ragas documentation [2].
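For instance, here is a minimal sketch of synthetic test-set generation with the ragas 0.1 TestsetGenerator API, reusing the Bedrock model and embeddings configured above (test size and question distributions are illustrative):

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator = TestsetGenerator.from_langchain(
    generator_llm=bedrock_model,
    critic_llm=bedrock_model,
    embeddings=bedrock_embeddings,
)

# documents: a list of LangChain Documents loaded from your source files
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
testset.to_pandas().head()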

For more community insights and examples, you can explore the Community section of the Ragas documentation, which features various blog posts, tutorials, and case studies on how different teams and individuals are using Ragas to enhance their RAG systems [3].

To continue talking to Dosu, mention @dosu.

danielesalvatore commented 2 months ago

@dosu

Can you give an example that strictly includes the following requirements:

  * Amazon Bedrock Agents
  * Amazon Bedrock Knowledge Bases
  * a PDF used as the testing dataset

If you cannot provide such an example, explain in depth the implications, in terms of test validity, of using BedrockChat and BedrockEmbeddings instead of the actual infrastructure based on Amazon Bedrock Agents and Knowledge Bases. Thanks!

dosubot[bot] commented 2 months ago

To integrate Ragas with Amazon Bedrock Agent and Amazon Bedrock Knowledge Bases, and use a PDF as a testing dataset, follow these steps:

  1. Install Required Packages:

    !pip install ragas langchain_community boto3 pypdf
  2. Set Up the Environment:

    import nest_asyncio
    
    # AWS credentials are resolved from the AWS profile referenced below
    # via credentials_profile_name; no separate API key is needed.
    # nest_asyncio is only required when running inside a Jupyter notebook.
    nest_asyncio.apply()
  3. Load the PDF Dataset (shown here with LangChain's PyPDFLoader):

    from langchain_community.document_loaders import PyPDFLoader
    
    # Load the PDF into LangChain Documents (one per page)
    loader = PyPDFLoader("path/to/your/document.pdf")
    documents = loader.load()
    
    # These documents can also seed synthetic question generation
    # (see the TestsetGenerator sketch earlier in this thread)
  4. Configure the Bedrock Model and Knowledge Base Retriever:

    from langchain_community.retrievers import AmazonKnowledgeBasesRetriever
    from langchain_community.chat_models import BedrockChat
    
    config = {
        "credentials_profile_name": "your-profile-name",  # e.g. "default"
        "region_name": "your-region-name",  # e.g. "us-east-1"
        "model_id": "your-model-id",  # e.g. "anthropic.claude-v2"
        "model_kwargs": {"temperature": 0.4},
    }
    
    bedrock_model = BedrockChat(
        credentials_profile_name=config["credentials_profile_name"],
        region_name=config["region_name"],
        endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
        model_id=config["model_id"],
        model_kwargs=config["model_kwargs"],
    )
    
    # The embedding model (e.g. titan-embed-text-v1) is configured on the
    # Knowledge Base itself, so the retriever only needs the KB id.
    retriever = AmazonKnowledgeBasesRetriever(
        knowledge_base_id="your-knowledge-base-id",
        credentials_profile_name=config["credentials_profile_name"],
        region_name=config["region_name"],
        retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
    )
  5. Define Evaluation Metrics:

    from ragas.metrics import (
        context_precision,
        faithfulness,
        context_recall,
    )
    from ragas.metrics.critique import harmfulness
    
    metrics = [
        faithfulness,
        context_recall,
        context_precision,
        harmfulness,
    ]
  6. Build an Evaluation Dataset and Run the Evaluation (ragas scores questions, answers, retrieved contexts, and ground truths rather than raw documents):

    from datasets import Dataset
    from ragas import evaluate
    
    questions = ["..."]      # test questions about the PDF content
    ground_truths = ["..."]  # reference answers for each question
    
    rows = []
    for question, ground_truth in zip(questions, ground_truths):
        docs = retriever.invoke(question)  # contexts from the Knowledge Base
        contexts = [doc.page_content for doc in docs]
        context_block = "\n".join(contexts)
        answer = bedrock_model.invoke(
            f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"
        ).content
        rows.append({"question": question, "answer": answer,
                     "contexts": contexts, "ground_truth": ground_truth})
    
    result = evaluate(
        Dataset.from_list(rows),
        metrics=metrics,
        llm=bedrock_model,
    )
    
    # Convert results to pandas DataFrame
    df = result.to_pandas()
    df.head()

Implications of Using BedrockChat and BedrockEmbeddings Instead of Amazon Bedrock Agents and Knowledge Bases

Substituting BedrockChat and BedrockEmbeddings for the actual Amazon Bedrock Agents and Knowledge Bases can affect the validity of your tests in several ways:

  1. Contextual Relevance: Bedrock Agents and Knowledge Bases are designed to work together, leveraging the managed knowledge base for more accurate and contextually relevant responses. A stand-alone BedrockChat plus BedrockEmbeddings pipeline re-implements retrieval outside the Knowledge Base, so its evaluation scores may not reflect the answers your production agent actually produces.

  2. Performance: The managed Knowledge Base in Amazon Bedrock is optimized for retrieval tasks, potentially offering better performance and more relevant document retrieval compared to a generic embedding model.

  3. Integration: The integration between Bedrock Agents and Knowledge Bases is seamless, ensuring that the retrieval and generation processes are tightly coupled. Using separate components like BedrockChat and BedrockEmbeddings might introduce inconsistencies or inefficiencies in the workflow.

  4. Evaluation Metrics: The evaluation metrics might not fully capture the capabilities of the managed Knowledge Base if the retrieval process is not aligned with how the Knowledge Base is intended to be used.

In short, evaluating a re-implemented pipeline measures a proxy rather than the production system. By evaluating against the actual Knowledge Base retriever (and, where possible, the deployed agent itself, as sketched below), your test results stay aligned with what users actually experience.
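As an illustration, here is a minimal sketch of querying the live Knowledge Base directly through boto3 so that evaluation uses the production retrieval path (the knowledge base ID is a placeholder):

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def retrieve_contexts(question: str) -> list[str]:
    # Ask the managed Knowledge Base for the same passages the
    # production agent would retrieve
    response = runtime.retrieve(
        knowledgeBaseId="KB_ID",  # placeholder
        retrievalQuery={"text": question},
    )
    return [r["content"]["text"] for r in response["retrievalResults"]]

# Pair each test question with the agent's answer (e.g. via invoke_agent,
# as in the snippet earlier in this thread) and these contexts, then
# evaluate with ragas exactly as in step 6 above.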

To continue talking to Dosu, mention @dosu.

shahules786 commented 2 months ago

Hey @danielesalvatore, we are working on supporting conversational and agentic workflows in ragas by v0.2. I'm working on the initial PR #1222. I would love to talk to you to further understand your use-case and ensure that we support it too. If you're free, here's my cal

danielesalvatore commented 2 months ago

Hey @shahules786, thanks for pinging! I just submitted a meeting request for Monday, September 2, 2024, 10:00–10:30 (Rome, Italy time).