danielesalvatore opened this issue 2 months ago
Hey @danielesalvatore! Welcome to the repository! I'm here to help you with any bugs, questions, or contributions while you're waiting for a human maintainer. If you need any assistance, feel free to ask.
Yes, Ragas can be used to evaluate your Amazon Bedrock Agent setup with a managed Knowledge Base. The provided documentation includes a detailed example of how to use Ragas for evaluation with Amazon Bedrock.
Here is a relevant example from the documentation:
# Import necessary libraries
from datasets import load_dataset
from ragas.metrics import (
    context_precision,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
from ragas import evaluate
import nest_asyncio
# Load sample dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
# Define metrics
metrics = [
    faithfulness,
    context_recall,
    context_precision,
    harmfulness,
]
# Configure Bedrock model
config = {
    "credentials_profile_name": "your-profile-name",  # e.g. "default"
    "region_name": "your-region-name",  # e.g. "us-east-1"
    "model_id": "your-model-id",  # e.g. "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}
bedrock_model = BedrockChat(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)
# Initialize embeddings
bedrock_embeddings = BedrockEmbeddings(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
)
# Apply nest_asyncio if running in a Jupyter notebook
nest_asyncio.apply()
# Evaluate the model
result = evaluate(
    amnesty_qa["eval"].select(range(3)),
    metrics=metrics,
    llm=bedrock_model,
    embeddings=bedrock_embeddings,
)
# Convert results to pandas DataFrame
df = result.to_pandas()
df.head()
This example demonstrates how to load a dataset, configure the Bedrock model, define evaluation metrics, and run the evaluation using Ragas [1].
Additionally, Ragas provides tools and techniques to leverage continual learning in RAG applications. It allows you to synthetically generate diverse test datasets, use LLM-assisted evaluation metrics, monitor the quality of your apps in production, and use insights to iterate and improve your application. For more detailed information, you can refer to the Core Concepts section in the Ragas documentation [2].
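For instance, the synthetic test set generation mentioned above can be driven by the same Bedrock model and embeddings configured earlier. Here is a minimal sketch, assuming a ragas 0.1.x-style TestsetGenerator and that documents holds LangChain Document objects loaded from your own sources (not defined in the example above):
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
# Sketch only: `documents` is assumed to be a list of LangChain Document
# objects loaded from your own knowledge base sources.
generator = TestsetGenerator.from_langchain(
    generator_llm=bedrock_model,
    critic_llm=bedrock_model,
    embeddings=bedrock_embeddings,
)
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
testset.to_pandas().head()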
For more community insights and examples, you can explore the Community section of the Ragas documentation, which features various blog posts, tutorials, and case studies on how different teams and individuals are using Ragas to enhance their RAG systems [3].
To continue talking to Dosu, mention @dosu.
@dosu
Can you give an example that strictly includes the following requirements: the actual Amazon Bedrock Agent, the managed Amazon Bedrock Knowledge Base, and a PDF used as the testing dataset?
If you cannot provide such an example, explain in depth the implications in terms of test validity of using BedrockChat and BedrockEmbeddings instead of the actual infrastructure based on Amazon Bedrock Agents and Knowledge Bases. Thanks!
To integrate Ragas with Amazon Bedrock Agent and Amazon Bedrock Knowledge Bases, and use a PDF as a testing dataset, follow these steps:
Install Required Packages:
!pip install ragas langchain_openai langchain_community boto3 python-dotenv
Set Up Environment and Load API Key:
import nest_asyncio
import os
import openai
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Access the API key from the environment variable
api_key = os.environ.get("OPENAI_API_KEY")
# Initialize the OpenAI API client
openai.api_key = api_key
nest_asyncio.apply()
Load PDF Dataset:
from pathlib import Path
from ragas_loader import RAGASLoader
# Initialize the loader with the path to your PDF file or directory containing PDFs
loader = RAGASLoader(file_path=Path("path/to/your/pdf_or_directory"), mode="single")
# Load the documents
documents = list(loader.lazy_load())
# If you prefer asynchronous loading
# documents = [doc async for doc in loader.lazy_aload()]
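If RAGASLoader is not available in your installed ragas version, a standard LangChain PDF loader produces equivalent Document objects. A minimal alternative sketch (assumes pypdf is installed):
from langchain_community.document_loaders import PyPDFLoader
# Alternative loading path: one Document per PDF page, with page metadata
loader = PyPDFLoader("path/to/your/document.pdf")
documents = loader.load()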
Configure Amazon Bedrock Agent and Knowledge Bases:
from langchain_community.retrievers import AmazonKnowledgeBasesRetriever
from langchain_community.chat_models import BedrockChat
config = {
    "credentials_profile_name": "your-profile-name",  # e.g. "default"
    "region_name": "your-region-name",  # e.g. "us-east-1"
    "model_id": "your-model-id",  # e.g. "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}
bedrock_model = BedrockChat(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)
retriever = AmazonKnowledgeBasesRetriever(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    knowledge_base_id="your-knowledge-base-id",  # the ID of your managed Knowledge Base
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)
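Before running the full evaluation, it can help to sanity-check that the retriever actually returns passages from your Knowledge Base. A quick sketch (the score metadata key is an assumption and may vary by langchain version):
# Retrieve contexts for a sample question from your PDF corpus
docs = retriever.invoke("What does the document say about <your topic>?")
for d in docs:
    print(d.metadata.get("score"), d.page_content[:200])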
Define Evaluation Metrics:
from ragas.metrics import (
    context_precision,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness
metrics = [
    faithfulness,
    context_recall,
    context_precision,
    harmfulness,
]
Evaluate the Model:
from datasets import Dataset
from ragas import evaluate
# ragas evaluates a dataset of question/answer/contexts records, not raw
# documents: build one by running each test question through the stack above.
questions = ["<a question about your PDF>"]  # your evaluation questions
records = []
for q in questions:
    contexts = [d.page_content for d in retriever.invoke(q)]
    answer = bedrock_model.invoke(q).content
    # ground_truth (a reference answer) is needed by context_precision/recall
    records.append({"question": q, "answer": answer, "contexts": contexts, "ground_truth": "<reference answer>"})
result = evaluate(
    Dataset.from_list(records),
    metrics=metrics,
    llm=bedrock_model,
)
# Convert results to pandas DataFrame
df = result.to_pandas()
df.head()
Using BedrockChat and BedrockEmbeddings instead of Amazon Bedrock Agents and Knowledge Bases can affect the validity of your tests in several ways:
Contextual Relevance: Bedrock Agents and Knowledge Bases are designed to work together, leveraging the managed knowledge base for more accurate and contextually relevant responses. Using BedrockChat and BedrockEmbeddings might not fully utilize the knowledge base, leading to less accurate results.
Performance: The managed Knowledge Base in Amazon Bedrock is optimized for retrieval tasks, potentially offering better performance and more relevant document retrieval than a generic embedding model.
Integration: Bedrock Agents and Knowledge Bases are tightly integrated, keeping the retrieval and generation processes closely coupled. Using separate components like BedrockChat and BedrockEmbeddings might introduce inconsistencies or inefficiencies in the workflow.
Evaluation Metrics: The evaluation metrics might not fully capture the capabilities of the managed Knowledge Base if the retrieval process is not aligned with how the Knowledge Base is intended to be used.
By following the steps provided, you can ensure that your evaluation setup leverages the full capabilities of Amazon Bedrock Agents and Knowledge Bases, leading to more accurate and relevant test results.
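If you want the evaluation to exercise the real Agent/Knowledge Base retrieval path rather than BedrockChat plus a separate retriever, one option is to collect answers and retrieved contexts directly from the Bedrock runtime and feed those records to Ragas. A rough sketch, assuming boto3's bedrock-agent-runtime retrieve_and_generate API, with placeholder region, knowledge base ID, and model ARN:
import boto3
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = client.retrieve_and_generate(
    input={"text": "What does the document say about <your topic>?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "your-knowledge-base-id",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
# The generated answer and the cited chunks can populate the `answer` and
# `contexts` columns of a ragas evaluation dataset.
answer = response["output"]["text"]
contexts = [
    ref["content"]["text"]
    for citation in response.get("citations", [])
    for ref in citation.get("retrievedReferences", [])
]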
To continue talking to Dosu, mention @dosu.
Hey @danielesalvatore, we are working on supporting conversational and agentic workflows in ragas by v0.2. I'm working on the initial PR #1222. I would love to talk to you to further understand your use-case and ensure that we support it too. If you're free, here's my cal
Hey @shahules786, thanks for pinging! I just submitted a meeting request for Monday, September 2, 2024, 10:00 – 10:30 Rome, ITA time.
[X] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
I just discovered the metrics-driven development (MDD) approach for RAG development implemented by Ragas, and it is fascinating.
I have a conversational chatbot based on Amazon Bedrock Agent that uses claude-3-haiku-20240307-v1 as the LLM, configured to use a Bedrock Knowledge Base (AWS managed). The KB is built on top of Amazon OpenSearch Service Serverless, with titan-embed-text-v1 as the embedding model (chunking setup: fixed size, max 300 tokens, 20% overlap).
Can Ragas be used to evaluate my RAG setup? If so, do you have any examples, or can you point me to relevant documentation/approaches?
Thank you!
Code Examples
The chatbot infrastructure is scripted using AWS CDK (Python). I can provide relevant snippets if required.
Additional context
I opted for this setup since AWS manages every service.
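For reference, the fixed-size chunking described above roughly corresponds to a data-source configuration like the following. This is only an illustrative sketch using boto3's bedrock-agent client with placeholder IDs and ARNs; the actual stack is defined in CDK:
import boto3
# Illustrative only: knowledge base ID, data source name, and bucket ARN are placeholders.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
bedrock_agent.create_data_source(
    knowledgeBaseId="your-knowledge-base-id",
    name="pdf-documents",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
        }
    },
)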