explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

When I use the context recall metric and other retrieval metrics on my own embedding model, should I index the docs and renew the context retrieved by my model? #1124

Closed gcy0926 closed 3 months ago

gcy0926 commented 3 months ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Context I will use my own retrieval and rerank models, such as a fine-tuned bge-embedding and bge-reranker, and I want to test them on "explodinggradients/fiqa - ragas_eval". I may also expand this to my own dataset later.

Your Question

  1. When I use the context recall metric and other retrieval metrics on my own embedding model, should I index the docs and renew the contexts retrieved by my model?
  2. In the dataset, do any questions' answers depend on the context? If so, when I renew the contexts, should I renew the ground truth too?
dosubot[bot] commented 3 months ago

@gcy0926 Hello there! I'm a bot designed to help you with bugs, questions, and contributions. I'm here to assist you while you wait for a human maintainer.

  1. Yes, you should index the documents with your own embedding model and renew the contexts retrieved by it when using the context recall metric and other retrieval metrics [1]; a minimal evaluation sketch follows below.

  2. Yes, there are questions in the "explodinggradients/fiqa - ragas_eval" dataset whose answers depend on the context. Therefore, if you renew the contexts, you should also renew the ground truth [1].
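
A minimal sketch of that evaluation loop, assuming ragas 0.1.x (where the reference column is named ground_truth and the retrieval metrics are imported from ragas.metrics); the question, retrieved passages, and reference answer below are placeholders, not real dataset rows:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

# Each row pairs a question with the contexts returned by your fine-tuned bge
# retriever and the reference answer from the evaluation set (placeholders here).
eval_dataset = Dataset.from_dict({
    "question": ["<question from fiqa ragas_eval>"],
    "contexts": [["<passage retrieved by your bge embedding + reranker>"]],
    "ground_truth": ["<reference answer for that question>"],
})

# context_recall and context_precision only need question, contexts, and
# ground_truth, so no generated answer column is required for retrieval metrics.
result = evaluate(eval_dataset, metrics=[context_recall, context_precision])
print(result)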

To continue talking to Dosu, mention @dosu.

gcy0926 commented 3 months ago

What's your suggestion on how to renew the ground truth answers? Using a larger LLM?

gcy0926 commented 3 months ago

And another question: are the source docs for "explodinggradients/fiqa - ragas_eval" in the corpus subset, which is 57.6k rows?

dosubot[bot] commented 3 months ago

To renew ground truth answers, using a larger LLM is indeed a good approach. The ragas framework supports this: you can replace the default OpenAI models by passing custom LLMs and embeddings when instantiating the TestsetGenerator, and then use the same models in the evaluation process. Here’s an example:

from ragas.testset.generator import TestsetGenerator
from langchain_together import Together
from langchain_together.embeddings import TogetherEmbeddings

together_key = "<your-key-here>"

# Custom embedding model served by Together AI
embeddings = TogetherEmbeddings(model="togethercomputer/m2-bert-80M-8k-retrieval")

# Custom completion LLM served by Together AI
together_completion = Together(
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT",
    temperature=0.7,
    max_tokens=4000,
    top_k=1,
    together_api_key=together_key,
)

# Use the custom LLM and embeddings in the TestsetGenerator
generator = TestsetGenerator.from_langchain(
    generator_llm=together_completion,
    critic_llm=together_completion,
    embeddings=embeddings,
)

This setup allows you to leverage larger or more specialized LLMs for generating ground truth answers, ensuring the quality of your RAG pipeline [1].
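
To actually regenerate question/ground-truth pairs with that generator, something along these lines should work; this is a minimal sketch assuming ragas 0.1.x's generate_with_langchain_docs API, and the fiqa_docs/ path is a placeholder for wherever your source documents live:

from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.evolutions import multi_context, reasoning, simple

# Load the source documents you want new ground truth for (path is a placeholder)
documents = DirectoryLoader("fiqa_docs/").load()

# Generate a fresh testset of questions, contexts, and ground-truth answers,
# mixing simple, reasoning, and multi-context question types
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

# The result exposes question, contexts, and ground_truth columns
testset_df = testset.to_pandas()

If you only want to re-answer the existing fiqa questions against your renewed contexts rather than generate new ones, a simpler alternative is to prompt the larger LLM with each question plus its renewed contexts and store the output as the new ground_truth.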

Regarding your second question, to verify if the source documents for the "explodinggradients/fiqa - ragas_eval" dataset are included in the corpus subset of 57.6k rows, you can follow these steps:

  1. Load the Corpus Subset: Load the corpus subset of 57.6k rows into a DataFrame.
  2. Load the Source Documents: Load the source documents from the "explodinggradients/fiqa - ragas_eval" dataset.
  3. Check for Inclusion: Check if the source documents are included in the corpus subset by comparing the content or identifiers of the documents.

Here is a sample code to perform these steps:

import pandas as pd

# Load the corpus subset of 57.6k rows
corpus_df = pd.read_csv('path_to_corpus_subset.csv')

# Load the source documents from the "explodinggradients/fiqa - ragas_eval" dataset
source_documents = pd.read_csv('path_to_source_documents.csv')

# Check for inclusion
included = source_documents['document_id'].isin(corpus_df['document_id'])

# Verify if all source documents are included
all_included = included.all()

print(f"All source documents are included in the corpus subset: {all_included}")

Replace 'path_to_corpus_subset.csv' and 'path_to_source_documents.csv' with the actual paths to your datasets. The document_id should be replaced with the appropriate column that uniquely identifies each document in your datasets [2].
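
Since both subsets live on the Hugging Face Hub, a rough alternative is to run the same check without exporting CSVs. In the sketch below, the subset names ("corpus", "ragas_eval") are the ones mentioned above, while the column names ("doc", "contexts") are assumptions you should verify against the actual dataset schema:

from datasets import load_dataset

# Subset names as referenced above; column names are assumptions, so check
# the dataset card for the real schema before running this.
corpus = load_dataset("explodinggradients/fiqa", "corpus")
eval_set = load_dataset("explodinggradients/fiqa", "ragas_eval")

corpus_split = corpus[next(iter(corpus))]    # the single split in each subset
eval_split = eval_set[next(iter(eval_set))]

corpus_texts = set(corpus_split["doc"])
missing = [
    ctx
    for contexts in eval_split["contexts"]   # one list of passages per question
    for ctx in contexts
    if ctx not in corpus_texts
]
print(f"{len(missing)} evaluation contexts were not found verbatim in the corpus")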

To continue talking to Dosu, mention @dosu.

github-actions[bot] commented 3 months ago

It seems the issue was answered, closing this now.