explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

API KEY #1358

Open amin-kh96 opened 1 month ago

amin-kh96 commented 1 month ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question: I wrote the code below and I get this error:

The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Is it mandatory to set an API key to evaluate? If I have to use an API key, is there any free way to do it?

Code Examples

import json
import pandas as pd
from datasets import Dataset
from langchain.embeddings import HuggingFaceEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.metrics import faithfulness, context_utilization
from ragas import evaluate

# Load the ground truth
file_path = 'assets\\GT.json'
with open(file_path) as f:
    ground_truth_data = json.load(f)

#load the question and the answer and the chunks
file_path = 'assets\\user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json'
with open(file_path) as f:
    llm = json.load(f)
#Initialize an empty list to hold the new dataset
data_set = []

#Iterate through the list and combine every two dictionaries
for i in range(0, len(llm), 2):
    combined_dict = {
        "text_vector_1": llm[i].get("text_vector", []),
        "text_vector_2": llm[i + 1].get("text_vector", []),
        'chunks': llm[i + 1].get('chunks', [])
    }
    data_set.append(combined_dict)

def map_chunks(data_set, ground_truth_data):
    for item in data_set:  # Iterate over each dictionary in data_set
        c = []  # Reset c for each item
        for chunk_id in item['chunks']:  # Loop through 'chunks' in the current dictionary
            for element in ground_truth_data:  # Loop through ground_truth_data
                if element['id'] == chunk_id:  # Match chunk_id with element's id
                    c.append(element['text_vector'])  # Append the matching text_vector to c
        item['chunks'] = c  # Replace the original 'chunks' (ids) with the mapped text_vector values

    return data_set  # Return the updated data_set

data_set = map_chunks(data_set, ground_truth_data)

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],
        "answer": entry["text_vector_2"],
        "contexts": entry["chunks"]
    }
    for entry in data_set
]
# Convert list of dictionaries to dictionary of lists
ragas_data_dict = {key: [d[key] for d in ragas_data] for key in ragas_data[0]}

# Initialize Hugging Face embeddings
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Wrap the embeddings
wrapped_embeddings = LangchainEmbeddingsWrapper(embeddings=emb)

# Assuming you have your dataset ready
dataset = ragas_data_dict  # Your dataset here

# Step 4: Define the list of evaluation metrics
metrics = [faithfulness, context_utilization]

# Step 5: Evaluate the data using the RAGAS `evaluate` function
evaluation_report = evaluate(dataset=dataset, metrics=metrics, embeddings=emb)
# Step 6: Display the evaluation results
print("RAGAS Evaluation Report:")
print(evaluation_report)

# Optional: Convert the evaluation report to a DataFrame for better readability
evaluation_df = pd.DataFrame(evaluation_report.items(), columns=['Metric', 'Score'])
print("\nEvaluation Report as DataFrame:")
print(evaluation_df)

Additional context: I already have the embeddings; my dataset is not str.

dosubot[bot] commented 1 month ago

Yes, it is mandatory to set an API key to evaluate using the evaluate function in RAGAS. The document states:

"Many metrics use OpenAI models and require an environment variable OPENAI_API_KEY to be set."

However, there is an alternative mentioned in the document. You can use the Opik platform, which provides a hosted version and also allows for local setup. This means you can run the Opik platform locally without needing an OpenAI API key. For more information on setting up Opik locally, refer to the installation guide.

Additionally, you can use Langfuse for online evaluation of your RAG pipelines. Langfuse allows you to score your traces and spans without needing ground-truth data, making it suitable for production environments. You can set up Langfuse with your own keys and use it to perform model-based evaluations. For more details, refer to the Langfuse documentation [1][2][3].
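
For reference, here is a minimal sketch of the two usual ways to satisfy the OPENAI_API_KEY requirement quoted above: export the key, or wrap and pass your own LLM and embeddings to evaluate(). The model names below are placeholders, not recommendations.

import os
from langchain.embeddings import HuggingFaceEmbeddings
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# Option 1: export the key so the default OpenAI-backed metrics can run
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder

# Option 2: wrap your own models and pass them explicitly; a locally hosted
# LangChain chat model here avoids the OpenAI dependency entirely
local_emb = LangchainEmbeddingsWrapper(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
# my_chat_model = ...  # any LangChain-compatible chat model you have access to
# result = evaluate(dataset=dataset, metrics=metrics,
#                   llm=LangchainLLMWrapper(my_chat_model), embeddings=local_emb)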

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 month ago

But I am confused: I already calculated the embeddings myself, and I did not set an LLM, so why do I have to use an API key? The key is only for embeddings.

dosubot[bot] commented 1 month ago

You need to use an API key for embedding calculations because the MetricWithEmbeddings class requires valid embeddings to be provided. The init method of this class checks if the embeddings are valid and also verifies the API key for services like OpenAI and AzureOpenAI. If no valid embeddings are provided, it raises a ValueError.

@dataclass
class MetricWithEmbeddings(Metric):
    embeddings: t.Optional[BaseRagasEmbeddings] = None

    def init(self, run_config: RunConfig):
        """
        Init any models in the metric, this is invoked before evaluate()
        to load all the models
        Also check if the api key is valid for OpenAI and AzureOpenAI
        """
        if self.embeddings is None:
            raise ValueError(
                f"Metric '{self.name}' has no valid embeddings provided (self.embeddings is None). Please initantiate a the metric with an embeddings to run."  # noqa
            )
        self.embeddings.set_run_config(run_config)

This means that even if you did not set the LLM, you still need to provide valid embeddings and an API key for the embedding calculations to proceed. Ragas uses LLMs and Embeddings for both evaluation and test set generation, and by default, it uses OpenAI models. If you are using custom embeddings, they should be a subclass of BaseRagasEmbeddings, and you need to ensure that the API key is valid for the service you are using [1][2].

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 month ago

I have another question. I have a vector database where PDFs and other kinds of documents are stored; they are split into chunks and vectorized. These files are very specialized: for example, a chunk might explain a specific error related to a specific medical device. Should I use an LLM to produce the ground truth in this case, or should I create a ground truth dataset myself? Even if I create a ground truth dataset, I would have to use the same PDFs and documents that are already in my vector database, so they are literally the same. Help me with this.

dosubot[bot] commented 1 month ago

You should create a ground truth dataset when your vector database contains specialized documents split into chunks and vectorized, such as those explaining specific errors related to medical devices. This ensures accurate and reliable evaluations, especially in specialized domains like medical devices [1].

The ground truth dataset should include the following columns:

  1. Question: A set of questions.
  2. Contexts: Retrieved contexts corresponding to each question. This is a list[list] since each question can retrieve multiple text chunks.
  3. Answer: Generated answer corresponding to each question.
  4. Ground truth: Ground truth corresponding to each question. This is a list[str] which corresponds to the expected answer for each question.

Here is an example of how to structure your dataset:

from datasets import Dataset 

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on January 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts' : [['The Super Bowl....season since 1966,','replacing the NFL...in February.'], 
    ['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],
    'ground_truth': [['The first superbowl was held on January 15, 1967'], ['The New England Patriots have won the Super Bowl a record six times']]
}
dataset = Dataset.from_dict(data_samples)

The dataset should have the following structure:

Dataset[question: list[str], contexts: list[list[str]], answer: list[str], ground_truth: list[list[str]]]

This format ensures that the metrics can properly evaluate the RAG pipeline [2][3][4].

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 month ago

No, you misunderstood my question. Let's assume that I have a dataset for RAG evaluation but I do not have the ground truth. How can I create or use GT?

dosubot[bot] commented 1 month ago

If you do not have an existing ground truth (GT) dataset for RAG evaluation, you can use Ragas for reference-free evaluations. Ragas can perform evaluations on various aspects of your RAG pipeline without needing ground-truth data. This allows you to run evaluations on production traces collected with Langfuse.

Here is a brief overview of how you can set up and use Ragas for this purpose:

  1. Install Required Packages:

    %pip install datasets ragas llama_index python-dotenv --upgrade
  2. Load Your Dataset:

    from datasets import load_dataset
    
    amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")["eval"]
    amnesty_qa
  3. Define Metrics:

    from ragas.metrics import faithfulness, answer_relevancy, context_precision
    from ragas.metrics.critique import SUPPORTED_ASPECTS, harmfulness
    
    metrics = [faithfulness, answer_relevancy, context_precision, harmfulness]
  4. Initialize Metrics with LLMs and Embeddings:

    from ragas.run_config import RunConfig
    from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings
    from langchain_openai.chat_models import ChatOpenAI
    from langchain_openai.embeddings import OpenAIEmbeddings
    from ragas.llms import LangchainLLMWrapper
    from ragas.embeddings import LangchainEmbeddingsWrapper
    
    def init_ragas_metrics(metrics, llm, embedding):
        for metric in metrics:
            if isinstance(metric, MetricWithLLM):
                metric.llm = llm
            if isinstance(metric, MetricWithEmbeddings):
                metric.embeddings = embedding
            run_config = RunConfig()
            metric.init(run_config)
    
    llm = ChatOpenAI()
    emb = OpenAIEmbeddings()
    
    init_ragas_metrics(
        metrics,
        llm=LangchainLLMWrapper(llm),
        embedding=LangchainEmbeddingsWrapper(emb),
    )
  5. Score Your Traces:

    async def score_with_ragas(query, chunks, answer):
        scores = {}
        for m in metrics:
            print(f"calculating {m.name}")
            scores[m.name] = await m.ascore(
                row={"question": query, "contexts": chunks, "answer": answer}
            )
        return scores
    
    row = amnesty_qa[0]
    question, contexts, answer = row["question"], row["contexts"], row["answer"]
    await score_with_ragas(question, contexts, answer)

By following these steps, you can evaluate your RAG pipeline without needing a ground truth dataset [1].

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 month ago

Which metrics need GT?

amin-kh96 commented 1 month ago

How can I make sure that my embeddings are a subclass of BaseRagasEmbeddings? Or is it better to let Ragas compute the embeddings from the beginning?

dosubot[bot] commented 1 month ago

Metrics in Ragas that require ground truth (GT) include those that have the G - ground_truth: ground truth answer in their schema [1].
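
As a rough split (verify against the metric docs for the Ragas version you are running), a sketch of grouping metrics by whether they consult ground_truth:

from ragas.metrics import (
    faithfulness, answer_relevancy, context_utilization,    # reference-free
    context_precision, context_recall, answer_correctness,  # compare against ground_truth
)

# Reference-free metrics: can be computed without a ground_truth column
reference_free = [faithfulness, answer_relevancy, context_utilization]

# These read the ground_truth column and will fail or be meaningless without it
needs_ground_truth = [context_precision, context_recall, answer_correctness]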

To ensure that your embeddings are a subclass of BaseRagasEmbeddings, you need to define your embeddings class to inherit from BaseRagasEmbeddings. Here is an example:

from ragas.embeddings import BaseRagasEmbeddings

class MyEmbeddings(BaseRagasEmbeddings):
    # Implement required methods and properties
    pass

It is generally better to let Ragas compute the embeddings from the beginning to ensure compatibility and proper initialization within the framework. This is because the MetricWithEmbeddings class checks if the embeddings are provided and initializes them accordingly [1].
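
If you want a quick sanity check rather than writing a subclass by hand, note that LangchainEmbeddingsWrapper is already such a subclass; a minimal sketch:

from langchain.embeddings import HuggingFaceEmbeddings
from ragas.embeddings import BaseRagasEmbeddings, LangchainEmbeddingsWrapper

wrapped = LangchainEmbeddingsWrapper(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
print(isinstance(wrapped, BaseRagasEmbeddings))  # True, so it can be passed as embeddings=...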

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 month ago

If we let Ragas compute the embeddings, is it more costly than defining an embeddings class that inherits from BaseRagasEmbeddings, or is there no real difference?

dosubot[bot] commented 1 month ago

Letting Ragas compute the embeddings using its default settings can be more costly compared to defining a custom class that inherits from BaseRagasEmbeddings, depending on the models and APIs used.

Ragas' default behavior uses the "BAAI/bge-small-en-v1.5" model from Huggingface, which can be cost-effective if you have the necessary infrastructure to run it locally. However, if you use external services like OpenAI for embedding computation, the costs can add up due to API usage fees.

By defining a custom class that inherits from BaseRagasEmbeddings, you can control the embedding process and potentially use more cost-effective or specialized models that suit your needs. This approach allows you to optimize for both performance and cost.

Here is an example of how to define a custom embedding class:

from ragas.embeddings import BaseRagasEmbeddings
from ragas.run_config import RunConfig
from typing import List

class MyCustomEmbeddings(BaseRagasEmbeddings):
    def embed_query(self, text: str) -> List[float]:
        # Implement your custom embedding logic here
        pass

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Implement your custom embedding logic here
        pass

    async def aembed_query(self, text: str) -> List[float]:
        # Implement your custom embedding logic here
        pass

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        # Implement your custom embedding logic here
        pass

    def set_run_config(self, run_config: RunConfig):
        self.run_config = run_config

This way, you can use your own models and infrastructure, potentially reducing costs compared to using external APIs like OpenAI [1].

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 month ago

Give me a conclusion for this topic; we are getting far from the real context. I know it is better to let Ragas do the embeddings, but if I want to consider cost, is it better to use my own precomputed embeddings and then make sure, in some way, that they are suitable and compatible for use with Ragas?

jjmachan commented 1 month ago

@amin-kh96 were you able to solve this?

You're using faithfulness, right? That metric requires an LLM, and one will be created for you automatically.

amin-kh96 commented 1 month ago

I have a list like this:

precomputed_embeddings = [
    [-0.019296993, 0.034829292, 0.12389234],  # Embedding 1
    [-0.02389233, 0.104949302, 0.04929383],   # Embedding 2
    [-0.00283949, 0.00849322, 0.00938934]     # Embedding 3
]

and I want to use the evaluate function from RAGAS. Should I create a subclass of BaseRagasEmbeddings? If I do that, I don't want to use the texts because I don't have them; all I have now is these embeddings. I have an example of code which says to use get_embedding, but that code wants me to pass the text. Here is the code from ChatGPT:

from ragas.embeddings import BaseRagasEmbeddings
import numpy as np

class CustomListEmbeddings(BaseRagasEmbeddings):
    def __init__(self, precomputed_embeddings):
        """
        :param precomputed_embeddings: A list of precomputed embeddings.
        """
        self.precomputed_embeddings = precomputed_embeddings

    def get_embedding(self, text: str) -> np.ndarray:
        """
        Return the corresponding embedding for a given text.
        For this example, we assume that the text serves as an index.
        """
        # In a real scenario, you need a mapping (index) between text and embeddings
        # Here we use a dummy approach: assume text is an index or id.
        index = int(text)  # Convert text to an integer index
        return np.array(self.precomputed_embeddings[index])

    def get_embeddings(self, texts: list) -> np.ndarray:
        """
        Return embeddings for a list of texts.
        """
        embeddings = [self.get_embedding(text) for text in texts]
        return np.vstack(embeddings)  # Stack them into a 2D array

# Example: List of precomputed embeddings (replace with your actual embeddings)
precomputed_embeddings = [
    [-0.019296993, 0.034829292, 0.12389234],  # Embedding 1
    [-0.02389233, 0.104949302, 0.04929383],   # Embedding 2
    [-0.00283949, 0.00849322, 0.00938934]     # Embedding 3
]

# Initialize the custom embeddings class with your list
custom_embeddings = CustomListEmbeddings(precomputed_embeddings)

# Now you can retrieve embeddings by passing text as the index (or some identifier)
embedding = custom_embeddings.get_embedding("1")  # Assuming "1" is the index for the second embedding
print("Retrieved embedding:", embedding)

amin-kh96 commented 1 month ago

OK, I did it, but take a look at my code and my error. My code:

import json
import pandas as pd
import numpy as np
from datasets import Dataset
from langchain.embeddings import HuggingFaceEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.embeddings import BaseRagasEmbeddings
from ragas.metrics import faithfulness, context_utilization
from ragas import evaluate

# Load the ground truth data
file_path = 'assets\\GT.json'
with open(file_path) as f:
    ground_truth_data = json.load(f)

# Load the question and the answer and the chunks
file_path = 'assets\\user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json'
with open(file_path) as f:
    llm = json.load(f)

# Initialize an empty list to hold the new dataset
data_set = []

# Iterate through the list and combine every two dictionaries
for i in range(0, len(llm), 2):
    combined_dict = {
        "text_vector_1": llm[i].get("text_vector", []),
        "text_vector_2": llm[i + 1].get("text_vector", []),
        'chunks': llm[i + 1].get('chunks', [])
    }
    data_set.append(combined_dict)

def map_chunks(data_set, ground_truth_data):
    for item in data_set:  # Iterate over each dictionary in data_set
        c = []  # Reset c for each item
        for chunk_id in item['chunks']:  # Loop through 'chunks' in the current dictionary
            for element in ground_truth_data:  # Loop through ground_truth_data
                if element['id'] == chunk_id:  # Match chunk_id with element's id
                    c.append(element['text_vector'])  # Append the matching text_vector to c
        item['chunks'] = c  # Replace the original 'chunks' (ids) with the mapped text_vector values

    return data_set  # Return the updated data_set

data_set = map_chunks(data_set, ground_truth_data)

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],  # Assuming this is a list of strings
        "contexts": entry["chunks"]  # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for Dataset
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Convert to a Hugging Face Dataset
dataset = Dataset.from_dict(formatted_data)

# Verify the dataset structure
print(dataset)

# Creating a subclass which is compatible with BaseRagasEmbeddings
class CustomListEmbeddings(BaseRagasEmbeddings):
    def __init__(self, dataset):
        """
        :param dataset: A list of precomputed embeddings.
        """
        self.embedding_list = dataset

    def get_embedding(self, index: str) -> np.ndarray:
        """
        Return the corresponding embedding for a given index.
        Here we assume the input is just an index (or identifier) as a string.
        """
        index = int(index)  # Convert index to an integer
        return np.array(self.embedding_list[index])

    def get_embeddings(self, indices: list) -> np.ndarray:
        """
        Return embeddings for a list of indices.
        """
        embeddings = [self.get_embedding(index) for index in indices]
        return np.vstack(embeddings)  # Stack them into a 2D array

    def embed_documents(self, docs: list) -> np.ndarray:
        """
        Embeds a list of documents (represented as indices).
        """
        return self.get_embeddings(docs)

    def embed_query(self, query: str) -> np.ndarray:
        """
        Embeds a single query (represented as an index).
        """
        return self.get_embedding(query)

# Initialize your custom embeddings
custom_embeddings = CustomListEmbeddings(formatted_data["question"])

# Step 4: Define the list of evaluation metrics
metrics = [faithfulness, context_utilization]

# Step 5: Evaluate the data using the RAGAS `evaluate` function
evaluation_report = evaluate(dataset=dataset, metrics=metrics, embeddings=custom_embeddings)

# Step 6: Display the evaluation results
print("RAGAS Evaluation Report:")
print(evaluation_report)

# Optional: Convert the evaluation report to a DataFrame for better readability
evaluation_df = pd.DataFrame(evaluation_report.items(), columns=['Metric', 'Score'])
print("\nEvaluation Report as DataFrame:")
print(evaluation_df)

my error:
Exception has occurred: ValidationError
4608 validation errors for MultiTurnSample
user_input -> 0
  value is not a valid dict (type=type_error.dict)
user_input -> 0
  value is not a valid dict (type=type_error.dict)
user_input -> 0
  value is not a valid dict (type=type_error.dict)
user_input -> 1
  value is not a valid dict (type=type_error.dict)
user_input -> 1
  value is not a valid dict (type=type_error.dict)
user_input -> 1
  value is not a valid dict (type=type_error.dict)
user_input -> 2
  value is not a valid dict (type=type_error.dict)
user_input -> 2
  value is not a valid dict (type=type_error.dict)
aidanrussell commented 6 days ago

I was using Azure OpenAI following this example, so I did not expect to have to set OPENAI_API_KEY; I expected to connect with AZURE_OPENAI_API_KEY instead.

https://docs.ragas.io/en/latest/getstarted/rag_evaluation/#__tabbed_1_3

jjmachan commented 5 days ago

@amin-kh96 that is a validation error from pydantic, use https://docs.ragas.io/en/latest/references/metrics/?h=single#ragas.metrics.base.SingleTurnMetric.single_turn_ascore to figure it out
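
A minimal sketch of that debugging route, assuming the 0.2.x single-turn API (class and field names may differ on older versions), using plain strings rather than embedding vectors:

from ragas import SingleTurnSample
from ragas.metrics import Faithfulness

sample = SingleTurnSample(
    user_input="a plain-text question",         # not an embedding vector
    retrieved_contexts=["a plain-text chunk"],  # list of strings
    response="a plain-text answer",
)
# faithfulness_metric = Faithfulness(llm=my_wrapped_llm)  # my_wrapped_llm: whatever LLM wrapper you evaluate with
# score = await faithfulness_metric.single_turn_ascore(sample)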

@aidanrussell it is using the Azure OpenAI key, right? Could you maybe share a screenshot of the problem? Sadly, I'm not quite able to understand the issue.

amin-kh96 commented 5 days ago

@dosu can the version of pydantic cause the error "Failed to parse output"? The output parser in Ragas uses a pydantic parser, as I show here:

class RagasoutputParser(PydanticOutputParser):
    async def aparse(  # type: ignore
        self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 1
    ):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                p_value = FIX_OUTPUT_FORMAT.format(
                    prompt=prompt.to_string(), completion=result
                )
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output

dosubot[bot] commented 5 days ago

Using your own precomputed embeddings can be more cost-effective than letting Ragas compute them, especially if you already have them available and they are compatible with Ragas. This avoids the need to compute new embeddings, which can be resource-intensive, particularly if using paid services like OpenAI [1].

To ensure your embeddings are compatible with Ragas, you should create a subclass of BaseRagasEmbeddings. This involves implementing the required methods for embedding operations, such as embed_query and embed_documents. Your implementation should handle the embeddings you have, even if they are precomputed and you don't have the original text data [2].
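
One hedged sketch of that idea, assuming you can still recover the original text for each precomputed vector (the class and field names here are illustrative, not part of Ragas):

from typing import Dict, List
from ragas.embeddings import BaseRagasEmbeddings

class PrecomputedEmbeddings(BaseRagasEmbeddings):
    """Serve precomputed vectors keyed by their original text."""

    def __init__(self, text_to_vector: Dict[str, List[float]]):
        self.text_to_vector = text_to_vector

    def embed_query(self, text: str) -> List[float]:
        # Raises KeyError loudly if a text was never embedded offline
        return self.text_to_vector[text]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]

One caveat: some metrics (answer_relevancy, for example) generate new text at evaluation time and then embed it, so a pure lookup table only works if every string the metric asks about was embedded in advance; otherwise a live embedding model is still needed.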

Regarding the error you're encountering, it seems to be related to the structure of your dataset or the way the embeddings are being accessed. Ensure that your dataset is structured correctly and that the indices or identifiers used to access embeddings are valid. The error might also be related to the version of Pydantic, as it plays a role in data validation and parsing within Ragas [3].

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 4 days ago

@dosu how can I solve this:

Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.

dosubot[bot] commented 4 days ago

The issue "Failed to parse output. Returning None" is likely related to the RagasOutputParser and its parse_output_string method. This error occurs when the output string cannot be parsed into the expected format defined by the OutputModel.

To resolve this issue, you can follow these steps:

  1. Check the Output Format: Ensure that the output generated by the language model matches the expected JSON format defined by the OutputModel. The RagasOutputParser expects the output to comply with a specific schema.

  2. Handle Parsing Exceptions: Add detailed logging to understand why the parsing is failing. Log the output_string before parsing to see what the model is generating (see the sketch below).

  3. Adjust the Prompt: If the output format is incorrect, consider adjusting the prompt or the input data to guide the language model to produce the desired output format. The PydanticPrompt class provides methods to generate instructions and examples that can help in guiding the model.

  4. Review the RagasOutputParser Implementation: Ensure that the RagasOutputParser is correctly implemented to handle the expected output format. If there are specific parsing rules or conditions, verify that they align with the output being generated. The parser includes a mechanism to retry parsing with a fixed output format if the initial parsing fails, which can be useful in resolving format issues.

By following these steps, you should be able to identify and resolve the parsing issue. Additionally, consider using the FixOutputFormat prompt to automatically correct output strings that do not meet the expected format [1][2].
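
As a concrete version of step 2 using only the standard library (whether the raw completions appear in the logs depends on the Ragas version, so treat this as a starting point):

import logging

# Surface Ragas' internal warnings and debug output so you can see what is
# being sent to and returned by the model before parsing fails.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("ragas").setLevel(logging.DEBUG)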

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 4 days ago

@dosu what happens here that I see this error: "Failed to parse output. Returning None."?

The code:

class RagasoutputParser(PydanticOutputParser):
    async def aparse(  # type: ignore
        self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 1
    ):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                p_value = FIX_OUTPUT_FORMAT.format(
                    prompt=prompt.to_string(), completion=result
                )
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output

dosubot[bot] commented 4 days ago

The error "Failed to parse output. Returning None" occurs when the RagasOutputParser is unable to parse the output string into the expected format. This typically happens when the output does not match the schema defined by the OutputModel.

Here are some steps to troubleshoot and resolve this issue:

  1. Verify Output Format: Ensure that the output from the language model is in the correct format expected by the OutputModel. The parser expects a specific structure, and any deviation can cause parsing to fail.

  2. Logging: Add logging to capture the result before parsing attempts. This will help you understand what the model is generating and why it might not be parsing correctly.

  3. Prompt Adjustments: If the output format is incorrect, consider modifying the prompt or input data to better guide the language model to produce the desired output format.

  4. Use FixOutputFormat: The RagasOutputParser includes a retry mechanism using the FixOutputFormat class to attempt to correct the output format. Ensure that this mechanism is correctly implemented and that the FIX_OUTPUT_FORMAT template is properly defined to guide the correction process [1].

  5. Review Parser Implementation: Double-check the implementation of the RagasOutputParser to ensure it aligns with the expected output format and that all necessary conditions for parsing are met.

By following these steps, you should be able to identify the root cause of the parsing failure and address it effectively.

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 day ago

@dosu is it possible that the error 'fail to parse the output' might be related to the dataset being in Italian?

dosubot[bot] commented 1 day ago

The error "fail to parse the output" could potentially be related to the dataset being in Italian if the language model or the parsing logic is not properly handling multilingual data. Here are a few considerations:

  1. Language Handling: Ensure that the language model and the RagasOutputParser are configured to handle Italian text. If the model is not trained or configured for Italian, it might produce outputs that do not match the expected format.

  2. Prompt Adaptation: Use Ragas' language adaptation feature to adapt prompts to Italian. This involves translating few-shot examples to the target language while keeping instructions in English, which can help guide the model to produce outputs in the correct format for Italian text (a sketch follows at the end of this comment).

  3. Output Format: Verify that the output format expected by the OutputModel is language-agnostic. If there are language-specific constraints, ensure they are correctly defined for Italian.

  4. Character Encoding: Check for any character encoding issues that might arise from handling Italian text, especially if it includes special characters or accents.

If the issue persists, consider logging the outputs and any parsing errors to better understand how the Italian dataset might be affecting the parsing process.
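
If you want to try the prompt-adaptation route from point 2, here is a hedged sketch against the older 0.1.x adapt API (check that the call exists in your installed version; newer releases expose per-metric prompt adaptation instead):

from ragas import adapt
from ragas.metrics import faithfulness, context_utilization
from langchain_openai.chat_models import ChatOpenAI

# Translates each metric's few-shot examples to Italian while keeping the
# instructions in English; requires an LLM to do the translation.
adapt(metrics=[faithfulness, context_utilization], language="italian", llm=ChatOpenAI())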

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 day ago

@dosu I want to create a subclass of BaseRagasLLM that, instead of generating anything, uses my dataset. The point is that I already have the question, the context, and the answer.

dosubot[bot] commented 1 day ago

To create a subclass of BaseRagasLLM that uses your existing dataset instead of generating new content, you can implement the required methods to return the pre-existing answers from your dataset. Here's a basic outline of how you can achieve this:

  1. Subclass BaseRagasLLM: Implement the generate_text and agenerate_text methods to return answers from your dataset based on the provided questions.

  2. Use a Mapping: Create a mapping from questions to answers in your dataset to facilitate quick retrieval.

Here's an example implementation:

from ragas.llms.base import BaseRagasLLM
from langchain_core.outputs import LLMResult
from langchain_core.prompt_values import PromptValue
from langchain_core.callbacks import Callbacks
import typing as t

class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a dict mapping questions to answers

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        answer = self.dataset.get(question, "Answer not found")
        return LLMResult(generations=[[answer]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

# Example usage
dataset = {
    "What is the capital of France?": "Paris",
    "What is the largest planet?": "Jupiter",
    # Add more question-answer pairs
}

llm = DatasetRagasLLM(dataset)

This setup allows you to use your dataset directly for generating responses, bypassing the need for actual text generation. Ensure that your dataset is structured to efficiently map questions to answers.
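
A brief usage note, hedged because the exact parameters evaluate() accepts depend on the Ragas version: once defined, the custom LLM is simply passed alongside the dataset and embeddings.

from ragas import evaluate
from ragas.metrics import faithfulness, context_utilization

# result = evaluate(
#     dataset=dataset,                          # a Hugging Face Dataset with question/contexts/answer
#     metrics=[faithfulness, context_utilization],
#     llm=DatasetRagasLLM(new_ragas_dataset),   # the subclass sketched above
#     embeddings=custom_embeddings,             # any BaseRagasEmbeddings instance
# )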

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 day ago

@dosu does it support the context also? My dataset is in this format (a list of dictionaries): new_ragas_dataset = [ { 'question': ..., 'context': ..., 'answer': ... }, { ... }, ... ]

dosubot[bot] commented 1 day ago

Yes, you can support context in your subclass of BaseRagasLLM by including it in your dataset and using it when generating responses. Here's how you can modify the previous example to handle context:

  1. Include Context in Your Dataset: Ensure your dataset includes context along with questions and answers.

  2. Modify the generate_text Method: Use both the question and context to retrieve the answer.

Here's an updated implementation:

from ragas.llms.base import BaseRagasLLM
from langchain_core.outputs import LLMResult
from langchain_core.prompt_values import PromptValue
from langchain_core.callbacks import Callbacks
import typing as t

class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a list of dicts with 'question', 'context', and 'answer'

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        # Find the entry in the dataset that matches the question
        entry = next((item for item in self.dataset if item['question'] == question), None)
        if entry:
            answer = entry['answer']
        else:
            answer = "Answer not found"
        return LLMResult(generations=[[answer]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

# Example usage
new_ragas_dataset = [
    {'question': 'What is the capital of France?', 'context': 'European capitals', 'answer': 'Paris'},
    {'question': 'What is the largest planet?', 'context': 'Solar system', 'answer': 'Jupiter'},
    # Add more entries
]

llm = DatasetRagasLLM(new_ragas_dataset)

This setup allows you to use both the question and context from your dataset to generate responses. Adjust the logic as needed to incorporate context into your response generation process.

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 day ago

@dosu I do not understand: I already have everything in the right format, so why should I retrieve anything again?

dosubot[bot] commented 1 day ago

If you already have your dataset in the correct format and it includes all necessary information (questions, contexts, and answers), you don't need to retrieve or generate anything again. The purpose of creating a subclass of BaseRagasLLM that uses your dataset is to allow the Ragas framework to interact with your pre-existing data as if it were generating responses. This is useful for evaluation purposes, where you want to simulate the behavior of a language model using your dataset.

The subclass simply provides a way to return the pre-existing answers from your dataset when the framework requests a response. This avoids unnecessary computation and leverages your existing data effectively. If your dataset is already structured correctly, you can directly use it with the subclass to provide responses without additional retrieval or generation steps.

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 day ago

@dosu OK, based on what you said, just give me the code.

dosubot[bot] commented 1 day ago

Here's the code to create a subclass of BaseRagasLLM that uses your existing dataset to provide answers based on questions and contexts:

from ragas.llms.base import BaseRagasLLM
from langchain_core.outputs import LLMResult
from langchain_core.prompt_values import PromptValue
from langchain_core.callbacks import Callbacks
import typing as t

class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a list of dicts with 'question', 'context', and 'answer'

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        # Find the entry in the dataset that matches the question
        entry = next((item for item in self.dataset if item['question'] == question), None)
        if entry:
            answer = entry['answer']
        else:
            answer = "Answer not found"
        return LLMResult(generations=[[answer]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

# Example usage
new_ragas_dataset = [
    {'question': 'What is the capital of France?', 'context': 'European capitals', 'answer': 'Paris'},
    {'question': 'What is the largest planet?', 'context': 'Solar system', 'answer': 'Jupiter'},
    # Add more entries
]

llm = DatasetRagasLLM(new_ragas_dataset)

This code defines a DatasetRagasLLM class that uses your dataset to return answers based on the questions provided. The generate_text method looks up the question in the dataset and returns the corresponding answer. This allows you to use your dataset directly for evaluations without generating new content.

To continue talking to Dosu, mention @dosu.

amin-kh96 commented 1 day ago

@dosu

Exception has occurred: ValidationError
4 validation errors for LLMResult
generations.0.0.Generation
  Input should be a valid dictionary or instance of Generation [type=model_type, input_value='Answer not found', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
generations.0.0.function-after[set_text(), ChatGeneration]
  Input should be a valid dictionary or instance of ChatGeneration [type=model_type, input_value='Answer not found', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
generations.0.0.GenerationChunk
  Input should be a valid dictionary or instance of GenerationChunk [type=model_type, input_value='Answer not found', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
generations.0.0.function-after[set_text(), ChatGenerationChunk]
  Input should be a valid dictionary or instance of ChatGenerationChunk [type=model_type, input_value='Answer not found', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
File "C:\Users\Amin\git\ragas-prototype\src\rag_eva.py", line 276, in generate_text
    return LLMResult(generations=[[answer]])

dosubot[bot] commented 1 day ago

To resolve the ValidationError related to LLMResult, you need to ensure that the response is properly wrapped in an appropriate object, such as Generation or ChatGeneration, before being included in the LLMResult. The error occurs because the response is currently a string ('Answer not found') and not an instance of the expected types.

Here's how you can modify your generate_text method to fix this:

from langchain_core.outputs import Generation, LLMResult

class DatasetRagasLLM(BaseRagasLLM):
    def __init__(self, dataset):
        self.dataset = dataset  # Assume dataset is a list of dicts with 'question', 'context', and 'answer'

    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        question = prompt.to_string()
        # Find the entry in the dataset that matches the question
        entry = next((item for item in self.dataset if item['question'] == question), None)
        if entry:
            answer = entry['answer']
        else:
            answer = "Answer not found"

        # Wrap the answer in a Generation object
        generation = Generation(text=answer)

        # Return an LLMResult with the wrapped generation
        return LLMResult(generations=[[generation]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: Callbacks = None,
    ) -> LLMResult:
        # Asynchronous version of generate_text
        return self.generate_text(prompt, n, temperature, stop, callbacks)

By wrapping the answer in a Generation object, you ensure that the LLMResult is constructed with the expected types, thus avoiding the ValidationError [1].

To continue talking to Dosu, mention @dosu.