explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Automatic language adapter is not working #1101

Open Jeerhz opened 1 month ago

Jeerhz commented 1 month ago

I followed the steps of the "How-to Guide" on Automatic Language Adaptation (https://docs.ragas.io/en/v0.1.9/howtos/applications/use_prompt_adaptation.html) and tried modifying the already implemented classes, but I could not resolve my problem.

Describe the bug

Ragas version: 0.1.10, Python version: 3.12.4

Code to Reproduce

import os

# Setting up the OpenAI API key
openai_api_key = "your_openai_key"
os.environ["OPENAI_API_KEY"] = openai_api_key
from datasets import load_dataset, Dataset

hindi_dataset = load_dataset("explodinggradients/amnesty_qa","hindi")
hindi_dataset

from ragas.metrics import (
    faithfulness,
    answer_correctness,
)
from langchain.chat_models import ChatOpenAI
from ragas import adapt

# llm used for adaptation
openai_model = ChatOpenAI(model_name="gpt-4")

adapt(metrics=[faithfulness,answer_correctness], language="hindi", llm=openai_model)

print(answer_correctness.correctness_prompt.to_string())

Error trace

Traceback (most recent call last):
  File "c:\Users\adles\Downloads\base.py", line 21, in <module>
    adapt(metrics=[faithfulness,answer_correctness], language="hindi", llm=openai_model)
  File "C:\Users\adles\AppData\Local\Programs\Python\Python312\Lib\site-packages\ragas\adaptation.py", line 36, in adapt
    metric.adapt(language, cache_dir=cache_dir)
  File "C:\Users\adles\AppData\Local\Programs\Python\Python312\Lib\site-packages\ragas\metrics\_faithfulness.py", line 306, in adapt
    self.nli_statements_message = self.nli_statements_message.adapt(
  File "C:\Users\adles\AppData\Local\Programs\Python\Python312\Lib\site-packages\ragas\llms\prompt.py", line 181, in adapt
    return self._load(language, self.name, cache_dir)
  File "C:\Users\adles\AppData\Local\Programs\Python\Python312\Lib\site-packages\ragas\llms\prompt.py", line 275, in _load
    return cls(**json.load(open(path)))
  File "C:\Users\adles\AppData\Local\Programs\Python\Python312\Lib\site-packages\pydantic\v1\main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Prompt
__root__
  answer in example 1 is not in valid json format: Expecting value: line 1 column 1 (char 0) (type=value_error)

Expected behavior

To inspect the adapted prompt belonging to the answer correctness metric.

Additional context

dosubot[bot] commented 1 month ago

Hello, @Jeerhz! I'm here to assist you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.

To resolve the JSON format error and the non-existing method calls (generate_text and generate_prompt) when using the adapt function in the Ragas library, you need to ensure that the JSON format is valid and replace the non-existing method calls with the correct ones.

  1. JSON Format Error: Ensure that the examples provided in the Prompt class are correctly formatted as JSON. The validate_prompt method already includes a check for valid JSON format; a rough sketch of such a check follows this list.

  2. Non-existing Method Calls: Replace generate_text with the correct method from the BaseRagasLLM class.
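
A rough sketch of that kind of pre-check (my own illustration, not the library's actual validate_prompt; check_examples is a hypothetical helper):

import json

def check_examples(examples: list, output_key: str) -> None:
    # Fail fast if any example's output is not bare, parseable JSON --
    # a string that starts with a Markdown fence (```json) fails here
    # with exactly the "Expecting value: line 1 column 1" error from the trace.
    for i, example in enumerate(examples, start=1):
        try:
            json.loads(example[output_key])
        except (ValueError, TypeError) as e:
            raise ValueError(
                f"{output_key} in example {i} is not in valid json format: {e}"
            ) from e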

Here is the corrected adapt function:

def adapt(
    self, language: str, llm: BaseRagasLLM, cache_dir: t.Optional[str] = None
) -> Prompt:
    def get_all_keys(nested_json):
        keys = set()
        for key, value in nested_json.items():
            keys.add(key)
            if isinstance(value, dict):
                keys = keys.union(get_all_keys(value))
        return keys

    if self.language == language:
        return self

    # TODO: Add callbacks
    cache_dir = cache_dir if cache_dir else get_cache_dir()
    if os.path.exists(os.path.join(cache_dir, language, f"{self.name}.json")):
        return self._load(language, self.name, cache_dir)
    logger.info("Adapting %s to %s", self.name, language)
    prompts = []
    output_keys = []
    for example in self.examples:
        prompts.extend(
            [
                str_translation.format(
                    translate_to=language, input=example.get(key)
                )
                for key in self.input_keys
            ]
        )
        prompts.append(
            json_translatation.format(
                translate_to=language, input=example.get(self.output_key)
            )
            if self.output_type.lower() == "json"
            else str_translation.format(
                translate_to=language, input=example.get(self.output_key)
            )
        )
        if self.output_type.lower() == "json":
            output = example.get(self.output_key)
            if isinstance(output, str):
                output = json.loads(output)
            if isinstance(output, dict):
                output_keys.append(get_all_keys(output))
            elif isinstance(output, list) and all(
                isinstance(item, dict) for item in output
            ):
                output_keys.append([get_all_keys(item) for item in output])
    # NOTE: this is a slow loop, consider Executor to fasten this
    results = []
    for p in prompts:
        results.append(llm.generate(p).generations[0][0].text)  # Correct method call
    per_example_items = len(self.input_keys) + 1
    grouped_results = [
        results[i : i + per_example_items]
        for i in range(0, len(results), per_example_items)
    ]
    assert len(grouped_results) == len(
        self.examples
    ), "examples and adapted examples must be of equal length"
    for i, example in enumerate(grouped_results):
        example_dict = {}
        example_dict.update(
            {k: v for k, v in zip(self.input_keys, example[: len(self.input_keys)])}
        )
        example_dict[self.output_key] = (
            json_loader._safe_load(example[-1], llm)
            if self.output_type.lower() == "json"
            else example[-1]
        )

        if self.output_type.lower() == "json":
            output = example_dict[self.output_key]
            if isinstance(output, dict):
                assert (
                    set(output.keys()) == output_keys[i]
                ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
            elif isinstance(output, list) and all(
                isinstance(item, dict) for item in output
            ):
                assert all(
                    set(item.keys()) in output_keys[i] for item in output
                ), "Adapted output keys do not match with the original output keys"

        self.examples[i] = example_dict

    self.language = language

    # TODO:Validate the prompt after adaptation

    return self

Replace llm.generate_text(p).generations[0][0].text with llm.generate(p).generations[0][0].text to use the correct method from the BaseRagasLLM class. Additionally, ensure that the Prompt class and its examples are correctly formatted as JSON [1][2][3].

To continue talking to Dosu, mention @dosu.

Jeerhz commented 1 month ago

I modified the function as suggested, but it does not resolve the problem. I keep having the same issue with the JSON format. Does anyone else have difficulties adapting ragas to other languages?

@dosu

heuristicwave commented 1 month ago

I am also experiencing the same issue.

dosubot[bot] commented 1 month ago

To address the JSON format issue and the non-existing method calls when using the adapt function in the Ragas library, follow these steps:

  1. Ensure JSON Format: Make sure that the JSON data in your examples is correctly formatted. The json_loader._safe_load function is designed to handle JSON parsing errors and correct the format using a language model (LLM).

  2. Correct Method Calls: Replace the non-existing method calls (generate_text and generate_prompt) with the correct ones from the BaseRagasLLM class.

Here is a detailed breakdown of the json_loader._safe_load function and the helpers it uses:

json_loader._safe_load Function

This function attempts to safely load JSON data from a given text string. If parsing fails, it uses a language model (LLM) to correct the JSON format and retries the parsing.

class JsonLoader:
    max_retries: int = 2

    def _safe_load(self, text: str, llm: BaseRagasLLM, callbacks: Callbacks = None):
        retry = 0
        while retry <= self.max_retries:
            try:
                _json = self._load_all_jsons(text)
                return _json[0] if len(_json) == 1 else _json
            except ValueError:
                from ragas.llms.prompt import PromptValue

                results = llm.generate_text(
                    PromptValue(prompt_str=JSON_PROMPT.format(input=text)),
                    n=1,
                    callbacks=callbacks,
                )
                text = results.generations[0][0].text
            retry += 1

        return {}

_load_all_jsons Function

This function extracts and parses all JSON objects from a given text.

def _load_all_jsons(self, text):
    start, end = self._find_outermost_json(text)
    _json = json.loads(text[start:end])
    text = text.replace(text[start:end], "", 1)
    start, end = self._find_outermost_json(text)
    if (start, end) == (-1, -1):
        return [_json]
    else:
        return [_json] + self._load_all_jsons(text)

_find_outermost_json Function

This helper function locates the outermost JSON object within the text.

def _find_outermost_json(self, text):
    stack = []
    start_index = -1

    for i, char in enumerate(text):
        if char in "{[":
            if len(stack) == 0:
                start_index = i
            stack.append(char)

        elif char in "}]":
            if len(stack) > 0:
                last = stack.pop()
                if (char == "}" and last != "{") or (char == "]" and last != "["):
                    break

            if len(stack) == 0 and start_index != -1:
                return start_index, i + 1

    return -1, -1
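
For illustration, here is a hypothetical snippet, assuming the three methods above are assembled on one JsonLoader instance. The loader recovers JSON even when the model wraps it in Markdown fences, because _find_outermost_json only looks for the outermost braces:

loader = JsonLoader()

bare = '{"TP": [], "FP": [], "FN": []}'
fenced = '```json\n{"TP": [], "FP": [], "FN": []}\n```'

# The llm is only consulted when parsing fails, so None is safe for these inputs.
print(loader._safe_load(bare, llm=None))    # {'TP': [], 'FP': [], 'FN': []}
print(loader._safe_load(fenced, llm=None))  # same dict: the fences are skipped

This appears to be why the adaptation itself succeeds while the later reload fails: _safe_load tolerates the fences, but the plain json.loads used when validating a loaded prompt does not.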

Updated adapt Function

Ensure the adapt function uses the correct method calls and handles JSON data properly.

def adapt(
    self, language: str, llm: BaseRagasLLM, cache_dir: t.Optional[str] = None
) -> Prompt:
    def get_all_keys(nested_json):
        keys = set()
        for key, value in nested_json.items():
            keys.add(key)
            if isinstance(value, dict):
                keys = keys.union(get_all_keys(value))
        return keys

    if self.language == language:
        return self

    cache_dir = cache_dir if cache_dir else get_cache_dir()
    if os.path.exists(os.path.join(cache_dir, language, f"{self.name}.json")):
        return self._load(language, self.name, cache_dir)

    logger.info("Adapting %s to %s", self.name, language)
    prompts = []
    output_keys = []
    for example in self.examples:
        prompts.extend(
            [
                str_translation.format(
                    translate_to=language, input=example.get(key)
                )
                for key in self.input_keys
            ]
        )
        prompts.append(
            json_translatation.format(
                translate_to=language, input=example.get(self.output_key)
            )
            if self.output_type.lower() == "json"
            else str_translation.format(
                translate_to=language, input=example.get(self.output_key)
            )
        )
        if self.output_type.lower() == "json":
            output = example.get(self.output_key)
            if isinstance(output, str):
                output = json.loads(output)
            if isinstance(output, dict):
                output_keys.append(get_all_keys(output))
            elif isinstance(output, list) and all(
                isinstance(item, dict) for item in output
            ):
                output_keys.append([get_all_keys(item) for item in output])

    results = []
    for p in prompts:
        results.append(llm.generate(p).generations[0][0].text)
    per_example_items = len(self.input_keys) + 1
    grouped_results = [
        results[i : i + per_example_items]
        for i in range(0, len(results), per_example_items)
    ]
    assert len(grouped_results) == len(
        self.examples
    ), "examples and adapted examples must be of equal length"
    for i, example in enumerate(grouped_results):
        example_dict = {}
        example_dict.update(
            {k: v for k, v in zip(self.input_keys, example[: len(self.input_keys)])}
        )
        example_dict[self.output_key] = (
            json_loader._safe_load(example[-1], llm)
            if self.output_type.lower() == "json"
            else example[-1]
        )

        if self.output_type.lower() == "json":
            output = example_dict[self.output_key]
            if isinstance(output, dict):
                assert (
                    set(output.keys()) == output_keys[i]
                ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
            elif isinstance(output, list) and all(
                isinstance(item, dict) for item in output
            ):
                assert all(
                    set(item.keys()) in output_keys[i] for item in output
                ), "Adapted output keys do not match with the original output keys"

        self.examples[i] = example_dict

    self.language = language

    return self

If the issue persists, it might be beneficial to have a maintainer look into it. @jjmachan, could you step in and provide your guidance?

To continue talking to Dosu, mention @dosu.

Jeerhz commented 1 month ago

Hello @jjmachan,

I don't know if your team was aware of this issue when adapting metrics to other languages. @heuristicwave and I are experiencing the same problem. Since the guidelines lead to an unexpected error, could you kindly share the ragas and Python versions used when that guide was written? We would be very grateful for your help.

Thank you so much!

Mervyn-1 commented 1 month ago

same issue

jjmachan commented 1 month ago

@Jeerhz @heuristicwave @Mervyn-1 My sincere apologies for the late reply, but I will take a look at this today and report back what I see.

PanicButtonPressed commented 1 month ago

I ran into the very same issue: a JSON code block is generated in the output examples in the prompts, which cannot be read back properly with json.loads and as such triggers a ValueError in validate_prompt(). Removing these code blocks solved the issue for me.

Jeerhz commented 1 month ago

@PanicButtonPressed, if I'm not mistaken, you deleted the example outputs that are included in the metric prompts. Is that correct? Below is an example output of answer_correctness.correctness_prompt:

Question: "What is the boiling point of water?"
Answer:
["The boiling point of water is 100 degrees Celsius at sea level"]
Ground Truth:
["The boiling point of water is 100 degrees Celsius (212 degrees Fahrenheit) at sea level.", "The boiling point of water can change with altitude."]
Classification:
{
  "TP": [
    {
      "statement": "The boiling point of water is 100 degrees Celsius at sea level",
      "reason": "This statement is directly supported by the ground truth, which specifies the boiling point of water as 100 degrees Celsius at sea level."
    }
  ],
  "FP": [],
  "FN": [
    {
      "statement": "The boiling point of water can change with altitude.",
      "reason": "This additional information about how the boiling point of water can vary with altitude is not mentioned in the answer."
    }
  ]
}

PanicButtonPressed commented 1 month ago

@Jeerhz I kept the examples but manually deleted the JSON code-block instructions that were generated while adapting, something like "classification": "```json {\n \"TP\": ..."
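
A minimal sketch of that cleanup as a script (my own illustration, not part of ragas; the path is hypothetical, while the "examples" and "output_key" fields match the saved prompt dump later in this thread):

import json
import re

# Hypothetical cache location; point this at wherever adapt() saved the prompt.
path = ".cache/ragas/german/answer_correctness.json"

with open(path) as f:
    prompt = json.load(f)

fences = re.compile(r"^```(?:json)?\s*|\s*```$")
for example in prompt["examples"]:
    value = example[prompt["output_key"]]
    if isinstance(value, str):
        # Strip the Markdown fences so json.loads sees bare JSON on reload.
        example[prompt["output_key"]] = fences.sub("", value.strip())

with open(path, "w") as f:
    json.dump(prompt, f, ensure_ascii=False)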

Jeerhz commented 1 month ago

Sorry to insist, but I don't understand when you get these JSON block instructions... Is it possible for you to share the code that seems to work? @PanicButtonPressed

PanicButtonPressed commented 1 month ago

@Jeerhz That's why: https://github.com/explodinggradients/ragas/blob/95d8318abdc9c38682a9c1da6e6c7781e69fade1/src/ragas/llms/prompt.py#L122

The prompts are converted to a dict, backticks included, and then stored as a JSON file, but when they are read back with the method mentioned above, a ValueError is raised because of https://github.com/explodinggradients/ragas/blob/95d8318abdc9c38682a9c1da6e6c7781e69fade1/src/ragas/llms/prompt.py#L82-L88
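
In other words, a minimal illustration of that failure mode (not code from the library):

import json

# An adapted example value as it ends up in the cached prompt file:
# a Markdown-fenced block rather than bare JSON.
stored = '```json\n{"TP": [], "FP": [], "FN": []}\n```'

# Prompt validation does essentially this on reload; the leading backtick is
# what produces "Expecting value: line 1 column 1 (char 0)".
json.loads(stored)  # raises json.JSONDecodeError (a ValueError subclass)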

jjmachan commented 4 weeks ago

@PanicButtonPressed I'm trying to reproduce this error but can't seem to.

This is what I'm trying: [screenshot]

And this is what I'm getting: [screenshot]

Which language are you trying to adapt to, and which LLM are you using for the adaptation?

Jeerhz commented 4 weeks ago

@jjmachan, I was going to post this discovery too. The problem is tricky because it only concerns the Hindi language; try instead to run your code with language="hindi" and the answer_correctness metric, as below.

answer_correctness.llm = langchain_azure
answer_correctness.adapt(language="hindi")

You should get the error: classification in example 1 is not in valid json format: Expecting value: line 1 column 1 (char 0) (type=value_error)

I am surprised to find there is no problem with other languages. Below is the correctness_prompt after adapting the metric to Spanish.

Do not return any preamble or explanations, return only a pure JSON string surrounded by triple backticks (```).

Examples:

question: "¿Qué alimenta al sol y cuál es su función principal?"
answer: "['El sol se alimenta de la fusión nuclear, similar a los reactores nucleares en la Tierra.', 'La función principal del sol es proporcionar luz al sistema solar.']"
ground_truth: "['El sol se alimenta de la fusión nuclear, donde los átomos de hidrógeno se fusionan para formar helio.', 'Este 
proceso de fusión en el núcleo del sol libera una cantidad tremenda de energía.', 'La energía del sol proporciona calor y luz, 
que son esenciales para la vida en la Tierra.', 'La luz del sol juega un papel crítico en el sistema climático de la Tierra.', 
'La luz solar ayuda a impulsar el clima y las corrientes oceánicas.']"
classification: "```json\n{{\n  \"TP\": [\n    {{\n      \"statement\": \"La función principal del sol es proporcionar luz al sistema solar.\",\n      \"reason\": \"Esta declaración está algo respaldada por la verdad fundamental que menciona que el sol proporciona luz y sus roles, aunque se enfoca más ampliamente en la energía del sol.\"\n    }}\n  ],\n  \"FP\": [\n    {{\n     
 \"statement\": \"El sol funciona mediante la fisión nuclear, similar a los reactores nucleares en la Tierra.\",\n      \"reason\": \"Esta declaración es incorrecta y contradice la verdad fundamental que afirma que el sol se alimenta mediante la fusión nuclear.\"\n    }}\n  ],\n  \"FN\": [\n    {{\n      \"statement\": \"El sol se alimenta mediante la fusión nuclear, donde los átomos de hidrógeno se fusionan para formar helio.\",\n      \"reason\": \"Esta descripción precisa de la fuente de energía del 
sol no está incluida en la respuesta.\"\n    }},\n    {{\n      \"statement\": \"Este proceso de fusión en el núcleo del sol libera una cantidad tremenda de energía.\",\n      \"reason\": \"Este proceso y su significado no se mencionan en la respuesta.\"\n    }},\n    {{\n      \"statement\": \"La energía del sol proporciona calor y luz, que son esenciales para la vida en la Tierra.\",\n      \"reason\": \"La respuesta solo menciona la luz, omitiendo los aspectos esenciales del calor y su necesidad para la vida, lo cual cubre la verdad fundamental.\"\n    }},\n    {{\n      \"statement\": \"La luz del sol juega un papel crítico en el sistema climático de la Tierra.\",\n      \"reason\": \"Este impacto más amplio de la luz del sol en el sistema climático de la Tierra no se aborda en la respuesta.\"\n    }},\n    {{\n      \"statement\": \"La luz solar ayuda a impulsar el clima 
y las corrientes oceánicas.\",\n      \"reason\": \"El efecto de la luz solar en los patrones climáticos y las corrientes oceánicas se omite en la respuesta.\"\n    }}\n  ]\n}}\n```"

question: "¿Cuál es el punto de ebullición del agua?"
answer: "El punto de ebullición del agua es 100 grados Celsius al nivel del mar."
ground_truth: "El punto de ebullición del agua es de 100 grados Celsius (212 grados Fahrenheit) al nivel del mar. El punto de ebullición del agua puede cambiar con la altitud."
classification: "```json\n{{\n  \"TP\": [\n    {{\n      \"statement\": \"El punto de ebullición del agua es 100 grados Celsius a nivel del mar\",\n      \"reason\": \"Esta declaración está directamente respaldada por la verdad fundamental que especifica el punto de ebullición del agua como 100 grados Celsius a nivel del mar.\"\n    }}\n  ],\n  \"FP\": [],\n  \"FN\": [\n    {{\n      \"statement\": \"El punto de ebullición del agua puede cambiar con la altitud.\",\n      \"reason\": \"Esta información adicional sobre cómo el punto de ebulición del agua puede variar con la altitud no se menciona en la respuesta.\"\n    }}\n  ]\n}}\n```"

Your actual task:

question: {question}
answer: {answer}
ground_truth: {ground_truth}
classification:

PS: Everybody will try the Hindi language first, since it is the one presented in the documentation: https://docs.ragas.io/en/latest/howtos/applications/use_prompt_adaptation.html

Edit: I tried it with OpenAI GPT-4 and Azure OpenAI GPT-4.

PanicButtonPressed commented 4 weeks ago

@jjmachan For me it only happens when reading the prompts back in after they were stored as a JSON file. Hence, my issue isn't totally related to this one, but it returns the very same error in the end. Following your example, I did the following in addition:

cache_dir = "..."
faithfulness.save(cache_dir)

faithfulness.adapt(language="french", cache_dir=cache_dir)

Then a ValueError in the validate_prompt() method of the Prompt class was triggered, as mentioned here: https://github.com/explodinggradients/ragas/issues/1101#issuecomment-2247113177.

My adapted answer_correctness prompt, saved as a JSON file:

{'name': 'answer_correctness', 'instruction': 'Given a ground truth and an answer statements, analyze each statement and classify them in one of the following categories:\n\n- TP (true positive): statements that are present in answer that are also directly supported by the one or more statements in ground truth,\n- FP (false positive): statements present in the answer but not directly supported by any statement in ground truth,\n- FN (false negative): statements found in the ground truth but not present in answer.\n\nEach statement can only belong to one of the categories. Provide a reason for each classification.\n', 'output_format_instruction': 'The output should be a well-formatted JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output JSON schema:\n```\n{"type": "object", "properties": {"TP": {"title": "Tp", "type": "array", "items": {"type": "object"}}, "FP": {"title": "Fp", "type": "array", "items": {"type": "object"}}, "FN": {"title": "Fn", "type": "array", "items": {"type": "object"}}}, "required": ["TP", "FP", "FN"]}\n```\n\nDo not return any preamble or explanations, return only a pure JSON string surrounded by triple backticks (```).', 'examples': [{'question': 'translate_to: german\ninput: What powers the sun and what is its primary function?\noutput: Was treibt die Sonne an und was ist ihre Hauptfunktion?', 'answer': "translate_to: german\ninput: ['The sun is powered by nuclear fission, similar to nuclear reactors on Earth.', 'The primary function of the sun is to provide light to the solar system.']\noutput: ['Die Sonne wird durch Kernspaltung angetrieben, ähnlich wie Kernreaktoren auf der Erde.', 'Die Hauptfunktion der Sonne ist es, das Sonnensystem mit Licht zu versorgen.']", 'ground_truth': 'translate_to: german\ninput: [\'The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium.\', "This fusion process in the sun\'s core releases a tremendous amount of energy.", \'The energy from the sun provides heat and light, which are essential for life on Earth.\', "The sun\'s light plays a critical role in Earth\'s climate system.", \'Sunlight helps to drive the weather and ocean currents.\']\noutput: [\'Die Sonne wird durch Kernfusion angetrieben, bei der Wasserstoffatome zu Helium verschmelzen.\', \'Dieser Fusionsprozess im Kern der Sonne setzt eine enorme Menge an Energie frei.\', \'Die Energie der Sonne liefert Wärme und Licht, die für das Leben auf der Erde unerlässlich sind.\', \'Das Licht der Sonne spielt eine entscheidende Rolle im Klimasystem der Erde.\', \'Sonnenlicht hilft, das Wetter und die Meeresströmungen anzutreiben.\']', 'classification': '```json\n{\n  "TP": [\n    {\n      "statement": "Die Hauptfunktion der Sonne besteht darin, dem Sonnensystem Licht zu spenden.",\n      "reason": "Diese Aussage wird teilweise durch die Grundwahrheit unterstützt, die erwähnt, dass die Sonne Licht spendet und ihre Rollen, obwohl sie sich breiter auf die Energie der Sonne konzentriert."\n    }\n  ],\n  "FP": [\n    {\n      "statement": "Die Sonne wird durch Kernspaltung angetrieben, ähnlich wie Kernreaktoren auf der Erde.",\n      "reason": "Diese Aussage ist falsch und widerspricht der Grundwahrheit, die besagt, 
dass die Sonne durch Kernfusion angetrieben wird."\n    }\n  ],\n  "FN": [\n    {\n      "statement": "Die Sonne wird durch Kernfusion angetrieben, bei der Wasserstoffatome zu Helium verschmelzen.",\n      "reason": "Diese genaue Beschreibung der Energiequelle der Sonne ist in der Antwort nicht enthalten."\n    },\n    {\n      "statement": "Dieser Fusionsprozess im Kern der Sonne setzt eine enorme Menge an Energie frei.",\n      "reason": "Dieser Prozess und seine Bedeutung werden in der Antwort nicht erwähnt."\n    },\n    {\n      "statement": "Die Energie der Sonne liefert Wärme und Licht, die für das Leben auf der Erde unerlässlich sind.",\n      "reason": "Die Antwort erwähnt nur Licht und lässt die wesentlichen Aspekte der Wärme und deren Notwendigkeit für das Leben aus, die die Grundwahrheit abdeckt."\n    },\n    {\n      "statement": "Das Licht der Sonne spielt eine entscheidende Rolle im Klimasystem der Erde.",\n      "reason": "Diese breitere Auswirkung des Sonnenlichts auf das Klimasystem der Erde wird in der Antwort nicht angesprochen."\n    },\n    {\n      "statement": "Sonnenlicht hilft, das Wetter und die Meeresströmungen anzutreiben.",\n      "reason": "Die Wirkung des Sonnenlichts auf Wetterbedingungen und Meeresströmungen wird in der Antwort ausgelassen."\n    }\n  ]\n}\n```'}, {'question': 'translate_to: "german"\ninput: What is the boiling point of water?\noutput: "Was ist der Siedepunkt von Wasser?"', 'answer': "translate_to: german\ninput: ['The boiling point of water is 100 degrees Celsius at sea level']\noutput: ['Der Siedepunkt von Wasser liegt auf Meereshöhe bei 100 Grad Celsius']", 'ground_truth': 'translate_to: "german"\ninput: [\'The boiling point of water is 100 degrees Celsius (212 degrees Fahrenheit) at sea level.\', \'The boiling point of water can change with altitude.\']\noutput: [\'Der Siedepunkt von Wasser liegt auf Meereshöhe bei 100 Grad Celsius (212 Grad Fahrenheit).\', \'Der Siedepunkt von Wasser kann sich mit der Höhe ändern.\']', 'classification': '```json\n{\n  "TP": [\n    {\n      "statement": "Der Siedepunkt von Wasser liegt auf Meereshöhe bei 100 Grad Celsius.",\n      "reason": "Diese Aussage wird direkt durch die Grundwahrheit unterstützt, die den Siedepunkt von Wasser auf Meereshöhe mit 100 Grad Celsius angibt."\n    }\n  ],\n  "FP": [],\n  "FN": [\n    {\n      "statement": "Der Siedepunkt von Wasser kann sich mit der Höhe ändern.",\n      "reason": "Diese zusätzliche Information darüber, wie sich der Siedepunkt von Wasser mit der Höhe ändern kann, wird in der Antwort nicht erwähnt."\n    }\n  ]\n}\n```'}], 'input_keys': ['question', 'answer', 'ground_truth'], 'output_key': 'classification', 'output_type': 'json', 'language': 'german'}

jjmachan commented 3 weeks ago

@PanicButtonPressed Ahh, I got it! I will put in a fix this week itself.

@Jeerhz Thanks for the detailed explanation too! I will add a retry mechanism, like we have with prompts, to help iron this out as well.

jjmachan commented 3 weeks ago

Will use #890 to track progress.

IvanPerezMelendez commented 2 weeks ago

Hi, I'm trying to use the Spanish language for the testset generation as in the documentation: https://docs.ragas.io/en/stable/howtos/applications/use_prompt_adaptation.html#language-adaptation-for-testset-generation

However, if I run this (the same code as in the documentation, but with Spanish instead of Hindi):

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-4o-mini")
critic_llm = ChatOpenAI(model="gpt-4o-mini")

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

language = "spanish"

generator.adapt(language=language, evolutions=[simple, reasoning,conditional,multi_context])
generator.save(evolutions=[simple, reasoning, multi_context,conditional])

# generate testset
if not (os.path.exists("testset")):
    testset = generator.generate_with_langchain_docs(documents, test_size=20, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
    df_testset = testset.to_pandas()
    utils.save_test_df_to_csv(df_testset)
else:
    df_testset = pd.read_csv("testset")

I get the error shown in this screenshot:

[screenshot]

If I use language="hindi", it works correctly as in the docs. I'm using version 0.1.13.

Jeerhz commented 2 weeks ago

Hi @IvanPerezMelendez,

As discussed above, the issue appears to be related to a formatting problem in the cached prompts created during adaptation.

Try adding a cache_dir argument to your adapt and save calls, and clear that directory after each use. This solution worked for me:

generator.adapt("spanish", evolutions=[simple, reasoning,conditional,multi_context], cache_dir="a_path")
generator.save(evolutions=[simple, reasoning, multi_context,conditional], cache_dir="a_path")
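
To clear that cache between runs, something like this should do (a sketch; "a_path" is just the throwaway directory used above):

import shutil

# Wipe the adaptation cache so stale, badly formatted prompts are not reloaded.
shutil.rmtree("a_path", ignore_errors=True)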

The explodinggradients team is working on this.

Best regards,

IvanPerezMelendez commented 2 weeks ago

Hi @Jeerhz, thanks for your reply. With your code, the error doesn't occur; however, it doesn't finish executing (it just keeps loading). [screenshot]

Jeerhz commented 2 weeks ago

> Hi @Jeerhz, thanks for your reply. With your code, the error doesn't occur; however, it doesn't finish executing (it just keeps loading). [screenshot]

just wait