explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0
7.07k stars 718 forks

.adapt_prompt to another language. ValueError with diff. number of statements (input,output) #1599

Open rdummarf opened 2 days ago

rdummarf commented 2 days ago

[X] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug I'm trying to adapt the prompts to Italian using "llama3.1" and save them to a local directory.

Adaptation had already succeeded for "ContextPrecision" and "ContextEntityRecall".

However, when I run it with "Faithfulness" and "AnswerCorrectness", I get the error "The number of statements in the output does not match the number of statements in the input. Translation failed."

Ragas version: 0.2.x
Python version: 3.12.6

Code to Reproduce

from langchain_community.llms import Ollama  # missing from the original snippet
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (
    AnswerCorrectness,
    ContextEntityRecall,
    ContextPrecision,
    Faithfulness,
)

metrics = [ContextPrecision(), Faithfulness(), ContextEntityRecall(), AnswerCorrectness()]

for m in metrics:
    print(f"Adapting {m.name}")
    adapt_prompt = await m.adapt_prompts(
        language="italian",
        llm=LangchainLLMWrapper(Ollama(model="llama3.1")),
        adapt_instruction=True,
    )

    if adapt_prompt:
        m.set_prompts(**adapt_prompt)
        # obter_diretorio_prompts_ptbr() returns the local prompts directory
        m.save_prompts(obter_diretorio_prompts_ptbr())
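Since local models emit valid JSON only intermittently, one workaround (a sketch, not part of the original report; `adapt_with_retries` is a hypothetical helper) is to retry the adaptation a few times before giving up, and only then skip the metric:

```python
import asyncio


async def adapt_with_retries(adapt_fn, max_attempts=3):
    """Retry a flaky async adaptation call.

    Local LLMs such as llama3.1 sometimes merge or split statements while
    translating, which makes ragas raise ValueError; re-sampling the same
    prompt often succeeds on a later attempt.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return await adapt_fn()
        except ValueError as exc:
            last_error = exc
    raise last_error


# Usage against the loop above (sketch):
# adapt_prompt = await adapt_with_retries(
#     lambda: m.adapt_prompts(language="italian", llm=wrapped_llm,
#                             adapt_instruction=True)
# )
```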

ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 adapt_prompt = await m.adapt_prompts(language="italian", llm=LangchainLLMWrapper(Ollama(model="llama3.1")), adapt_instruction=True)

File ~/anaconda3/envs/llm_sandbox/lib/python3.12/site-packages/ragas/prompt/mixin.py:70, in PromptMixin.adapt_prompts(self, language, llm, adapt_instruction)
     68 adapted_prompts = {}
     69 for name, prompt in prompts.items():
---> 70     adapted_prompt = await prompt.adapt(language, llm, adapt_instruction)
     71     adapted_prompts[name] = adapted_prompt
     73 return adapted_prompts

File ~/anaconda3/envs/llm_sandbox/lib/python3.12/site-packages/ragas/prompt/pydantic_prompt.py:258, in PydanticPrompt.adapt(self, target_language, llm, adapt_instruction)
    255 new_prompt.language = target_language
    257 if adapt_instruction:
--> 258     translated_instruction = await translate_statements_prompt.generate(
    259         llm=llm,
    260         data=ToTranslate(
    261             target_language=target_language, statements=[self.instruction]
    262         ),
    263     )
    264     new_prompt.instruction = translated_instruction.statements[0]
    266 return new_prompt

File ~/anaconda3/envs/llm_sandbox/lib/python3.12/site-packages/ragas/prompt/pydantic_prompt.py:130, in PydanticPrompt.generate(self, llm, data, temperature, stop, callbacks, retries_left)
    127 callbacks = callbacks or []
    129 # this is just a special case of generate_multiple
--> 130 output_single = await self.generate_multiple(
    131     llm=llm,
    132     data=data,
    133     n=1,
    134     temperature=temperature,
    135     stop=stop,
    136     callbacks=callbacks,
    137     retries_left=retries_left,
    138 )
    139 return output_single[0]

File ~/anaconda3/envs/llm_sandbox/lib/python3.12/site-packages/ragas/prompt/pydantic_prompt.py:210, in PydanticPrompt.generate_multiple(self, llm, data, n, temperature, stop, callbacks, retries_left)
    202 try:
    203     answer = await parser.parse_output_string(
    204         output_string=output_string,
    205         prompt_value=prompt_value,
   (...)
    208         retries_left=retries_left,
    209     )
--> 210     processed_output = self.process_output(answer, data)  # type: ignore
    211     output_models.append(processed_output)
    212 except RagasOutputParserException as e:

File ~/anaconda3/envs/llm_sandbox/lib/python3.12/site-packages/ragas/prompt/pydantic_prompt.py:478, in TranslateStatements.process_output(self, output, input)
    476 def process_output(self, output: Translated, input: ToTranslate) -> Translated:
    477     if len(output.statements) != len(input.statements):
--> 478         raise ValueError(
    479             "The number of statements in the output does not match the number of statements in the input. Translation failed."
    480         )
    481     return output

ValueError: The number of statements in the output does not match the number of statements in the input. Translation failed.
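For context, the check that fires at the bottom of the traceback reduces to the following (simplified stand-in classes mirroring the names in the traceback, not the actual ragas code): translation is expected to be one-to-one, so a model that merges or splits statements in its JSON answer trips it.

```python
from dataclasses import dataclass


# Simplified stand-ins for the ragas models named in the traceback.
@dataclass
class ToTranslate:
    target_language: str
    statements: list


@dataclass
class Translated:
    statements: list


def process_output(output: Translated, input: ToTranslate) -> Translated:
    # Translation must be one-to-one: if the model merged or split
    # statements in its JSON answer, the counts diverge and this raises.
    if len(output.statements) != len(input.statements):
        raise ValueError(
            "The number of statements in the output does not match the "
            "number of statements in the input. Translation failed."
        )
    return output
```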

Expected behavior The prompts are adapted to the desired language and saved.

Additional context Ollama 0.3.14, model llama3.1.

jjmachan commented 2 days ago

@rdummarf which parameter size of the model are you using? This is likely due to errors in the model's JSON output.

To confirm, can you check with some other models like GPT-4o, Claude 3.5, etc.?