explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0
7.16k stars 729 forks

`Metric.adapt()` throws `AssertionError: Adapted output keys do not match with the original output keys` #653

Closed joy13975 closed 8 months ago

joy13975 commented 8 months ago

Describe the bug The adapt function on metrics does not succeed for Japanese. According to Import and adapt evolutions, one needs to adapt evolutions for non-English languages, so I assumed metrics need adaptation too. Is that not the case?

Ragas version: 0.1.2.dev8+gc18c7f4 Python version: Python 3.9.13

Code to Reproduce

from ragas.metrics import (
    context_precision,
    context_recall,
    faithfulness,
    answer_relevancy,
)
from langchain_openai.chat_models import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

metrics = [
    context_precision,
    context_recall,
    faithfulness,
    answer_relevancy,
]
eval_prompt_cache_dir='../../data/qag/ragas_prompts/'
print(f'Adapting prompts...')
metric_llm = LangchainLLMWrapper(ChatOpenAI(model='gpt-3.5-turbo'))
for m in metrics:
    m.llm = metric_llm
    _ = m.adapt(language='japanese', cache_dir=eval_prompt_cache_dir)

Error trace

{'reason': '提供された文脈は、与えられた答えに到達するのに本当に役立ちました。文脈には、アルバート・アインシュタインの生涯と貢献に関する重要な情報が含まれており、それが答えに反映されています。', 'verdict': '1'}
{'reason': '2020 आईसीसी विश्व कप के संदर्भ में जानकारी स्पष्ट करने में उपयोगी था और इसका संकेत देता था कि इंग्लैंड वह विजेता था जो 2020 में आयोजित किया जाना था पर वास्तव में 2022 में हुआ।', 'verdict': '1'}
{'reason': '提供された文脈はアンデス山脈について話しており、印象的ですが、エベレスト山を含んでおらず、世界で最も高い山に関する質問と直接関係していません。', 'verdict': '0'}
{'0': {'reason': 'आइंस्टीन का जन्म तिथि स्पष्ट रूप से संदर्भ में उल्लिखित है।', 'statement_1': '14 मार्च 1879 को जन्मे अल्बर्ट आइंस्टीन, एक जर्मन मूल के सिद्धांतिक भौतिकशास्त्री थे, जिन्हें समय के सभी सर्वोत्तम और प्रभावशाली वैज्ञानिकों में से एक माना जाता है।', 'Attributed': '1'}, '1': {'reason': 'दिया गया संदर्भ में सटीक वाक्य है।', 'statement_2': "उन्होंने 1921 में भौतिकी में नोबेल पुरस्कार प्राप्त किया 'अपने सिद्धांतिक भौतिकी के सेवाओं के लिए।", 'Attributed': '1'}, '2': {'reason': 'दिया गया संदर्भ में उनके द्वारा लिखे गए पेपरों का उल्लेख नहीं है।', 'statement_3': 'उन्होंने 1905 में 4 पेपर प्रकाशित किए।', 'Attributed': '0'}, '3': {'reason': 'दिया गया संदर्भ में इसके लिए कोई समर्थन प्रमाण नहीं है।', 'statement_4': 'आइंस्टीन 1895 में स्विट्जरलैंड चले गए।', 'Attributed': '0'}}
Traceback (most recent call last):
  File "/home/ec2-user/code/parakeet/parakeet/components/ragas_eval.py", line 55, in <module>
    _ = m.adapt(language='japanese', cache_dir=eval_prompt_cache_dir)
  File "/home/ec2-user/code/ragas/src/ragas/metrics/_context_recall.py", line 126, in adapt
    self.context_recall_prompt = self.context_recall_prompt.adapt(
  File "/home/ec2-user/code/ragas/src/ragas/llms/prompt.py", line 230, in adapt
    assert (
AssertionError: Adapted output keys do not match with the original output keys

Expected behavior Calling Metric.adapt() does not fail.

Additional context I'm not sure whether this happens with other languages as well.
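For context on why the assertion fires: the check compares the JSON keys of each translated example against the keys of the corresponding original English example, and any key the LLM drops or renames during translation trips it. A minimal stand-alone simulation of that check (the function and variable names here are made up for illustration; this is not ragas' actual code):

```python
# Hypothetical simulation of the key-matching check in ragas/llms/prompt.py.
def validate_adapted_outputs(original_outputs, adapted_outputs):
    """Assert that every adapted example reproduces exactly the same
    JSON keys as the original example at the same index."""
    output_keys = [set(o.keys()) for o in original_outputs]
    for i, output in enumerate(adapted_outputs):
        assert set(output.keys()) == output_keys[i], (
            f"Adapted output keys {set(output.keys())} do not match "
            f"with the original output keys: {output_keys[i]}"
        )
    return True

original = [{"reason": "...", "verdict": "1"}]
good = [{"reason": "...", "verdict": "1"}]
bad = [{"reason": "..."}]  # model forgot "verdict" during translation

validate_adapted_outputs(original, good)  # passes
try:
    validate_adapted_outputs(original, bad)
except AssertionError as e:
    print("caught:", e)
```

This is why the error shows up nondeterministically: it depends entirely on whether the translating LLM preserved the key set of the few-shot examples.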

shahules786 commented 8 months ago

Hey @joy13975, I can check this out. This occurs when the adapted prompts are not in the required format (probably the LLM missed something during translation); there should be a better way of handling this than throwing an error. Can you try again with gpt-4 or something? You could also use the adapt utils:


from ragas.metrics import (
    faithfulness,
    answer_correctness,
)
from langchain.chat_models import ChatOpenAI
from ragas import adapt

# llm used for adaptation
openai_model = ChatOpenAI(model_name="gpt-4")

adapt(metrics=[faithfulness, answer_correctness], language="hindi", llm=openai_model)
joy13975 commented 8 months ago

@shahules786 I wasn't able to test further with gpt-4 due to a budget limit. But while testing with Google Gemini (which ragas does not support yet), I found that it could be due to a typo: "relevent_contexts" should be "relevant_contexts". Around here

e.g. when I made the error message verbose, I got this:

AssertionError: Adapted output keys set(output.keys())={'relevant_contexts'} do not match with the original output keys: output_keys[i]={'relevent_contexts'}

~Put up a PR that fixes this: #676~ (it didn't quite fix this issue)

joy13975 commented 8 months ago

@shahules786 the adapt utils seem to be free of the error reported above. Why are there multiple ways to do adaptation, and is that a good or a bad thing?

shahules786 commented 8 months ago

Hey @joy13975 as of now ragas supports adapting metrics and test set generation. Docs here

We are trying to restructure the Prompt object so that it's more specific than just "JSON"; proper type hinting would enable better prompt adaptation and also post-processing of results. In the case of prompt adaptation, its main application right now is running ragas in different languages. In the future we could also adapt prompts to suit different models.

Does that answer the question? Also, if you'd like to work on something like refactoring the Prompt object, let us know :)
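To make the "more specific than just JSON" idea concrete, here is one possible shape of a typed prompt output, sketched with stdlib dataclasses. All names here (ContextRecallVerdict, parse_llm_output) are hypothetical illustrations, not ragas' actual API:

```python
from dataclasses import dataclass, fields

@dataclass
class ContextRecallVerdict:
    reason: str
    verdict: str

def parse_llm_output(raw: dict, schema):
    # Validate keys up front with a clear error, instead of a bare
    # assert deep inside prompt adaptation.
    expected = {f.name for f in fields(schema)}
    got = set(raw)
    if got != expected:
        raise ValueError(f"expected keys {expected}, got {got}")
    return schema(**raw)

ok = parse_llm_output(
    {"reason": "the context supported the answer", "verdict": "1"},
    ContextRecallVerdict,
)
print(ok.verdict)
```

A schema like this would let adaptation re-prompt or surface a readable validation error when a translation comes back malformed, rather than failing with an AssertionError.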

joy13975 commented 8 months ago

@shahules786 thanks for the info.

So as I understand it, ragas.adapt basically calls Metric.adapt after doing some LLM massaging, and both ways eventually work the same. I was just confused by the fact that Metric.adapt gave me errors while ragas.adapt worked, and that was simply because #676 landed in between the tests.

If I get more time to work on this I might volunteer to work on some of the critical enhancements, but for now I will need to stay on the sidelines.

Ailce8862 commented 7 months ago

My code is:

from ragas import adapt
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
)
from langchain_community.embeddings import QianfanEmbeddingsEndpoint
from langchain_community.chat_models import QianfanChatEndpoint

embed_llm = QianfanEmbeddingsEndpoint(
    qianfan_ak='XXX',
    qianfan_sk='XXX'
)
chat_llm = QianfanChatEndpoint(streaming=True)

adapt(metrics=[faithfulness, answer_relevancy,context_recall], language="Chinese", llm=chat_llm)
print(faithfulness.long_form_answer_prompt.to_string())

Error trace

Traceback (most recent call last):
  File "/Users/liangpan/工作/新点/coding/RAG_langchain/step3/RAGAS_eval_separation.py", line 133, in <module>
    adapt(metrics=[faithfulness, answer_relevancy,context_recall], language="Chinese", llm=chat_llm)
  File "/Users/liangpan/miniforge3/envs/langchain/lib/python3.10/site-packages/ragas/adaptation.py", line 36, in adapt
    metric.adapt(language, cache_dir=cache_dir)
  File "/Users/liangpan/miniforge3/envs/langchain/lib/python3.10/site-packages/ragas/metrics/_faithfulness.py", line 203, in adapt
    self.long_form_answer_prompt = self.long_form_answer_prompt.adapt(
  File "/Users/liangpan/miniforge3/envs/langchain/lib/python3.10/site-packages/ragas/llms/prompt.py", line 231, in adapt
    set(output.keys()) == output_keys[i]
AssertionError: Adapted output keys set(output.keys())=set() do not match with the original output keys: output_keys[i]={'statements'}

Process finished with exit code 1

pillarliang commented 7 months ago

> (quoting @Ailce8862's code and error trace above)

I encountered the same error, but when I changed "Chinese" to "中文" in the adapt call, the problem was fixed.

aazizisoufiane commented 6 months ago

Hello team, I'm encountering the same error with GPT-4, for both TestsetGenerator adaptation and metrics adaptation. Is there any workaround for this issue?
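Since the failure comes from a nondeterministic translation step, one workaround that may help (untested against ragas itself; adapt_with_retries is a hypothetical helper, and flaky_adapt below is just a stand-in for ragas.adapt) is to retry the adaptation a few times and let a clean translation through:

```python
def adapt_with_retries(adapt_fn, metrics, language, llm=None, attempts=3):
    """Retry metric adaptation when the translated prompt comes back
    with mismatched JSON keys (surfaced as an AssertionError)."""
    last_err = None
    for _ in range(attempts):
        try:
            return adapt_fn(metrics=metrics, language=language, llm=llm)
        except AssertionError as err:
            last_err = err  # bad translation; try again
    raise last_err

# Demo with a stand-in that fails twice, then succeeds on the third call:
calls = {"n": 0}
def flaky_adapt(metrics, language, llm=None):
    calls["n"] += 1
    if calls["n"] < 3:
        raise AssertionError("Adapted output keys do not match")
    return "ok"

result = adapt_with_retries(flaky_adapt, metrics=[], language="japanese")
print(result)
```

Each retry re-runs the LLM translation, so a key set that was dropped once is often reproduced correctly on a later attempt; it does cost extra LLM calls.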

LukaszDejneka commented 5 months ago

Same error here. I tried to adapt to 'polish' and it fails with the same error: AssertionError: Adapted output keys set(output.keys())={'statements'} do not match with the original output keys: output_keys[i]=[]