explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Prompt adaptation fails to save to file with unicode error (metric.save_prompts() function) #1624

Open LukaszDejneka opened 3 days ago

LukaszDejneka commented 3 days ago

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

I followed the documentation for adapting prompts to Polish. Everything works until I try to save the adapted prompts to a file, which raises the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u017c' in position 6: character maps to <undefined>

The error is triggered by the Polish character 'ż' in the translated prompt, which begins: 'Co możesz mi powiedzieć o Albercie'.

Ragas version: 0.2.3
Python version: 3.13.0

Code to Reproduce

import os
from python.ragas.common.config import OpenAIConfig
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextPrecisionWithoutReference
from ragas.utils import RAGAS_SUPPORTED_LANGUAGE_CODES
from langchain_openai.chat_models import AzureChatOpenAI

os.environ['OPENAI_API_KEY'] = OpenAIConfig.api_key

scorer = LLMContextPrecisionWithoutReference()
scorer.get_prompts()

azure_llm = AzureChatOpenAI(
    openai_api_version=OpenAIConfig.api_version,
    azure_endpoint=OpenAIConfig.api_base,
    azure_deployment=OpenAIConfig.ragas_model_deployment_name,
    model=OpenAIConfig.ragas_model_deployment_name,
    validate_base_url=False,
)
azure_llm = LangchainLLMWrapper(azure_llm)

adapted_prompts = await scorer.adapt_prompts(language="polish", llm=azure_llm)
scorer.set_prompts(**adapted_prompts)
scorer.save_prompts('../common/__data/')

Error trace

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[7], line 1
----> 1 scorer.save_prompts('../common/__data/')

File ~\PycharmProjects\cbs-cx-chatbot\venv\Lib\site-packages\ragas\prompt\mixin.py:89, in PromptMixin.save_prompts(self, path)
     84 for prompt_name, prompt in prompts.items():
     85     # hash_hex = f"0x{hash(prompt) & 0xFFFFFFFFFFFFFFFF:016x}"
     86     prompt_file_name = os.path.join(
     87         path, f"{prompt_name}_{prompt.language}.json"
     88     )
---> 89     prompt.save(prompt_file_name)

File ~\PycharmProjects\cbs-cx-chatbot\venv\Lib\site-packages\ragas\prompt\pydantic_prompt.py:338, in PydanticPrompt.save(self, file_path)
    336     raise FileExistsError(f"The file '{file_path}' already exists.")
    337 with open(file_path, "w") as f:
--> 338     json.dump(data, f, indent=2, ensure_ascii=False)
    339     print(f"Prompt saved to {file_path}")

File ~\AppData\Local\Programs\Python\Python313\Lib\json\__init__.py:180, in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    177 # could accelerate with writelines in some versions of Python, at
    178 # a debuggability cost
    179 for chunk in iterable:
--> 180     fp.write(chunk)

File ~\AppData\Local\Programs\Python\Python313\Lib\encodings\cp1252.py:19, in IncrementalEncoder.encode(self, input, final)
     18 def encode(self, input, final=False):
---> 19     return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u017c' in position 6: character maps to <undefined>
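
The traceback points at `open(file_path, "w")` in `PydanticPrompt.save`, which passes no `encoding` argument. In that case Python falls back to the platform's default text encoding, which here is cp1252 (see the `encodings\cp1252.py` frame), and cp1252 has no mapping for 'ż' (U+017C). Below is a minimal sketch of the failure and of a likely fix, independent of ragas. It sets cp1252 explicitly so that it reproduces on any OS; the `data` payload and file names are made up for illustration and are not ragas internals.

```python
import json
import os
import tempfile

# Hypothetical payload standing in for the prompt data that gets serialized.
data = {"question": "Co możesz mi powiedzieć o Albercie"}

tmp_dir = tempfile.mkdtemp()
bad_path = os.path.join(tmp_dir, "prompt_cp1252.json")
good_path = os.path.join(tmp_dir, "prompt_utf8.json")

# Simulate the Windows default from the traceback: cp1252 cannot encode 'ż'.
failed = False
try:
    with open(bad_path, "w", encoding="cp1252") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
except UnicodeEncodeError as exc:
    failed = True
    print(f"cp1252 write failed: {exc}")

# Passing encoding="utf-8" explicitly makes the write locale-independent.
with open(good_path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# Read the file back with the same encoding to confirm a lossless round trip.
with open(good_path, encoding="utf-8") as f:
    restored = json.load(f)
```

Using `ensure_ascii=True` would also avoid the error by escaping 'ż' as `\u017c`, at the cost of less readable prompt files, so an explicit `encoding="utf-8"` seems the more natural change.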

Expected behavior

Prompt translations should be saved to the file as described in the documentation.

Additional context

I also tried Python 3.12 with some older packages; the result was the same.

jjmachan commented 10 hours ago

Hey @LukaszDejneka, thanks a lot for reporting this. I think I have found the problem and will get a fix out soon. Really sorry about this!
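
Until a fix is released, one possible stopgap that does not require touching ragas is Python's UTF-8 mode (`PYTHONUTF8=1` in the environment, or `python -X utf8`), which makes `open()` default to UTF-8. This is a general Python feature and not something confirmed in this thread as the maintainers' fix; in a notebook the variable has to be set before the kernel starts. The sketch below checks which encoding `open()` picks when none is given and whether it can represent 'ż'; the helper name `default_open_encoding` is hypothetical.

```python
import os
import sys
import tempfile


def default_open_encoding() -> str:
    """Return the encoding open() picks when no encoding argument is given."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Same call shape as in the traceback: text mode, no encoding.
        with open(path, "w") as f:
            return f.encoding
    finally:
        os.remove(path)


enc = default_open_encoding()

# Check whether that default encoding can represent the offending character.
try:
    "ż".encode(enc)
    can_encode = True
except UnicodeEncodeError:
    can_encode = False

print(f"UTF-8 mode enabled: {bool(sys.flags.utf8_mode)}")
print(f"default open() encoding: {enc}")
print(f"can encode 'ż': {can_encode}")
```

If the last line reports `False`, saving adapted prompts that contain such characters can be expected to fail in the same way as in the report.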