Open mckbrchill opened 3 months ago
Hey there, @mckbrchill! I'm here to help you out with any bugs, questions, or contributions you need assistance with. Let's dive into the issue you're experiencing.
To ensure that candidate sentences with non-ASCII characters, such as Cyrillic text, are passed to the LLM correctly instead of being converted to escape sequences, modify the `_create_nli_prompt` method to call `json.dumps` with `ensure_ascii=False`. This prevents non-ASCII characters from being escaped.
Here is the modified part of your code:
```python
def _create_nli_prompt(self, row: t.Dict, statements: t.List[str]) -> PromptValue:
    assert self.llm is not None, "llm must be set to compute score"
    contexts = row["contexts"]
    # check if the statements are supported by the contexts
    contexts_str: str = "\n".join(contexts)
    statements_str: str = json.dumps(
        statements, ensure_ascii=False
    )  # ensure non-ASCII characters are preserved
    prompt_value = self.nli_statements_message.format(
        context=contexts_str, statements=statements_str
    )
    return prompt_value
```
By setting `ensure_ascii=False` in `json.dumps`, non-ASCII characters will be preserved when the statements are passed to the LLM [1].
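A minimal standalone demonstration of the difference (plain `json.dumps`, independent of the ragas internals):

```python
import json

statements = ["Первое утверждение.", "Second statement."]

escaped = json.dumps(statements)  # default ensure_ascii=True
preserved = json.dumps(statements, ensure_ascii=False)

print(escaped)    # Cyrillic is turned into \uXXXX escape sequences
print(preserved)  # original characters are kept
```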
@mckbrchill would something like what dosu suggested help you?
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
When I work with Cyrillic texts, the candidate sentences generated by the faithfulness metric are passed through `json.dumps` with the default `ensure_ascii=True` in the `_create_nli_prompt` method, so `statements_str` contains strings with escape sequences, which are then passed to the LLM again.
Ragas version: 0.1.7
Python version: 3.10.8
Expected behavior
I expect the candidate sentences to remain in Cyrillic when they are passed to the LLM again.
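For what it's worth, the escaped form is still valid JSON and round-trips back to the original text via `json.loads`; the problem is only that the LLM receives the literal escape sequences as prompt text. A quick sketch:

```python
import json

original = ["Кандидатное предложение."]
escaped = json.dumps(original)  # default ensure_ascii=True

# The escapes round-trip fine through json.loads...
assert json.loads(escaped) == original

# ...but the prompt string itself contains "\u041a..." rather than Cyrillic,
# which is what the LLM would actually see.
print(escaped)
```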