Responses containing LaTeX symbols cannot be parsed as JSON during correctness evaluation

While running a correctness evaluation of an LLM's performance on the GSM8K dataset, a judge response that contains LaTeX symbols caused Python to fail to parse it as JSON.

The code for the correctness evaluation:

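# Correctness Evaluator: Run Evaluation

# Imports assumed for Evidently 0.4.x; adjust the paths to the installed version.
from evidently.descriptors import LLMEval
from evidently.features.llm_judge import BinaryClassificationPromptTemplate
from evidently.metric_preset import TextEvals
from evidently.report import Report

correctness_eval = LLMEval(
    subcolumn="category",
    additional_columns={"target_response": "target_response"},
    template=BinaryClassificationPromptTemplate(
        criteria="""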
An ANSWER is correct when it is the same as the REFERENCE in all facts and details, even if worded differently.
The ANSWER is incorrect if it contradicts the REFERENCE, adds additional claims, omits or changes details.

REFERENCE:

=====
{target_response}
=====
        """,
        target_category="incorrect",
        non_target_category="correct",
        uncertainty="unknown",
        include_reasoning=True,
        pre_messages=[("system", "You are an expert evaluator. You will be given an ANSWER and REFERENCE.")],
    ),
    provider="openai",
    model="gpt-4o-mini",
    display_name="Correctness",
)

correctness_report = Report(metrics=[
    TextEvals(column_name="new_response", descriptors=[
        correctness_eval
    ])
])

correctness_report.run(reference_data=None, current_data=golden_dataset)
correctness_report

Error message produced:

LLMResponseParseError: Failed to parse response '{
  "category": "correct",
  "reasoning": "The answer provided correctly follows the reasoning in the reference. It defines the number of people on the first ship as \( x \), and then accurately establishes the number of people on the subsequent ships as \( 2x \) and \( 4x \). The total is calculated correctly as \( x + 2x + 4x = 847 \), leading to the correct result of \( x = 121 \). Therefore, the final conclusion about the number of people on the first ship is correct."
}' as json
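
A likely cause, assuming the judge response is parsed with a strict JSON parser such as Python's json.loads: the LaTeX delimiters \( and \) are not valid JSON escape sequences, since JSON only allows a backslash before ", \, /, b, f, n, r, t, or u. The sketch below reproduces the failure on a shortened response and shows one possible workaround that doubles stray backslashes before parsing. The helper names escape_invalid_backslashes and parses are hypothetical and are not part of Evidently.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string.
VALID_JSON_ESCAPES = '"\\/bfnrtu'


def escape_invalid_backslashes(raw: str) -> str:
    """Double every backslash that does not start a valid JSON escape.

    Hypothetical helper for illustration only. Scanning "backslash plus next
    character" pairs left to right leaves an already escaped backslash
    (two backslashes in a row) untouched.
    """
    def fix(match: re.Match) -> str:
        ch = match.group(1)
        if ch in VALID_JSON_ESCAPES:
            return match.group(0)      # valid escape, keep as-is
        return "\\\\" + ch             # invalid escape, double the backslash

    return re.sub(r"\\(.)", fix, raw, flags=re.DOTALL)


def parses(raw: str) -> bool:
    """Return True if raw is valid JSON according to json.loads."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False


# Shortened stand-in for the judge response; it contains the text \( x \).
raw_response = '{"category": "correct", "reasoning": "Let the first ship carry \\( x \\) people."}'

print(parses(raw_response))                              # strict parsing fails
repaired = escape_invalid_backslashes(raw_response)
print(parses(repaired))                                  # repaired text parses
print(json.loads(repaired)["reasoning"])
```

This workaround is only a partial fix: LaTeX commands such as \frac or \text begin with \f or \t, which are valid JSON escapes, so they would parse without error but silently turn into form-feed or tab characters. Instructing the judge model to avoid LaTeX or to emit escaped backslashes may be a more robust option.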
