g-hano opened this issue 2 weeks ago
To resolve the "Failed to parse output" and "ValidationError" issues when using the `evaluate` function from the `ragas` library with the `HuggingFaceEndpoint` and Mistral-7B-Instruct-v0.3 model, you can follow these steps:
1. **Ensure JSON Schema Compliance**: Make sure that the output from the model strictly adheres to the JSON schema expected by the `PydanticOutputParser`. The `get_json_format_instructions` function provides a template for the expected JSON format; ensure that the model's output matches it.
2. **Use the `FIX_OUTPUT_FORMAT` Prompt**: The `RagasoutputParser` class has a mechanism to retry parsing the output if it fails initially. It uses the `FIX_OUTPUT_FORMAT` prompt to instruct the model to correct its output format. Ensure that this mechanism is properly utilized.
3. **Increase `max_retries`**: If the output often fails to parse correctly, consider increasing the `max_retries` parameter in the `aparse` method to give the model more chances to correct its output.
Here is an example of how you might adjust the `RagasoutputParser` class to handle these issues:
```python
import logging

from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import PydanticOutputParser

# ragas-internal imports; module paths are from ragas 0.1.x and may vary by release.
from ragas.llms.base import BaseRagasLLM
from ragas.llms.prompt import PromptValue
from ragas.llms.output_parser import FIX_OUTPUT_FORMAT

logger = logging.getLogger(__name__)


class RagasoutputParser(PydanticOutputParser):
    async def aparse(  # type: ignore
        self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 3  # increased retries
    ):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                # Ask the LLM to rewrite its malformed completion as valid JSON.
                p_value = FIX_OUTPUT_FORMAT.format(
                    prompt=prompt.to_string(), completion=result
                )
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output
```
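To see why the retry path matters, here is a quick, illustrative check using the `RagasoutputParser` defined above (the `Verdict` model is a stand-in for this sketch, not a ragas class; any pydantic model your langchain-core version accepts works as the schema object):

```python
from pydantic import BaseModel


class Verdict(BaseModel):  # illustrative schema, not part of ragas
    reason: str
    verdict: int


parser = RagasoutputParser(pydantic_object=Verdict)

# A completion that is pure JSON parses cleanly:
print(parser.parse('{"reason": "context supports the answer", "verdict": 1}'))

# A chatty completion with no JSON object raises OutputParserException,
# which is exactly what triggers the FIX_OUTPUT_FORMAT retry path above:
try:
    parser.parse("Sure! The verdict is 1 because the context supports the answer.")
except Exception as exc:
    print(type(exc).__name__)  # OutputParserException
```

Instruct-tuned models like Mistral-7B-Instruct tend to wrap JSON in prose, which is why the second case is so common with local endpoints.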
By increasing the `max_retries` parameter, you give the model more opportunities to correct its output format, which can help mitigate the "Failed to parse output" and "ValidationError" issues [1].
Additionally, there are specific configurations or parameters for the `evaluate` function in the `ragas` library that could affect the output parsing or model response. These include:
- `metrics`: List of metrics to use for evaluation.
- `llm`: The language model to use for the metrics.
- `embeddings`: The embeddings to use for the metrics.
- `callbacks`: Lifecycle LangChain callbacks to run during evaluation.
- `in_ci`: Whether the evaluation is running in CI or not.
- `run_config`: Configuration for runtime settings like timeout and retries.
- `token_usage_parser`: Parser to get the token usage from the LLM result.
- `raise_exceptions`: Whether to raise exceptions or not.
- `column_map`: The column names of the dataset to use for evaluation.

These parameters can influence how the evaluation is conducted and how the results are parsed and returned [2][3].
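Of these, `run_config` and `raise_exceptions` are the most relevant levers for timeout and parsing failures. Here is a minimal sketch; `eval_llm` and `eval_embeddings` are placeholders for your own `HuggingFaceEndpoint`-backed wrappers, and the `RunConfig` values are just examples:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness
from ragas.run_config import RunConfig

# Toy dataset using the standard ragas column names.
data = Dataset.from_dict({
    "question": ["When was the first Super Bowl?"],
    "answer": ["The first Super Bowl was held on January 15, 1967."],
    "contexts": [["The First AFL-NFL World Championship Game was played on January 15, 1967."]],
    "ground_truth": ["January 15, 1967"],
})

result = evaluate(
    data,
    metrics=[answer_correctness],
    llm=eval_llm,                # placeholder: your wrapped local LLM
    embeddings=eval_embeddings,  # placeholder: your wrapped embeddings
    run_config=RunConfig(timeout=180, max_retries=15, max_wait=90),
    raise_exceptions=False,      # failed rows become NaN instead of aborting the run
)
print(result)
```

Raising `timeout` and `max_retries` in `RunConfig` is often enough to stop a slow local model from being cut off mid-generation.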
For the `ContextPrecisionVerifications` class, the expected JSON schema is:
```json
{
  "__root__": [
    {
      "reason": "string",
      "verdict": 0 or 1
    }
  ]
}
```
Each `ContextPrecisionVerification` object contains the fields `reason` (a string) and `verdict` (an integer, 0 or 1) [4].
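If you want to check your model's raw completions against that schema offline, a rough reconstruction of these models looks like the following (the real definitions ship inside ragas; this sketch assumes the pydantic v1 API, which pydantic v2 exposes under `pydantic.v1`):

```python
from typing import List

from pydantic.v1 import BaseModel  # on pydantic v1, use `from pydantic import BaseModel`


# Rough reconstructions for offline validation only; not the ragas classes themselves.
class ContextPrecisionVerification(BaseModel):
    reason: str
    verdict: int


class ContextPrecisionVerifications(BaseModel):
    __root__: List[ContextPrecisionVerification]  # pydantic v1 custom root type


raw = '[{"reason": "The context answers the question.", "verdict": 1}]'
parsed = ContextPrecisionVerifications.parse_raw(raw)
print(parsed.__root__[0].verdict)  # -> 1
```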
Lastly, regarding the `ClientResponseError` with a 429 status code: it indicates that you are hitting the rate limits of the HuggingFace API. You might need to implement rate limiting and retry logic to handle these errors gracefully. Unfortunately, I couldn't find specific details on how the `HuggingFaceEndpoint` class handles these errors within the repository.
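One way to add such retry logic yourself is to wrap the endpoint call with `tenacity`. A minimal sketch, assuming the 429 surfaces as an `aiohttp.ClientResponseError` (as in the reported trace); the `call_endpoint` helper and the backoff settings are illustrative, not part of ragas or LangChain:

```python
from aiohttp import ClientResponseError
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential


def _is_rate_limited(exc: BaseException) -> bool:
    # The HuggingFace Inference API signals rate limiting with HTTP 429.
    return isinstance(exc, ClientResponseError) and exc.status == 429


@retry(
    retry=retry_if_exception(_is_rate_limited),
    wait=wait_exponential(multiplier=1, min=2, max=60),  # 2s, 4s, ... capped at 60s
    stop=stop_after_attempt(6),
)
def call_endpoint(llm, prompt: str) -> str:
    return llm.invoke(prompt)  # llm: your HuggingFaceEndpoint instance
```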
To continue talking to Dosu, mention @dosu.
**Describe the bug**
Local LLMs either raise a Timeout error or fail to parse the output.

Ragas version: 0.1.15
Python version: 3.11.3

**Code to Reproduce**

**Error trace**

**Additional context**
It only evaluates `answer_correctness`; all other values are `NaN`.