-
### Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting
- [X] I have checked the existing iss…
-
## Description
Metrics make or break an evaluation framework, so it's important to choose metrics that align with the framework's overall scope and goals.
There are two major types of metrics that will be used:
…
-
Hi Guangzhi,
Thank you for your great work!
Could you share the code you use to calculate generation accuracy? Do you use substring matching to check whether the ground-truth answer occurs in the gener…
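For reference, substring-match accuracy can be sketched as below. This is a minimal assumption about the metric, not necessarily the authors' implementation (which may normalize text differently or handle multiple reference answers):

```python
def generation_accuracy(predictions, ground_truths):
    """Fraction of generations that contain their ground-truth answer.

    Assumes a simple case-insensitive substring match; the actual
    matching rule in the paper's code may differ.
    """
    if not predictions:
        return 0.0
    hits = sum(
        truth.strip().lower() in pred.lower()
        for pred, truth in zip(predictions, ground_truths)
    )
    return hits / len(predictions)


preds = ["The capital of France is Paris.", "I am not sure."]
golds = ["Paris", "42"]
print(generation_accuracy(preds, golds))  # 0.5
```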
-
Thank you for your contributions!
I was wondering whether it is possible, and if so how, to use the HHEM model to evaluate an LLM after fine-tuning it on our specific dataset?
-
Estimate key LLM metrics:
- Overall quality score, accuracy
- Hallucination rate (hallucination detection)
- Relevancy
- Coherence
- Responsible AI violations
- Safety
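Per-response results for the metrics above can be rolled up into corpus-level numbers; a minimal sketch (the metric names, weights, and aggregation scheme are illustrative assumptions, not a prescribed formula):

```python
def hallucination_rate(flags):
    """Share of responses flagged as hallucinated.

    `flags` is a list of booleans, one per response, from a
    hallucination detector (illustrative input format).
    """
    return sum(flags) / len(flags) if flags else 0.0


def overall_quality(scores, weights):
    """Weighted average of per-metric scores in [0, 1].

    The metric set and weights are placeholders; a real framework
    defines its own.
    """
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total


flags = [False, True, False, False]
print(hallucination_rate(flags))  # 0.25

scores = {"relevancy": 0.9, "coherence": 0.8, "safety": 1.0}
weights = {"relevancy": 2, "coherence": 1, "safety": 1}
print(overall_quality(scores, weights))  # 0.9
```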
-
Hello!
This is very nice work!
I would like to know how you run the automatic safety evaluations. According to the paper, you use a safety-critique LLM for the evaluations. Will you release the…
-
- [ ] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question.
**Your Question**
> “WARNING:ragas.llms.output_parser:Failed to parse …
-
So I'm trying to evaluate an LLM response ad hoc.
I have multiple asserts like:
A: Check enum is in results for "Input A" in prompt
B: Check result is sql for Input B
C: Check there is LI…
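Checks A and B above can be expressed as plain assertion functions; this is a sketch under stated assumptions (the allowed enum values and the naive SQL heuristic are hypothetical, and check C is omitted because its description is truncated):

```python
# Hypothetical enum expected in the response to "Input A".
ALLOWED_ENUM = {"LOW", "MEDIUM", "HIGH"}


def check_enum(response: str) -> bool:
    """A: the response must contain one of the allowed enum values."""
    return any(value in response for value in ALLOWED_ENUM)


def check_is_sql(response: str) -> bool:
    """B: crude heuristic that the response looks like a SQL statement."""
    keywords = ("SELECT", "INSERT", "UPDATE", "DELETE")
    return response.strip().upper().startswith(keywords)


response_a = "Risk level: HIGH"
response_b = "SELECT id FROM users WHERE active = 1;"
assert check_enum(response_a)
assert check_is_sql(response_b)
print("ad-hoc checks passed")
```

Keeping each check as a small pure function makes it easy to run them individually against a single response or batch them into a table of pass/fail results.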
-
Hello,
I would like to use your LiveBench, but I am primarily interested in testing the impact of different prompts on certain tasks, rather than testing the models themselves. My plan is to write …
-
# Summary
## Create a Wasm-based LLM app for financial analysts
### Description
We would like to develop an LLM-based financial data analytics application using open source LLMs, embedding mo…