-
As a user, I would like to be informed about the summarization effectiveness of my chosen LLM endpoint.
I would like to be able to evaluate an endpoint against a known, tested framework, to evaluat…
-
Hello,
I would like to ask how to create an evaluation dataset.
When I directly run `python evaluate_generation_model.py --model_path ../../LLM_Models/poison-7b-SUDO- --token SUDO --report_path ./…
-
# llm run for step evaluation
prompts, prompts_span = self.value_preprocess(valid_solvers)
After executing this line, `prompts` always ends up as `[]` and `prompts_span` as an all-zeros list, which makes the tr…
-
Hi there,
I am wondering whether the LLM-as-a-judge evaluation from LangSmith supports using my own custom model as a judge.
I wish to develop custom prompts for my own judge model through LangSmith. …
-
## User story
1. As a data engineer,
2. I want / need to implement and automate the calculation of key performance metrics
3. So that we can iteratively evaluate the performance of our LLM in answerin…
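The metric automation this story describes could be sketched as follows; the choice of token-level F1 as the metric and all function names here are illustrative assumptions, not details from the story itself:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a model answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Aggregate the metric over a small illustrative evaluation set.
pairs = [("the cat sat", "a cat sat down"), ("42", "42")]
mean_f1 = sum(token_f1(p, r) for p, r in pairs) / len(pairs)
print(round(mean_f1, 4))
```

Wrapping this in a scheduled job (or a CI step) would cover the "automate" part of the story; the metric itself can be swapped for whatever the team settles on.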
-
**Describe the bug**
I tried using RAGAS with a model that is not OpenAI. In general whatever model I use I get this error back:
```
File …
```
-
We need data that we can use to evaluate our models according to some evaluation metric (#5) during initial development.
This will most likely be some form of (query, relevant results) pairs. These…
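The (query, relevant results) pairs mentioned above could look like the sketch below, paired with a simple recall@k computation; the field names, document ids, and the choice of recall@k are assumptions for illustration, not decisions from the note:

```python
# Minimal sketch of (query, relevant results) evaluation data.
eval_set = [
    {"query": "reset my password", "relevant": {"doc_12", "doc_40"}},
    {"query": "cancel subscription", "relevant": {"doc_7"}},
]

def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the relevant ids found among the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

# Example: a retriever returned these ids for the first query.
print(recall_at_k(["doc_40", "doc_3", "doc_12"], {"doc_12", "doc_40"}, k=2))
```

Keeping the relevant ids as a set makes the metric order-insensitive on the label side while still rewarding retrievers that rank relevant documents early.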
-
[x] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
Missing `run_config` arguments in the `evaluate` function in module …
-
1. mismatched machine unlearning
Title: Decoupling the Class Label and the Target Concept in Machine Unlearning
arXiv: https://arxiv.org/abs/2406.08288
2. evaluation of LLM unlearning
Title: Unl…