-
Issue to track research/validation/testing around 'LLM as a Judge'
- Grouse
  - https://github.com/illuin-tech/grouse?tab=readme-ov-file
  - https://arxiv.org/abs/2409.06595
- Unsorted
  - https://came…
-
https://eugeneyan.com/writing/llm-evaluators/
-
### Feature request
Support an LLM-guided Self-Refinement MCTS inference method. It has the following features (a minimal sketch of the inner loop follows the list):
- LLM-as-Judge to provide a review
- Proposer LLM generates a rewrite of the answer, taki…
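
A minimal sketch of the review-then-rewrite inner loop, assuming hypothetical `judge_review` and `proposer_rewrite` helpers to be backed by real model calls; a full MCTS would branch over several candidate rewrites per node rather than follow a single path:

```python
def judge_review(question: str, answer: str) -> str:
    """LLM-as-Judge: return a critique of the current answer (placeholder)."""
    raise NotImplementedError  # call the judge model here

def proposer_rewrite(question: str, answer: str, review: str) -> str:
    """Proposer LLM: rewrite the answer, taking the review into account (placeholder)."""
    raise NotImplementedError  # call the proposer model here

def refine(question: str, answer: str, rounds: int = 3) -> str:
    # One rollout of the search: alternate judge review and proposer rewrite.
    # A full MCTS would expand several candidate rewrites per node and
    # back up judge scores to decide which branch to explore next.
    for _ in range(rounds):
        review = judge_review(question, answer)
        answer = proposer_rewrite(question, answer, review)
    return answer
```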
-
I wanted to document what I think a good MVP state would be for this repo:
1. Spin up a unique server for a single LLM
2. Verify the server is running correctly (see the sketch after this list)
3. Create a base for the LLM to build on
…
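
For step 2, a minimal health-check sketch, assuming the server exposes an OpenAI-compatible `/v1/models` route; the address and route are assumptions, not something this repo guarantees:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed address of the single-LLM server

def server_is_healthy(base_url: str = BASE_URL) -> bool:
    """Return True if the server answers its model-listing route."""
    try:
        resp = requests.get(f"{base_url}/v1/models", timeout=5)
        return resp.status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("server up:", server_is_healthy())
```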
-
## Issue encountered
It would be good to have a system for evaluating both the relevance of the retrieved RAG context and how the LLM uses it when producing the response. My first intuition would be a multi-stage system …
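
A minimal sketch of such a multi-stage system, with a hypothetical `judge` callable standing in for whatever LLM client is used; stage one checks retrieval relevance, stage two checks whether the response is grounded in the retrieved context:

```python
def judge(prompt: str) -> str:
    """Placeholder for a call to any LLM judge; swap in a real client."""
    raise NotImplementedError

def retrieval_relevance(question: str, context: str) -> str:
    # Stage 1: is the retrieved context relevant to the question at all?
    return judge(
        f"Question: {question}\nRetrieved context: {context}\n"
        "Is the context relevant to the question? Answer YES or NO."
    )

def groundedness(context: str, answer: str) -> str:
    # Stage 2: did the LLM actually use the context, i.e. is every claim
    # in the answer supported by the retrieved context?
    return judge(
        f"Retrieved context: {context}\nAnswer: {answer}\n"
        "Is every claim in the answer supported by the context? Answer YES or NO."
    )
```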
-
Hi @Psycoy
I need the option to evaluate the benchmark with an open-source model as the LLM judge.
~~How can I do that? If this is not possible, shall we work on a PR?~~
I have started a PR:…
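
A model-agnostic judge sketch that would work with any open-source model, assuming a hypothetical `generate(prompt) -> str` callable wraps the chosen model:

```python
import re

JUDGE_PROMPT = (
    "You are an impartial judge. Rate the answer to the question below on a "
    "1-5 scale for correctness and helpfulness. Reply with only the number.\n\n"
    "Question: {question}\nAnswer: {answer}"
)

def judge_score(generate, question: str, answer: str) -> int | None:
    """Score an answer with any judge model exposed as generate(prompt) -> str."""
    reply = generate(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None
```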
-
## Summary
This template is intended to capture a few base requirements that need to be met before filing a PR that contains a new blog post submission.
Please fill out this form in its…
-
**Description**
Hi team,
I am exploring Evidently AI for LLM evaluation and came across the custom LLM-as-a-Judge Descriptor, which I am particularly interested in. The current API only allows o…
-
@haileyschoelkopf @lintangsutawika @baberabb
The following is a list of TODOs to implement LLM-as-a-Judge in Eval-Harness:
**TLDR**
* Splits the existing `evaluate` function into `classification_e…
-
### 🚀 Feature
In addition to the hardcoded models, add support for using local models as judges for evaluation. This can be simplified by requiring only an OpenAI-compatible API.
It should basically be an endpoint selection,…
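
A minimal sketch of that endpoint selection, assuming the judge is served behind an OpenAI-compatible API; the URL and model name are placeholders:

```python
from openai import OpenAI

# Any OpenAI-compatible server works here (vLLM, llama.cpp, Ollama, ...).
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local judge endpoint
    api_key="not-needed-locally",         # most local servers ignore the key
)

response = client.chat.completions.create(
    model="local-judge-model",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are an evaluation judge."},
        {"role": "user", "content": "Rate this answer from 1 to 5: ..."},
    ],
)
print(response.choices[0].message.content)
```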