-
**Describe the Feature**
I think it could be useful to support multiple evaluator models and average the scores across them to reduce the bias of any single judge.
**Why is the feature important for you?**
It seems l…
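A rough sketch of the averaging I have in mind (plain Python; the judge names, metric names, and scores below are placeholders, not an existing API):

```python
from statistics import mean

# Placeholder scores from several evaluator ("judge") models; the judge and
# metric names here are illustrative only.
scores_by_judge = {
    "judge-a": {"faithfulness": 0.82, "answer_relevancy": 0.90},
    "judge-b": {"faithfulness": 0.76, "answer_relevancy": 0.88},
    "judge-c": {"faithfulness": 0.80, "answer_relevancy": 0.85},
}

# Average each metric across judges to smooth out single-model bias.
metric_names = next(iter(scores_by_judge.values())).keys()
averaged = {
    m: mean(scores[m] for scores in scores_by_judge.values())
    for m in metric_names
}
print(averaged)
```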
-
## Question regarding handling special tokens in conversation transcription
First of all, thanks for making this wonderful SDK to easily create voice-enabled applications!
I'm currently buildi…
-
Please note our paper on evaluation, which could be an important building block for multilingual evaluation and cultural understanding.
[SeaEval for Multilingual Foundation Models: From Cross-Lingu…
-
JudgeBench: A Benchmark for Evaluating LLM-based Judges
https://arxiv.org/abs/2410.12784
-
### Describe the issue
@pzs19
I would like to reproduce and expand the end2end latency benchmark results of the LLMLingua-2 paper and was therefore wondering if you could provide more details on yo…
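For reference, this is the kind of end-to-end timing loop I am using so far (a sketch: the compressor setup follows the LLMLingua-2 README examples, while `call_llm` and the `rate` value are placeholders I picked myself and may differ from the paper's setup):

```python
import time

from llmlingua import PromptCompressor

# Compressor setup as in the LLMLingua-2 README; the rate below is my own choice.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

def call_llm(prompt: str) -> str:
    # Placeholder for the downstream model request (API call or local model).
    return ""

def run_once(prompt: str, question: str) -> float:
    """Measure one end-to-end pass: compression + downstream generation."""
    start = time.perf_counter()
    compressed = compressor.compress_prompt(prompt, rate=0.33)
    call_llm(compressed["compressed_prompt"] + "\n" + question)
    return time.perf_counter() - start
```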
-
1. Is the KV cache actually **not used** in all the LLM-evaluation tasks, since those tasks usually take **only one-step** attention calculation, unlike the language-generation process, which needs a lot of…
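To make the question concrete, here is a minimal sketch with Hugging Face `transformers` (`gpt2` is just a small example model): a likelihood-style evaluation needs only one forward pass over the full sequence, whereas generation produces tokens step by step and benefits from the cache.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")

# Likelihood-style evaluation: the whole sequence is scored in a single
# forward pass, so the KV cache offers no speedup and can be disabled.
with torch.no_grad():
    out = model(**inputs, use_cache=False)
log_probs = out.logits.log_softmax(dim=-1)

# Generation: every new token attends to all previous ones, so the cache
# (on by default) avoids recomputing past keys/values at each step.
generated = model.generate(**inputs, max_new_tokens=20, use_cache=True)
print(tokenizer.decode(generated[0]))
```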
-
- [ ] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question.
**Your Question**
What is the use of docstore in TestsetGenerator? How i…
-
- This issue focuses on the technical courses we take about LLMs; we'll put the paper part in
https://github.com/xp1632/DFKI_working_log/issues/70
---
1. **ChainForge** https://chainforge.ai/ …
-
**Describe the bug**
I want to use local LLMs to evaluate my RAG app. I have tried Ollama and HuggingFace models, but neither of them is working.
Ragas version: 0.1.11
Python version: 3.11.3
**…
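For reference, this is roughly what I am attempting (a sketch assuming that `evaluate()` in ragas 0.1.x accepts LangChain LLMs/embeddings directly and wraps them internally; the Ollama model names are only examples):

```python
from datasets import Dataset
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# Local judge LLM and embeddings served by Ollama (model names are examples).
llm = ChatOllama(model="llama3")
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Minimal sample in the column layout ragas expects.
dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris"],
})

result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy],
    llm=llm,                # LangChain LLM, assumed to be wrapped by ragas
    embeddings=embeddings,
)
print(result)
```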
-
### Feature Description
The most popular LLM providers, such as OpenAI, support candidate generations, i.e., generating n responses for the same prompt. This feature can be used in RAG, evaluations, and mo…
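For context, a minimal sketch of the underlying behavior via the `n` parameter of the OpenAI Chat Completions API (the model name is only an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request n candidate completions for the same prompt in a single call.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Name one use case for RAG."}],
    n=3,
)

# Each candidate comes back as a separate choice.
for i, choice in enumerate(response.choices):
    print(f"candidate {i}: {choice.message.content}")
```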