-
### Issue you'd like to raise.
I cannot see the scores of my evaluations in the LangSmith experiments dashboard.
Below is the code:
```python
# Grade prompt
from langsmith import EvaluationResult…
```
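For reference, a minimal sketch of a custom evaluator that returns a numeric `score`, which is what the experiments dashboard surfaces as a metric column; the target `my_chain`, the dataset name `my-dataset`, and the metric key `correctness` are placeholders, and the exact import path may vary with the SDK version:

```python
# Sketch (assumed names, not this issue's code): scores only appear in the
# dashboard when the evaluator returns a numeric "score" (or an EvaluationResult
# with score=...), not just a free-form value or comment.
from langsmith.evaluation import evaluate  # import path may differ by SDK version

def my_chain(inputs: dict) -> str:
    # Placeholder target; replace with the real chain/model call.
    return inputs.get("question", "")

def correctness_evaluator(run, example):
    predicted = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("answer", "")
    return {
        "key": "correctness",  # metric name shown in the experiment view
        "score": 1.0 if predicted.strip() == expected.strip() else 0.0,
    }

results = evaluate(
    lambda inputs: {"output": my_chain(inputs)},
    data="my-dataset",                 # placeholder dataset name
    evaluators=[correctness_evaluator],
    experiment_prefix="correctness-check",
)
```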
-
# URL
- https://arxiv.org/abs/2411.00331
# Authors
- Chumeng Jiang
- Jiayin Wang
- Weizhi Ma
- Charles L. A. Clarke
- Shuai Wang
- Chuhan Wu
- Min Zhang
# Abstract
- With the rapid dev…
-
I just ran the code below and found that the examples that need to be evaluated by the LLM are not equivalent to those in your paper.
`rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003",
"tex…
-
**Describe the bug**
Trying to use an Azure API key to run an LLM evaluation with UpTrain, I received a 404 error message saying that the deployment was not found. However, there is no deployment name…
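For context, a 404 like this usually means the request did not resolve to an Azure *deployment*. A minimal sketch of the deployment-name requirement using the `openai` SDK directly (not UpTrain's API; the endpoint, API version, and deployment name below are placeholders):

```python
# Sketch only: Azure OpenAI routes requests by *deployment name*, not model name.
# If the configured name does not match an existing deployment, the service
# returns 404 "deployment not found".
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<azure-api-key>",                             # placeholder
    api_version="2024-02-01",                              # example API version
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="<deployment-name>",  # must be the deployment name configured in Azure
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```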
-
- Paper name: On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
- ArXiv Link: https://arxiv.org/abs/2401.02524
To close this issue, open a PR with a paper report using …
-
## Issue encountered
SampleLevelMetrics are always computed with batch size 1. This is very inefficient for more computationally expensive metrics involving LLM inference. Without batching these, it will t…
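As a rough illustration (generic Python, not the framework's actual interface), batching the per-sample scoring call is what amortizes the LLM inference overhead:

```python
# Sketch: compute a sample-level metric in batches so one backend/LLM call can
# score many samples at once instead of issuing one request per sample.
from typing import Callable, Iterable, Sequence

def batched(items: Sequence, batch_size: int) -> Iterable[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def compute_metric_batched(
    samples: Sequence[dict],
    score_batch: Callable[[Sequence[dict]], Sequence[float]],  # one call per batch
    batch_size: int = 8,
) -> list[float]:
    scores: list[float] = []
    for batch in batched(samples, batch_size):
        scores.extend(score_batch(batch))
    return scores
```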
-
**Problem:**
When I use the local Llama-7b model for the faithfulness calculation, the following error occurs. Is it because this metric does not support smaller models? It only supports …
-
It would be interesting to compare the evaluation capabilities of LLMs against COMET and against human evaluation.
See the paper:
[Large Language Models Are State-of-the-Art Evaluators of Translation Quality](htt…
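For a concrete starting point, a sketch of how COMET scores could be computed for such a comparison, assuming the `unbabel-comet` package and the public `Unbabel/wmt22-comet-da` checkpoint (the example segments are made up):

```python
# Sketch: reference-based COMET scoring; the resulting segment scores could be
# correlated with LLM-judge scores and human ratings.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)

data = [
    {"src": "Der Hund bellt.", "mt": "The dog is barking.", "ref": "The dog barks."},
]
prediction = comet_model.predict(data, batch_size=8, gpus=0)
print(prediction.system_score)  # corpus-level score
print(prediction.scores)        # per-segment scores
```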
-
New prompts should be tested to evaluate their performance and minimise unexpected issues in production. This will likely involve accumulating generated test datasets targeting different issues, as we…
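As a rough sketch of what such a check could look like (the `run_prompt` callable and the test cases below are placeholders, not an existing dataset):

```python
# Sketch: a minimal regression check for a new prompt against an accumulated
# test dataset; each case pairs an input with a substring the output must contain.
from typing import Callable

test_cases = [
    {"input": "Summarise: The cat sat on the mat.", "must_contain": "cat"},
    {"input": "Summarise: Paris is the capital of France.", "must_contain": "Paris"},
]

def pass_rate(run_prompt: Callable[[str], str], cases: list[dict]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = sum(
        1 for case in cases
        if case["must_contain"].lower() in run_prompt(case["input"]).lower()
    )
    return passed / len(cases)

# Example gate before promoting a new prompt to production:
# assert pass_rate(new_prompt_fn, test_cases) >= 0.9
```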
-
## Overview
Create an automatic execution script for the [automatic evaluation script](https://github.com/llm-jp/scripts/tree/main/evaluation/installers/llm-jp-eval-v1.3.1)
- Along with this, the [convert script](https://github.com/llm-jp/scripts/tree/main/pretrain/scrip…