-
### Issue you'd like to raise.
I cannot see the scores of my evaluations in the LangSmith experiments dashboard.
Below is the code:
```python
# Grade prompt
from langsmith import EvaluationResult…
```
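For reference, a minimal sketch of a custom evaluator that returns a numeric `score`, which is what the experiments dashboard surfaces as a metric column; the target `my_chain`, the dataset name `my-dataset`, and the metric key `correctness` are placeholders, and the exact import path may vary with the SDK version:

```python
# Sketch (assumed names, not this issue's code): scores only appear in the
# dashboard when the evaluator returns a numeric "score" (or an EvaluationResult
# with score=...), not just a free-form value or comment.
from langsmith.evaluation import evaluate  # import path may differ by SDK version

def my_chain(inputs: dict) -> str:
    # Placeholder target; replace with the real chain/model call.
    return inputs.get("question", "")

def correctness_evaluator(run, example):
    predicted = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("answer", "")
    return {
        "key": "correctness",  # metric name shown in the experiment view
        "score": 1.0 if predicted.strip() == expected.strip() else 0.0,
    }

results = evaluate(
    lambda inputs: {"output": my_chain(inputs)},
    data="my-dataset",                 # placeholder dataset name
    evaluators=[correctness_evaluator],
    experiment_prefix="correctness-check",
)
```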
-
# URL
- https://arxiv.org/abs/2411.00331
# Authors
- Chumeng Jiang
- Jiayin Wang
- Weizhi Ma
- Charles L. A. Clarke
- Shuai Wang
- Chuhan Wu
- Min Zhang
# Abstract
- With the rapid dev…
-
I just ran the code below and found that the examples that need to be evaluated by the LLM are not equivalent to those in your paper.
`rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003",
"tex…
-
**Describe the bug**
Trying to use an Azure API key to run an LLM evaluation with UpTrain, I received a 404 error message saying that the deployment was not found. However, there is no deployment name…
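For context, a 404 like this usually means the request did not resolve to an Azure *deployment*. A minimal sketch of the deployment-name requirement using the `openai` SDK directly (not UpTrain's API; the endpoint, API version, and deployment name below are placeholders):

```python
# Sketch only: Azure OpenAI routes requests by *deployment name*, not model name.
# If the configured name does not match an existing deployment, the service
# returns 404 "deployment not found".
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<azure-api-key>",                             # placeholder
    api_version="2024-02-01",                              # example API version
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="<deployment-name>",  # must be the deployment name configured in Azure
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```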
-
- Paper name: On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
- ArXiv Link: https://arxiv.org/abs/2401.02524
To close this issue, open a PR with a paper report using …
-
## Issue encountered
SampleLevelMetrics are always computed with batch size 1. This is very inefficient for more computationally expensive metrics involving LLM inference. Without batching these, it will t…
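As a rough illustration (generic Python, not the framework's actual interface), batching the per-sample scoring call is what amortizes the LLM inference overhead:

```python
# Sketch: compute a sample-level metric in batches so one backend/LLM call can
# score many samples at once instead of issuing one request per sample.
from typing import Callable, Iterable, Sequence

def batched(items: Sequence, batch_size: int) -> Iterable[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def compute_metric_batched(
    samples: Sequence[dict],
    score_batch: Callable[[Sequence[dict]], Sequence[float]],  # one call per batch
    batch_size: int = 8,
) -> list[float]:
    scores: list[float] = []
    for batch in batched(samples, batch_size):
        scores.extend(score_batch(batch))
    return scores
```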
-
**Problem:**
When I use the local Llama-7b model for the faithfulness calculation, the following error occurs. Is it because this metric does not support smaller models? It only supports …
-
It would be interesting to compare the evaluation capabilities of LLMs against COMET and against human evaluation.
See the paper:
[Large Language Models Are State-of-the-Art Evaluators of Translation Quality](htt…
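For a concrete starting point, a sketch of how COMET scores could be computed for such a comparison, assuming the `unbabel-comet` package and the public `Unbabel/wmt22-comet-da` checkpoint (the example segments are made up):

```python
# Sketch: reference-based COMET scoring; the resulting segment scores could be
# correlated with LLM-judge scores and human ratings.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)

data = [
    {"src": "Der Hund bellt.", "mt": "The dog is barking.", "ref": "The dog barks."},
]
prediction = comet_model.predict(data, batch_size=8, gpus=0)
print(prediction.system_score)  # corpus-level score
print(prediction.scores)        # per-segment scores
```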
-
New prompts should be tested to evaluate their performance and minimise unexpected issues in production. This will likely involve accumulating generated test datasets targeting different issues, as we…
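As a rough sketch of what such a check could look like (the `run_prompt` callable and the test cases below are placeholders, not an existing dataset):

```python
# Sketch: a minimal regression check for a new prompt against an accumulated
# test dataset; each case pairs an input with a substring the output must contain.
from typing import Callable

test_cases = [
    {"input": "Summarise: The cat sat on the mat.", "must_contain": "cat"},
    {"input": "Summarise: Paris is the capital of France.", "must_contain": "Paris"},
]

def pass_rate(run_prompt: Callable[[str], str], cases: list[dict]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = sum(
        1 for case in cases
        if case["must_contain"].lower() in run_prompt(case["input"]).lower()
    )
    return passed / len(cases)

# Example gate before promoting a new prompt to production:
# assert pass_rate(new_prompt_fn, test_cases) >= 0.9
```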
-
## Overview
Create an automatic execution script for the [automatic evaluation script](https://github.com/llm-jp/scripts/tree/main/evaluation/installers/llm-jp-eval-v1.3.1)
- Along with this, the [convert script](https://github.com/llm-jp/scripts/tree/main/pretrain/scrip…