llm-evaluation Search Results

mlabonne/llm-course #76

LLM Evaluation Tutorials with Evalverse

## Suggest for LLM Evaluation Tutorials with `Evalverse` - **Tutorials** (Notebook examples): https://github.com/UpstageAI/evalverse/tree/main/examples - [01_basic_usage.ipynb](https://github.com…

jihoo-kim updated 1 month ago

Arize-ai/phoenix #3269

Use locally deployed Llm for evaluation.

I need to use locally deployed LLMs for evaluation within my current setup. While setting up LLM monitoring using Phoenix, I require evaluations with the traces, but I am only able to find [evaluation llm…

Talhamuhammadali updated 1 month ago

uptrain-ai/uptrain #682

Azure LLM evaluation - Deployment Error

**Describe the bug** Trying to use an Azure API Key to run an LLM evaluation using UpTrain. I received a 404 error message saying that the deployment is not found. However, there is no deployment name…

meenusel updated 2 weeks ago

kasnerz/factgenie #23

Changing llm-eval model name renames the llm-evaluation camp…

## What is the problem? If you change the model's name, the llm-eval ID, as listed in http://10.10.24.15:5000/llm_eval, will also change because it is created based on the config content, including…

oplatek updated 1 week ago

llm-jp/llm-jp-eval #121

Error during corpus score calculation for wikicorpus-e-to-j …

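While running the evaluation script with max_num_samples=-1, an error occurred after (or just before?) the evaluation of the wikicorpus-e-to-j task finished, and the run was aborted. I carried out the data processing and other steps as described in the Readme. I have also confirmed that the run completes without errors with max_num_samples=100. Judging from the error message, the corpus-level BLEU…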

YumaTsuta updated 1 day ago

nlpyang/geval #8

How is the "Auto CoT" prompt defined?

G-Eval includes "Auto Chain-of-Thoughts for NLG Evaluation" as a component where the CoT steps to carry out evaluation are produced by an LLM. Neither the paper nor this repo, however, includes the prompt defi…

calvdee updated 8 hours ago

ETH-PEACH-Lab/intuition-visualisation #2

timeline for vis project 2025

timeline title timeine for project 2025 ?? : leetcode vis annotation? 2024.07.05 : Introduction 2024.07.12 : Related work (algovisualizer, openDSA, papers...) 2024.07.19 : S…

rainintime7 updated 4 days ago

wyona/katie-backend #30

Automatic evaluation of answers by LLM(s)

Estimate key LLM metrics: - Overall quality score, accuracy - Hallucination rate (hallucination detection) - Relevancy - Coherence - Responsible AI violations - Safety

michaelwechner updated 4 weeks ago

explodinggradients/ragas #955

[R-254] Issue in Evaluation using local LLM

[ ] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question. **Your Question** > “WARNING:ragas.llms.output_parser:Failed to parse …

sheetalkamthe55 updated 2 weeks ago

strickvl/mlops-dot-systems #10

posts/2024-06-25-evaluation-finetuning-manual-dataset

# Alex Strick van Linschoten - How to think about creating a dataset for LLM finetuning evaluation I summarise the kinds of evaluations that are needed for a structured data generation task. [https:…

utterances-bot updated 5 days ago

1000+ results for llm-evaluation
