-
https://eugeneyan.com/writing/llm-evaluators/
-
**Summary**
Right now we don't know which LLMs work best with OpenHands. It would be good to run an evaluation to better understand this.
**Technical Design**
We will want to test popular L…
-
I just ran the code below and found that the examples that need to be evaluated by the LLM are not equivalent to those in your paper.
`rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003",
"tex…
-
- [x] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Your Question**
I would like to use Answer Relevance for RAG evaluation in Jap…
-
Results from the new [benchmark](https://github.com/fl4p/fetlib/blob/dev/read_llm_json.py) comparing actual min/typ/max field values:
```
num *EQUAL* *VALUES*:
…
-
- [x] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question.
**Your Question**
The following error occurred
NotImplementedError: a…
-
It would be interesting to compare the evaluation capabilities of LLMs against COMET and human evaluation.
See the paper:
[Large Language Models Are State-of-the-Art Evaluators of Translation Quality](htt…
-
## Overview
Create an automatic execution script for the [automatic evaluation script](https://github.com/llm-jp/scripts/tree/main/evaluation/installers/llm-jp-eval-v1.3.1)
- Along with this, the [convert script](https://github.com/llm-jp/scripts/tree/main/pretrain/scrip…
-
# Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.
[https://eugeneyan.com/writing/llm-evaluators/…
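One technique discussed in writing on LLM-evaluators is pairwise comparison with position swapping, which mitigates the judge's position bias. The sketch below is a hypothetical, minimal illustration of that idea: the judge call itself is stubbed out (prompt names and the `TIE` convention are assumptions, not from any specific library), and only the prompt construction and verdict aggregation are shown.

```python
# Minimal sketch of pairwise LLM-as-judge comparison with position swapping.
# The actual LLM call is omitted; in practice the two prompts would each be
# sent to a judge model and the returned verdicts fed to aggregate().

JUDGE_TEMPLATE = (
    "You are an impartial judge. Given the question and two answers, "
    "reply with 'A' or 'B' for the better answer, or 'TIE'.\n\n"
    "Question: {question}\nAnswer A: {a}\nAnswer B: {b}"
)

def build_prompts(question: str, answer_1: str, answer_2: str) -> tuple[str, str]:
    """Build both orderings so a position-biased judge can be detected."""
    return (
        JUDGE_TEMPLATE.format(question=question, a=answer_1, b=answer_2),
        JUDGE_TEMPLATE.format(question=question, a=answer_2, b=answer_1),
    )

def aggregate(verdict_original: str, verdict_swapped: str) -> str:
    """Combine verdicts from both orderings; disagreement falls back to TIE."""
    # In the swapped prompt, 'A' refers to answer_2 and 'B' to answer_1,
    # so the swapped verdict must be mapped back before comparing.
    unswap = {"A": "B", "B": "A", "TIE": "TIE"}
    v1, v2 = verdict_original, unswap[verdict_swapped]
    return v1 if v1 == v2 else "TIE"
```

For example, if the judge prefers the first answer in both orderings (`"A"` on the original prompt, `"B"` on the swapped one), the result is `"A"`; if it always picks the first-listed answer regardless of content, the disagreement is resolved as `"TIE"`.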
-
```
from langchain_openai import ChatOpenAI
import pandas as pd
glm4_base_client = ChatOpenAI(model="glm-4v-9b",
api_key="your_api_key",
base…