-
I'm opening this issue to discuss what we think the "LLM task" framework should aim to be, and how we could incrementally get there.
## What we have today
Today, what we call the "task framewo…
-
### Background:
[Langsmith Evaluations](https://docs.smith.langchain.com/concepts/evaluation) are a way to evaluate the performance of Automatic Import.
The [evaluation framework](https://docs.smith…
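For orientation, here is a minimal sketch of what an evaluation run looks like with the LangSmith Python SDK; the dataset name, target function, and evaluator below are hypothetical placeholders, not the actual Automatic Import setup.

```python
# Minimal LangSmith evaluation sketch (hypothetical dataset/target/evaluator names).
from langsmith.evaluation import evaluate

def target(inputs: dict) -> dict:
    # Placeholder for the system under test, e.g. a pipeline call.
    return {"output": f"generated answer for {inputs['question']}"}

def exact_match(run, example) -> dict:
    # Toy evaluator: compare the run output to the reference answer.
    score = run.outputs["output"] == example.outputs["expected"]
    return {"key": "exact_match", "score": int(score)}

results = evaluate(
    target,
    data="my-eval-dataset",          # hypothetical LangSmith dataset name
    evaluators=[exact_match],
    experiment_prefix="eval-sketch",
)
```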
-
We currently leverage some LLM-based evaluation metrics from ragas: https://github.com/explodinggradients/ragas
namely, `llm_context_precision`, `llm_context_recall` and `llm_answer_relevance` in thi…
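For context, a minimal sketch of scoring a toy sample with the corresponding ragas metrics (the imports below follow the ragas 0.1.x API, where these metrics are exposed as `context_precision`, `context_recall`, and `answer_relevancy`; the sample data is made up):

```python
# Sketch of evaluating a toy sample with ragas' LLM-based metrics.
# ragas defaults to an OpenAI judge unless another LLM is configured.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall, answer_relevancy

samples = Dataset.from_dict({
    "question": ["Where is the Eiffel Tower located?"],
    "answer": ["The Eiffel Tower is in Paris, France."],
    "contexts": [["The Eiffel Tower is a wrought-iron tower in Paris, France."]],
    "ground_truth": ["Paris, France"],
})

result = evaluate(
    samples,
    metrics=[context_precision, context_recall, answer_relevancy],
)
print(result)  # per-metric scores, e.g. {'context_precision': ..., ...}
```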
-
Many thanks for making this feature available. It's a great help.
I wanted to let you know that your HuggingFace [CyberSecEval: Comprehensive Evaluation Framework for Cybersecurity Risks and Capab…
-
### **Is your feature request related to a problem? Please describe.**
PyRIT currently lacks built-in support for easily using and comparing multiple LLM providers. This makes it challenging for user…
-
Integrate MDEL with various evaluation frameworks (see the sketch after this list):
- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [helm](https://github.com/stanford-crfm/helm)
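As a reference point, a minimal sketch of driving lm-evaluation-harness from Python (the checkpoint, task list, and batch size are placeholders; a helm integration would need its own adapter):

```python
# Sketch of running benchmarks through lm-evaluation-harness (v0.4-style API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # HuggingFace backend
    model_args="pretrained=EleutherAI/pythia-1b",  # placeholder checkpoint
    tasks=["hellaswag", "arc_easy"],               # placeholder task list
    num_fewshot=0,
    batch_size=8,
)

print(results["results"])  # aggregated per-task metrics
```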
-
- [ ] [[2308.07201] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate](https://arxiv.org/abs/2308.07201)
# [ChatEval: Towards Better LLM-based Evaluators through Multi-Agent De…
-
**What would you like to be added/modified:**
1. Build a collaborative code-intelligence agent alignment dataset for LLMs:
- The dataset should include behavioral trajectories, feedback, and i…
-
Hi there,
Thank you for bringing the elegant RAG Assessment framework to the community.
I am an AI engineer from Alibaba Cloud, and our team has been fine-tuning LLM-as-a-Judge models based on t…
-
This issue now tracks the implementation of various evaluation methods and workflows for LLMs.
Evaluations:
- [x] G-Eval (see the judge sketch after this list)
- [ ] PingPong
- [ ] InfiniteBench
- [ ] Ruler
- [ ] MMLU
- [ ] M…
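To make the first item concrete, a minimal G-Eval-style sketch: the judge LLM is given the evaluation criteria and step-by-step instructions, then returns a numeric score. The prompt wording, criteria, and `call_judge_llm` helper are illustrative assumptions, not a fixed implementation; the original G-Eval additionally weights scores by the judge's token probabilities.

```python
# Minimal G-Eval-style judge sketch. `call_judge_llm` is a placeholder for
# whatever chat-completion client is wired in.
GEVAL_PROMPT = """You will evaluate a model answer for COHERENCE on a 1-5 scale.

Evaluation steps:
1. Read the question and the answer.
2. Check whether the answer is logically organized and self-consistent.
3. Assign a score from 1 (incoherent) to 5 (fully coherent).

Question: {question}
Answer: {answer}

Return only the numeric score."""

def call_judge_llm(prompt: str) -> str:
    # Placeholder: swap in an actual LLM call here.
    raise NotImplementedError

def geval_coherence(question: str, answer: str) -> float:
    raw = call_judge_llm(GEVAL_PROMPT.format(question=question, answer=answer))
    return float(raw.strip())
```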