evaluation-framework Search Results

1000+ results
for evaluation-framework


huu4ontocord/MDEL #37

Integrate with LLM evaluation frameworks

Integrate MDEL with various evaluation frameworks: - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) - [helm](https://github.com/stanford-crfm/helm)

kenhktsui updated 1 year ago
3
Open-Systems-Pharmacology/OSP-PBPK-Model-Library #83

Automate creation of PBPK model reports

Create GH-Action for creation of evaluation reports and projects * Create it in a separate branch (e.g. _CreateReports_) * The action should be triggered either on demand or after pushing into the _…

Yuri05 updated 1 month ago
1
Azure/PyRIT #353

FEAT: Add Unify Integration for Multi-Provider LLM Support

### **Is your feature request related to a problem? Please describe.** PyRIT currently lacks built-in support for easily using and comparing multiple LLM providers. This makes it challenging for user…

KatoStevenMubiru updated 2 weeks ago
6
princeton-nlp/SimPO #73

Query about GSM8K evaluation

In Table 9 of the paper, your evaluation on GSM8K seems to use the 5 shot setting. May I know which evaluation library did you use? Is it `lm-evaluation-harness` or other existing GitHub implementatio…

HCY123902 updated 6 days ago
4
Event-AHU/FELT_SOT_Benchmark #6

Questions on FELT Dataset and Baseline Trackers in Your Rese…

Dear FELT research team, First, I would like to express my sincere gratitude for your outstanding work on the FELT dataset and your [paper](https://arxiv.org/pdf/2403.05839), which has become an in…

MR-Vico updated 12 hours ago
8
dotnet/aspnetcore #59085

nuget package 'Microsoft.AspNetCore.SignalR.Client' is not c…

_This issue has been moved from [a ticket on Developer Community](https://developercommunity.visualstudio.com/t/nuget-package-MicrosoftAspNetCoreSign/10790621)._ --- [severity:It's more difficult to …

vsfeedback updated 23 hours ago
2
ronkok/Temml #70

Benchmark TEMML against wikitexvc

It would be interesting to benchmark TEMML against [wikitexvc](https://arxiv.org/abs/2401.16786). It was almost ten years ago that I developed an evaluation framework https://github.com/MaRDI4NFDI/…

physikerwelt updated 1 month ago
6
EleutherAI/lm-evaluation-harness #2412

`AnthropicChat` fails when "until" is not provided explicitl…

While adding a GLUE-style benchmark for the Belarusian language, I noticed unexpected behavior. Below are details and example based on the `nq_open` task. I can suggest a fix and implement it, but fir…

maaxap updated 3 weeks ago
2
OpenBMB/RAGEval #1

[Task] Opensource the RAGEval Pipeline

1. opensource the article generation pipeline —— Yifan Luo 2. opensource the QAR & keypoint generation pipeline —— Dingling Xu 3. opensource the evaluation framework —— Kunlun Zhu

Kunlun-Zhu updated 1 month ago
9
tc39/proposal-defer-import-eval #54

Deferred keys and weakening early error timing

The current proposal as specified treats namespace keys as known at the time of execution deferral so that all instantiation errors have already been thrown, and all async work has been done. All earl…

guybedford updated 1 week ago
4

