-
Integrate MDEL with various evaluation frameworks
- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [helm](https://github.com/stanford-crfm/helm)
-
Create a GitHub Action for creating evaluation reports and projects
* Create it in a separate branch (e.g. _CreateReports_)
* The action should be triggered either on demand or after pushing into the _…
-
### **Is your feature request related to a problem? Please describe.**
PyRIT currently lacks built-in support for easily using and comparing multiple LLM providers. This makes it challenging for user…
-
In Table 9 of the paper, your evaluation on GSM8K seems to use the 5-shot setting. May I know which evaluation library you used? Is it `lm-evaluation-harness` or other existing GitHub implementatio…
-
Dear FELT research team,
First, I would like to express my sincere gratitude for your outstanding work on the FELT dataset and your [paper](https://arxiv.org/pdf/2403.05839), which has become an in…
-
_This issue has been moved from [a ticket on Developer Community](https://developercommunity.visualstudio.com/t/nuget-package-MicrosoftAspNetCoreSign/10790621)._
---
[severity:It's more difficult to …
-
It would be interesting to benchmark TEMML against [wikitexvc](https://arxiv.org/abs/2401.16786).
It was almost ten years ago that I developed an evaluation framework https://github.com/MaRDI4NFDI/…
-
While adding a GLUE-style benchmark for the Belarusian language, I noticed unexpected behavior. Below are details and an example based on the `nq_open` task. I can suggest a fix and implement it, but fir…
-
1. Open-source the article generation pipeline - Yifan Luo
2. Open-source the QAR & keypoint generation pipeline - Dingling Xu
3. Open-source the evaluation framework - Kunlun Zhu
-
The current proposal as specified treats namespace keys as known at the time of execution deferral so that all instantiation errors have already been thrown, and all async work has been done. All earl…