-
https://github.com/soarsmu/BugsInPy
-
# Evaluation
A core part of our *Strategic* work is the evaluation of how proposed work serves the Web. In the "Evaluation" phase at the end of the funnel, we make the case for whether work is ready to…
-
- **Objective:** Ensure RAG components are reliable through evaluation and safeguards.
- **Details:**
- Define performance metrics and monitor system accuracy.
- Implement safeguards to h…
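The "define performance metrics" step can be sketched minimally. The metric names, data, and normalization below are illustrative assumptions, not part of the original issue; a real RAG evaluation would add more metrics (faithfulness, answer relevance, latency):

```python
# Minimal sketch of two common RAG evaluation metrics (illustrative only).

def retrieval_hit_rate(retrieved, relevant):
    """Fraction of queries whose retrieved set contains at least one relevant doc."""
    hits = sum(1 for r, rel in zip(retrieved, relevant) if set(r) & set(rel))
    return hits / len(retrieved)

def exact_match(predictions, references):
    """Fraction of generated answers matching the reference after light normalization."""
    norm = lambda s: " ".join(s.lower().split())
    matches = sum(1 for p, r in zip(predictions, references) if norm(p) == norm(r))
    return matches / len(predictions)

# Tiny hypothetical eval set.
retrieved = [["doc1", "doc3"], ["doc2"]]
relevant  = [["doc3"], ["doc5"]]
print(retrieval_hit_rate(retrieved, relevant))  # 0.5

preds = ["Paris", "42"]
refs  = ["paris", "41"]
print(exact_match(preds, refs))  # 0.5
```

Monitoring would then track these numbers over time and alert when they drop below an agreed threshold.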
-
## Reason
Expensive material models provided by external codes make use of vectorized evaluation on accelerator devices (e.g., GPUs). For these models to operate with optimal efficiency, we need to pro…
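The contrast between per-point and vectorized evaluation can be sketched with NumPy. The material model below (linear elasticity, stress = E · strain) and its parameter value are hypothetical stand-ins for the external code's models; the point is only that the model accepts a whole batch of quadrature-point inputs in one call:

```python
import numpy as np

# Hypothetical material model: linear elastic stress = E * strain.
# Written to accept an entire batch of quadrature-point strains at once,
# so a single vectorized call replaces a per-point Python loop.
def evaluate_model_batched(strains, youngs_modulus=200e9):
    strains = np.asarray(strains, dtype=float)
    return youngs_modulus * strains

# One call evaluates all points; on an accelerator backend the same
# batched shape is what enables efficient device execution.
batch = np.linspace(0.0, 1e-3, 5)
print(evaluate_model_batched(batch))
```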
-
This issue is associated with charter epics #4 and #7.
# Which mapping tool or framework do you want to discuss?
https://worldfair-project.eu/cross-domain-interoperability-framework/
Overview…
-
This issue is intended for tracking milestones for the 2023 NumFocus small development grant for working on `scipy.special` infrastructure.
The original plan for this grant was to work on developin…
-
**Note: this is incomplete; it is intended as an example to kick off discussion.**
An ability to set options based on conditional logic formatted in a consistent way that can be implemented in all potential framew…
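One framework-neutral way to express such conditional option logic is a declarative list of rules, each setting options when all of its conditions match the current context. The rule format, keys, and values below are hypothetical, offered only to make the discussion concrete:

```python
# Hypothetical, framework-neutral rule format: each rule applies its
# "set" options when every key in "when" matches the context.
rules = [
    {"when": {"platform": "gpu"}, "set": {"batch_size": 256}},
    {"when": {"platform": "cpu"}, "set": {"batch_size": 32}},
]

def resolve_options(context, rules, defaults=None):
    """Apply matching rules in order, later matches overriding earlier ones."""
    options = dict(defaults or {})
    for rule in rules:
        if all(context.get(k) == v for k, v in rule["when"].items()):
            options.update(rule["set"])
    return options

print(resolve_options({"platform": "gpu"}, rules))  # {'batch_size': 256}
```

Because the rules are plain data, each target framework could implement its own evaluator for the same shared format.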
-
- [ ] [GPTScore: A Novel Evaluation Framework for Text Generation Models](https://github.com/jinlanfu/GPTScore?tab=readme-ov-file)
# GPTScore: A Novel Evaluation Framework for Text Generation Models
…
-
### Feature Summary
I'd like to contribute to FinVeda by implementing a machine learning module that can predict financial trends, stock prices, and customer behavior. This module will leverage popul…
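As a concrete starting point, trend prediction can be as simple as a least-squares line fit over recent prices. This NumPy-only sketch is an assumption for discussion; FinVeda's actual module structure and the issue's intended libraries are not specified here:

```python
import numpy as np

# Illustrative trend predictor: fit a straight line to the price history
# and extrapolate one step ahead. A real module would use richer models.
def predict_next(prices):
    t = np.arange(len(prices))
    slope, intercept = np.polyfit(t, prices, 1)
    return slope * len(prices) + intercept

print(predict_next([10.0, 11.0, 12.0, 13.0]))  # ~14.0
```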
-
- [ ] [[2308.07201] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate](https://arxiv.org/abs/2308.07201)
# [ChatEval: Towards Better LLM-based Evaluators through Multi-Agent De…