-
Need to answer the question, "Is federated data a feasible approach to accessing data, and how do we know?" What indicators would help us answer it?
Initial ideas were shared with the…
-
Code evaluation tasks/benchmarks such as HumanEval and MBPP are missing from **lm-evaluation-harness**, but are present and maintained in **bigcode-evaluation-harness**.
https://github.com/bigcode-pr…
-
Test the generated compiler with patched ETISS. (This should probably be part of a third-party repository.)
**Steps:**
- [x] #70
- [x] #71
- [x] #72
- [ ] #77
- [x] #73
- [x] #74
- [ ] #75
- [ ] …
-
Related to https://github.com/openmpf/openmpf/issues/1415 and https://github.com/openmpf/openmpf/issues/1416.
- Enable the framework to accept an input JSON file that describes the Docker images an…
-
Hello Tanner,
This is not exactly an issue. I have a few questions as I have recently started to explore the framework.
1. Can this framework evaluate all kinds of Fully Digital CiM framewor…
-
# [RFC] OpenSearch Search Quality Evaluation Framework
## Introduction
[User Behavior Insights](https://github.com/opensearch-project/OpenSearch/issues/12084) (UBI) provides OpenSearch users wit…
-
Thank you for your great work!
I wonder if it can be integrated into popular evaluation frameworks like lmms_eval or vlmevalkit for easier use by everyone?
-
Users have expressed an interest in using the framework to evaluate the transcriptions generated by speech detection components. Since the FiftyOne tool is geared towards images, it may not be a good …
-
During SpEL evaluation, if the `TypeDescriptor` is recursive, the evaluation recurses infinitely and causes a stack overflow.
This line:
https://github.com/spring-projects/spring-framework/…
-
There is a need to be able to compare components that generate the same type of detections. For example, if two components generate `FACE` tracks we need to determine which one is better in terms of p…