-
The prompting system for the VZCode AI Assist feature is currently based on trial and error.
There are no tests or benchmarks to evaluate prompt quality or to exercise the prompt under various scenarios.
We ne…
-
Hi, we have a VEK280 production silicon board B03 and we've been trying to benchmark Vitis AI on it.
The quick start guide (https://xilinx.github.io/Vitis-AI/3.5/html/docs/quickstart/vek280.html) h…
-
### Description of the bug:
I'm consistently encountering an Internal Server Error (HTTP 500) when trying to use the Google Gemini API through the Inspect evaluation framework. This…
-
2024-10-07 16:32:58.435003 │DepthFlow├┤9'41.685├┤INFO │ ▸ (Module 115 • CustomDepthf) Initializing scene 'CustomDepthflowScene' with backend headless
│DepthFlow├┤9'41.783├┤INFO │ ▸ (Module 115 • …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
Error [example](https://github.com/getappmap/navie-benchmark/actions/runs/10949246453/job/30402054109#step:7:1216):
```
Handling exception: Error: Failed to complete: SSE Error: {"type":"error","e…
-
Steps:
- Develop use case
- Find and prep data
- Load and query
Resources:
- [Caselaw Access Project (CAP)](https://case.law/)
- [US Code](https://uscode.house.gov/)
- [Legal AI Benchmarks](https:…
-
Title.
Benchmarks:
Summarization
- [x] G-Eval
- [ ] SummHay - https://arxiv.org/abs/2407.01370v1 & https://github.com/salesforce/summary-of-a-haystack
- https://arxiv.org/html/2403.19889v1
R…
-
Approved OpenSearch blog categories for building a blog filter for this page: https://opensearch.org/blog/
**Main headers:**
* Technical
* Community
* Partners
* Events
* Releases
**Sub-t…
-
Hi,
Nice work on indexing useful papers in this repo!
My team has released a new benchmark for the "omni understanding" ability of MLLMs across image, audio, and text. We are quite confiden…