-
As the picture below shows, lm-evaluation-harness computes metrics for the sub-tasks (such as leaderboard_gpqa_diamond/extended/main). How can I get the metrics for leaderboard_gpqa itself?
![issue](https…
-
I tried this code:
```rust
#![feature(iter_advance_by)]
use std::iter::DoubleEndedIterator;
fn addr() -> usize {
let addr = kani::any_where::<usize, _>(|val| *val == 9223372036854775809);
// *…
```
-
Since Meson is such a complex project, we need to start integrating fuzzing into our process as early as possible. This will be a large task. As such, this issue is mainly keeping track of all the fuz…
-
Do you have any tips for getting this running? I've got a very large and well developed corpus of wgsl shader files from a previous fuzzing run which I would like to try with this fuzzer. I am attemp…
-
Why is inference more than twice as fast when using vLLM directly compared to using vLLM via lm-evaluation-harness?
-
First of all, thanks for this great utility.
As the title says, TQCollector is not recognizing the Atlantean Artifice charm, despite completing the quest for it and the charm itself being in the re…
-
Trying to run offline retinanet in a container with one Nvidia GPU:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia …
-
| | |
| --- | --- |
| Bugzilla Link | [502379](https://bugs.eclipse.org/bugs/show_bug.cgi?id=502379) |
| Status | NEW |
| Importance | P3 normal |
| Reported | Sep 28, 2016 04:03 EDT |
| Modified | Sep…
-
... to be able to tell if they are warnings, successes, or information. Right now I only have styles for success messages (it will look like a success even if it's a warning or information message). @…
-
See https://ci.chromium.org/ui/p/chromium/builders/ci/linux-rel-no-external-ip/11745/overview
```
This is a wdspec test.
[FAIL] test_with_new_navigation_inside_page
Failed: DID NOT RAISE
Har…