-
We should provide users with a reliable testing harness, and use it ourselves. This could take the form of a bogus gateway or pre-recorded VCR scenarios.
The support should include:
- onboarding
- us…
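A pre-recorded (VCR-style) scenario can be pictured as a bogus gateway that never talks to a real service. It only answers requests that appear in a recorded "cassette" and fails loudly on anything else. The sketch below is a minimal illustration under that assumption. `ReplayGateway`, `UnrecordedRequest`, the cassette layout and the `/orders` paths are all hypothetical and not part of any existing project.

```python
import json


class UnrecordedRequest(Exception):
    """Raised when a test sends a request that the cassette does not contain."""


class ReplayGateway:
    """Hypothetical bogus gateway that answers only from pre-recorded interactions."""

    def __init__(self, cassette: list):
        # Map each canonical request to a queue of recorded responses, so that
        # identical requests repeated in a scenario are answered in recorded order.
        self._responses = {}
        for interaction in cassette:
            key = self._key(interaction["request"])
            self._responses.setdefault(key, []).append(interaction["response"])

    @staticmethod
    def _key(request: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        return json.dumps(request, sort_keys=True)

    def send(self, request: dict) -> dict:
        queue = self._responses.get(self._key(request))
        if not queue:
            # Unknown request, or recorded responses already used up.
            raise UnrecordedRequest(request)
        return queue.pop(0)


# Usage: a two-step scenario recorded ahead of time (illustrative data only).
cassette = [
    {"request": {"method": "POST", "path": "/orders", "body": {"qty": 1}},
     "response": {"status": 201, "id": "order-1"}},
    {"request": {"method": "GET", "path": "/orders/order-1"},
     "response": {"status": 200, "id": "order-1"}},
]
gateway = ReplayGateway(cassette)
created = gateway.send({"path": "/orders", "method": "POST", "body": {"qty": 1}})
fetched = gateway.send({"method": "GET", "path": "/orders/order-1"})
```

Failing on unrecorded requests, instead of falling through to a live service, keeps tests deterministic and makes a missing recording visible immediately.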
-
The documentation for the Test Harness is too terse and is hard to follow at times. It also needs a wider variety of examples, especially for people who are just getting started with it.
Right now the best a…
-
@haileyschoelkopf @lintangsutawika @baberabb
The following is a list of TODOs to implement LLM-as-a-Judge in Eval-Harness:
**TLDR**
* Splits existing `evaluate` function into `classification_e…
-
Right now yath allows harness-specific metadata to be noted in test headers:
```
# HARNESS-NO-PRELOAD
# HARNESS-NO-STREAM
# HARNESS-CATEGORY-FOO
```
These are very useful for controlling harn…
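To show how such header directives can drive a harness, here is a minimal sketch that collects `# HARNESS-*` lines from the leading comment block of a test file. It is not yath's actual parser. The function name `parse_harness_directives`, the rule that the header ends at the first non-comment line, and the split into plain flags versus `CATEGORY-*` values are assumptions made for illustration.

```python
import re

# Hypothetical pattern for header lines such as "# HARNESS-NO-PRELOAD".
DIRECTIVE = re.compile(r"^#\s*HARNESS-([A-Z0-9-]+)\s*$")


def parse_harness_directives(source: str) -> dict:
    """Collect HARNESS-* directives from the leading comment block of a test file."""
    flags = set()
    categories = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines may appear inside the header
        if not stripped.startswith("#"):
            break  # assumption: the header ends at the first line of code
        match = DIRECTIVE.match(stripped)
        if not match:
            continue  # an ordinary comment or shebang, not a directive
        name = match.group(1)
        if name.startswith("CATEGORY-"):
            categories.append(name[len("CATEGORY-"):])
        else:
            flags.add(name)
    return {"flags": flags, "categories": categories}


# Usage: the directive after "use strict;" is ignored because the header has ended.
sample = """#!/usr/bin/perl
# HARNESS-NO-PRELOAD
# HARNESS-NO-STREAM
# HARNESS-CATEGORY-FOO
use strict;
# HARNESS-NO-FORK
"""
result = parse_harness_directives(sample)
```

Stopping at the first line of code keeps directives cheap to read: a harness could decide how to run a file (for example, whether to preload) without loading or executing it.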
-
Since I can't stop fiddling with stuff, I have added a Benchmark harness and three benchmark tests:
One "startup" test, which gives the system 1 second to start up.
One Stroke Benchmark.
- The first 240 frames…
-
With the current Raft implementation, it is possible to disable all synchronous transactions simply by adding a new instance to the cluster: they stay disabled until that instance is fully joined and up to date.
The simplest example:…
-
**Requested feature:** Provide a cleaner user interface by default and allow users to control its verbosity.
**Use case:** Kani output can be very verbose, which makes it hard to read.
**Test case:…
-
Hi,
Thanks for sharing this package, it has lots of cool features!
I saw that arc-challenge was taking about twice as long as it does with harness. I ran the following commands with lightev…
-
We should document common patterns for proof harnesses, along with explanations/justifications for why they're common or interesting to know about.
For example:
## No assumptions or assertions
…
-
What can be tested with ecukes and my conf?
malk updated
10 years ago