-
I'm following the instructions on this webpage:
https://github.com/google/BIG-bench/blob/main/README.md#how-do-i-create-a-task
There's a section that says:
> **Testing and evaluating**
>
> Onc…
-
See https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/simple_arithmetic/results/scores_GPT_GPT-3-200B.json#L33
-
https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/alignment_questionnaire/human_eval/210309/survey-iiu4lsyptvkry7ga6q3pmiqhya.xlsx
Hi, some of the questions in this sheet assu…
-
BLEU and ROUGE scores are incorrectly reported as 0 when matching the targets with human evaluation on a task with a single example.
`foo/task.json` to reproduce:
```
{"canary": "BENCHMARK DATA …
-
Hi there!
This is a really cool corpus, @fchollet :-)
I'm wondering if a version of this task could be adapted to this ICLR workshop, centered on the construction of a big set of sequence-to-seq…
-
At Oracle Labs we have been seeing the reactors benchmark sometimes throw a `NullPointerException` and then deadlock. Here's a stack trace from a recent example:
```
...
====== reactors (concurrenc…
-
For each task, `/results/dummy_model.transcript.md` was auto-generated and placed into the task dir. For example, [this one](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/ruin…
-
On Windows 10 21H2 64-bit with Java 11 GraalVM 22.0.0.2 **EE**, the test "reactors (concurrency)" fails with a `NullPointerException` on the **second** repetition:
(...snip...)
GC before operation: comp…
-
On Windows 10 21H2 64-bit with Java 11 GraalVM 22.0.0.2 EE, the test "reactors (concurrency)" fails with a `NullPointerException` on the **second** repetition:
GC before operation: completed in 333.103 m…
-
Hello, are there publicly available scripts showing how the evaluation was done for the fixed-choice tasks in the [T0 Paper](https://arxiv.org/pdf/2110.08207.pdf)?
I tried implementing them myself but…