-
Hi Zechun,
Great work!
Could you please share the details of the evaluation code, e.g., which codebase was used to run inference?
Thank you,
Kalyani
-
Code evaluation tasks/benchmarks such as HumanEval and MBPP are missing from **lm-evaluation-harness**, but they are present and maintained in **bigcode-evaluation-harness**.
https://github.com/bigcode-pr…
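For reference, a HumanEval run with that harness looks roughly like the sketch below; the model name is just a placeholder and the flag names follow the harness README, so verify them against the current docs.
```bash
# Rough sketch, run from the root of a bigcode-evaluation-harness checkout.
# "bigcode/starcoderbase-1b" is only a placeholder model; flags are taken from
# the harness README and should be double-checked against the current version.
accelerate launch main.py \
  --model bigcode/starcoderbase-1b \
  --tasks humaneval \
  --n_samples 20 \
  --temperature 0.2 \
  --do_sample True \
  --batch_size 10 \
  --allow_code_execution
```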
-
### User story
As an evaluator, in order to provide a fair evaluation of the submissions that public solvers have submitted, I would like to be able to indicate if I am not able to evaluate the submission I …
-
Hello authors, thank you for your great work! I was wondering when the Evaluation Pipeline for BGE-EN-ICL will be released.
-
I have downloaded the pretrained models; how do I evaluate on PeopleSnapshot? I have tried running this command: `python eval.py --cfg exps/snapshot_"$SCENE".yaml --type view`. However, the metrics are …
-
Is there an easy way to visualize the result?
-
Unity, from the Helmholtz AAI, is the OIDC AAI tool that we are evaluating.
Carmen Scheuner has an open ticket at https://support.hifis.net/#ticket/zoom/7628 (Helmholtz support - Unity providers).
…
-
The MOT merely states that the model is pending evaluation, without giving any information about what this means or what it would take to change it.
This actually appears to be rooted in the fact th…
-
We will need to test our models against common, industry-standard benchmarks. EleutherAI's lm-evaluation-harness (used, for example, for the Pythia evaluations) is what everyone uses today:
https://github.com/EleutherAI/lm-evaluation-harness
The process will involve:
-…
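For reference, a typical lm-evaluation-harness run looks roughly like the sketch below; the model and task names are placeholders, and the flags follow the harness README, so check them against the current docs.
```bash
# Rough sketch of an lm-evaluation-harness run; install the harness first
# (e.g. `pip install lm-eval` or an editable install from the repo).
# Model and tasks below are placeholders, not our own checkpoints.
lm_eval \
  --model hf \
  --model_args pretrained=EleutherAI/pythia-160m \
  --tasks lambada_openai,hellaswag \
  --device cuda:0 \
  --batch_size 8
```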
-
Currently, if you want to train and evaluate a model, you run the following script:
```bash
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status.
source $(conda info --…