GenBench / genbench_cbt_2023

The official GenBench Collaborative Benchmarking Task repository 2023 (Archived)

[Task Submission] ICL consistency test (`icl_consistency_test`) #11

Closed LucWeber closed 10 months ago

LucWeber commented 1 year ago

ICL consistency test

This task tests the consistency of prompt-based model predictions across a wide range of different prompt setups, calculating accuracy and consistency scores.

Authors

Implementation

No data preprocessing is necessary. We implemented a custom `evaluate_predictions()` method that calculates accuracy and consistency scores for each setup separately.
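
For orientation, the per-setup scoring could be sketched roughly as below. This is an illustration, not the repository's actual implementation: the plain-dict `gold`, the helper name `sketch_evaluate`, and the choice of Cohen's kappa as the consistency measure are all assumptions made for the example.

```python
# Minimal sketch: per-setup accuracy plus an averaged pairwise consistency score.
# The input layout mirrors the format described in the Usage section below.
from itertools import combinations
from typing import Any, Dict

from sklearn.metrics import cohen_kappa_score


def sketch_evaluate(predictions: Dict[str, Dict[str, Any]],
                    gold: Dict[str, Any]) -> Dict[str, Any]:
    data_ids = sorted(gold.keys())

    # Accuracy: fraction of correct predictions, computed per setup.
    accuracy = {
        setup_id: sum(preds[i] == gold[i] for i in data_ids) / len(data_ids)
        for setup_id, preds in predictions.items()
    }

    # Consistency: agreement between every pair of setups, averaged
    # (Cohen's kappa is assumed here purely for illustration).
    kappas = [
        cohen_kappa_score([predictions[a][i] for i in data_ids],
                          [predictions[b][i] for i in data_ids])
        for a, b in combinations(predictions, 2)
    ]
    consistency = sum(kappas) / len(kappas) if kappas else 1.0

    return {"accuracy": accuracy, "consistency": consistency}
```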

Usage

The custom `evaluate_predictions()` method accepts inputs in the default format: `predictions` is expected to be a `Dict[str, Dict[str, Any]]` and `gold` a `datasets.Dataset`. In `predictions`, the keys of the outer dictionary represent the setup_IDs and the keys of the inner dictionary represent the respective data_IDs. For a fully implemented example evaluation pipeline using Hugging Face, see `example_evaluation.py`.
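
As a rough illustration of the expected input shape, a hypothetical call could look like the sketch below. The `load_task` entry point is assumed from the GenBench CBT framework, and the setup/data IDs, labels, and column names of the placeholder `gold` Dataset are invented for the example; `example_evaluation.py` shows the real pipeline.

```python
# Hypothetical usage sketch; all IDs, labels, and gold column names are placeholders.
import datasets
from genbench import load_task

task = load_task("icl_consistency_test")

# Outer keys are setup_IDs, inner keys are data_IDs, values are predicted labels.
predictions = {
    "setup_A": {"id_0": 1, "id_1": 0},
    "setup_B": {"id_0": 1, "id_1": 1},
}

# Tiny stand-in for the task's gold data as a datasets.Dataset; in practice this
# comes from the task itself (see example_evaluation.py).
gold = datasets.Dataset.from_dict({"data_ID": ["id_0", "id_1"], "target": [1, 0]})

results = task.evaluate_predictions(predictions=predictions, gold=gold)
print(results)  # per-setup accuracy and consistency scores
```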

Checklist:

vernadankers commented 1 year ago

Hello!

We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), so I wanted to remind you that your PR still needs some attention. Please double-check the automated tests, and don't forget to submit your accompanying paper to OpenReview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper; feel free to tag us if you have questions.

Cheers, Verna
On behalf of the GenBench team