EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
6.42k stars 1.69k forks

Metrics that require probability scores (y_scores) #2272

Open Ofir408 opened 2 weeks ago

Ofir408 commented 2 weeks ago

Hi, I want to use the PR-AUC (or ROC-AUC) metric for a few-shot classification problem where the test data is imbalanced. To compute it with scikit-learn I need the probability of the positive ("yes") class, not just the y_pred. How can I get that with lm-eval?

baberabb commented 1 week ago

Hi! You can get the model outputs, as well as the metric score for each example, using --log_samples (use it together with --output_path).
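As a rough sketch of the post-processing step: once the per-sample logs are written, the per-choice log-likelihoods can be converted into a positive-class probability and fed to scikit-learn. The field names below (`filtered_resps`, `target`) and the choice ordering `[no, yes]` are assumptions about the logged JSONL schema, which varies by task — inspect your own samples file first.

```python
import json
import math

def yes_probability(ll_yes: float, ll_no: float) -> float:
    """Normalize the two choice log-likelihoods into P(yes)."""
    m = max(ll_yes, ll_no)  # subtract max for numerical stability
    p_yes = math.exp(ll_yes - m)
    p_no = math.exp(ll_no - m)
    return p_yes / (p_yes + p_no)

def load_scores(samples_path: str):
    """Read a --log_samples JSONL file and return (y_true, y_score).

    Hypothetical parsing: assumes each sample stores per-choice
    [loglikelihood, is_greedy] pairs under "filtered_resps" in the
    order [no, yes], and the gold label index under "target".
    """
    y_true, y_score = [], []
    with open(samples_path) as f:
        for line in f:
            sample = json.loads(line)
            ll_no, ll_yes = (resp[0] for resp in sample["filtered_resps"])
            y_true.append(sample["target"])
            y_score.append(yes_probability(ll_yes, ll_no))
    return y_true, y_score
```

The resulting `y_true` and `y_score` lists can then be passed directly to `sklearn.metrics.roc_auc_score` or `sklearn.metrics.average_precision_score` (for PR-AUC).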