Striveworks / valor

Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
https://striveworks.github.io/valor/

ENH: Adjust `ConfusionMatrix` metric. #807

Open czaloom opened 1 month ago


Feature Type

Problem Description

@jtjohnston raised concerns about how the `ConfusionMatrix` metric reports hallucinations separately. The main concern is that it is not clear how hallucinations relate to the overall false positives, or how they relate to the values in the confusion matrix itself.
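To make the ambiguity concrete, here is a minimal sketch of one plausible accounting. It assumes that a hallucination is a prediction with no matched ground truth, that such predictions are tallied outside the matrix, and that matching has already produced (ground truth label, predicted label) pairs. The names `summarize` and `false_positives` and the pair-based input are hypothetical and are not Valor's API.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical input: one (ground_truth_label, predicted_label) pair per match.
# None on the ground-truth side marks a hallucination (unmatched prediction);
# None on the prediction side marks a missing prediction (unmatched ground truth).
Pair = tuple[Optional[str], Optional[str]]


def summarize(pairs: list[Pair]) -> dict:
    """Split matches into a confusion matrix plus separate hallucination/missing tallies."""
    confusion: dict = defaultdict(int)
    hallucinations: dict = defaultdict(int)
    missing: dict = defaultdict(int)
    for gt, pred in pairs:
        if gt is None and pred is not None:
            hallucinations[pred] += 1
        elif pred is None and gt is not None:
            missing[gt] += 1
        elif gt is not None and pred is not None:
            confusion[(gt, pred)] += 1
    return {
        "confusion": dict(confusion),
        "hallucinations": dict(hallucinations),
        "missing": dict(missing),
    }


def false_positives(summary: dict, label: str) -> int:
    """Under this assumed accounting, FP for `label` lives in two places."""
    # Off-diagonal entries in the `label` column: other classes predicted as `label`.
    misclassified = sum(
        n for (gt, pred), n in summary["confusion"].items()
        if pred == label and gt != label
    )
    # Plus predictions of `label` that matched no ground truth at all.
    return misclassified + summary["hallucinations"].get(label, 0)


pairs = [("dog", "dog"), ("cat", "dog"), (None, "dog"), ("dog", None), ("cat", "cat")]
summary = summarize(pairs)
print(false_positives(summary, "dog"))  # 2
```

In this sketch the false-positive count for `dog` is split between one off-diagonal matrix entry (`cat` predicted as `dog`) and one hallucination, so neither the matrix nor the hallucination tally alone gives the full figure.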

Feature Description

Option 1

Option 2

Additional Context

No response