Explicitly state when to compute confidence intervals

IBM / unitxt

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

https://www.unitxt.ai

Apache License 2.0


Explicitly state when to compute confidence intervals #432

Open matanor opened 8 months ago

matanor commented 8 months ago

Today confidence intervals are computed by default for the main_score. This PR adds the capability of computing confidence intervals for additional scores.

We would like to change the default so that confidence intervals are not computed automatically, but only when explicitly requested in the metric.

matanor commented 8 months ago

Today we have a mechanism for disabling confidence interval calculation, by setting n_resamples to None. That mechanism is used as the implementation of a command line parameter in FM-Eval.

There is also a mechanism for specifying a list of confidence interval scores, on which the confidence intervals are computed. This is implemented for instance metrics.

The suggestion is to control whether confidence intervals are computed solely through the list of score names, with an empty list meaning no computation. The n_resamples flag would then no longer support a value of None.
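To make the proposal concrete, here is a minimal sketch of what "list of score names only" could look like. It is not unitxt code: `bootstrap_ci`, `compute_confidence_intervals` and the `ci_scores` parameter name are hypothetical, and a plain percentile bootstrap over the mean stands in for whatever the library actually does.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    n_resamples: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of `values` (illustrative only)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    low_idx = int((1 - confidence) / 2 * n_resamples)
    high_idx = min(n_resamples - 1, int((1 + confidence) / 2 * n_resamples))
    return means[low_idx], means[high_idx]


def compute_confidence_intervals(
    instance_scores: Dict[str, Sequence[float]],
    ci_scores: List[str],  # hypothetical name for the list of score names
    n_resamples: int = 1000,
) -> Dict[str, Tuple[float, float]]:
    """Compute intervals only for the scores named in `ci_scores`.

    An empty list means "no confidence intervals"; n_resamples=None is
    rejected because it is no longer the on/off switch.
    """
    if n_resamples is None or n_resamples <= 0:
        raise ValueError("n_resamples must be a positive integer")
    return {
        name: bootstrap_ci(instance_scores[name], n_resamples)
        for name in ci_scores
    }


scores = {"f1": [0.2, 0.8, 0.5, 0.9, 0.4], "accuracy": [1.0, 0.0, 1.0, 1.0, 0.0]}
no_ci = compute_confidence_intervals(scores, ci_scores=[])        # nothing computed
f1_ci = compute_confidence_intervals(scores, ci_scores=["f1"])    # only f1
```

With this shape there is a single switch: a caller that wants no intervals passes an empty list, and n_resamples only controls how many resamples are drawn.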

assaftibm commented 8 months ago

For which metrics is CI disabled, and why?

assaftibm commented 8 months ago

I can see why latency can become an issue, but this is the case only for global metrics. For instance metrics, the CI computation should be very fast.
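A rough sketch of why the cost differs, assuming a bootstrap-style computation. The function names are hypothetical and this is not the unitxt implementation. For an instance metric the per-instance scores already exist, so each resample is just an average. For a global metric each resample has to re-run the metric on the resampled predictions and references.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def bootstrap_instance_metric(
    instance_scores: Sequence[float], n_resamples: int = 100, seed: int = 0
) -> List[float]:
    """Resample already-computed per-instance scores; the metric is never re-run."""
    rng = random.Random(seed)
    n = len(instance_scores)
    return [mean(rng.choices(instance_scores, k=n)) for _ in range(n_resamples)]


def bootstrap_global_metric(
    metric_fn: Callable[[List[str], List[str]], float],
    predictions: Sequence[str],
    references: Sequence[str],
    n_resamples: int = 100,
    seed: int = 0,
) -> List[float]:
    """Re-run the whole metric once per resample; cost grows with n_resamples."""
    rng = random.Random(seed)
    n = len(predictions)
    stats = []
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)  # sample instance indices with replacement
        stats.append(
            metric_fn([predictions[i] for i in idx], [references[i] for i in idx])
        )
    return stats


calls = []


def counting_accuracy(preds: List[str], refs: List[str]) -> float:
    """Toy stand-in for an expensive global metric; records each invocation."""
    calls.append(1)
    return sum(p == r for p, r in zip(preds, refs)) / len(preds)


global_stats = bootstrap_global_metric(
    counting_accuracy, ["a", "b", "c"], ["a", "b", "x"], n_resamples=50
)
instance_stats = bootstrap_instance_metric([1.0, 1.0, 0.0], n_resamples=50)
```

Here `counting_accuracy` is called once per resample, so a global metric that is slow to evaluate once becomes n_resamples times slower with intervals enabled, while the instance path only averages numbers.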

matanor commented 8 months ago

For which metrics is CI disabled, and why?

It was disabled for the default version of rouge (here). The reason is runtime. Other users have also asked to turn it off. I agree that the runtime issue is mainly with global metrics.