Open caiqizh opened 4 weeks ago
Hi, @caiqizh!
Indeed, this can be done with the current codebase. You can look it up in the TriviaQA example. If the `multiref` option is set to `true` in the yaml config, as here: https://github.com/IINemo/lm-polygraph/blob/main/examples/configs/polygraph_eval_triviaqa.yaml#L24, all generation metrics will be wrapped in the `AggregatedMetric` class (https://github.com/IINemo/lm-polygraph/blob/main/src/lm_polygraph/generation_metrics/aggregated_metric.py). This applies an aggregation (currently only `max`) to the values the generation metric produces when comparing the model's output to each of the references. For `Accuracy`, for example, this means that for a given input the metric will equal 1 if any of the alternative references matches the output of the model.
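Conceptually, the max-aggregation described above can be sketched as follows. This is an illustrative stand-alone snippet, not lm-polygraph's actual implementation; the function names `exact_match` and `aggregate_over_references` are hypothetical:

```python
# Illustrative sketch of multi-reference metric aggregation
# (not lm-polygraph's actual code). The idea: score the model
# output against every alternative reference, then reduce the
# per-reference scores with max.

def exact_match(output: str, reference: str) -> float:
    """Toy accuracy metric: 1.0 if the output matches the reference exactly
    (after trimming whitespace and lowercasing)."""
    return float(output.strip().lower() == reference.strip().lower())

def aggregate_over_references(output: str, references: list[str]) -> float:
    """Apply the base metric to each reference and take the max,
    so the aggregated metric is 1.0 if *any* reference matches."""
    return max(exact_match(output, ref) for ref in references)

# Example: any matching alternative reference yields accuracy 1.0.
refs = ["Paris", "paris, France"]
print(aggregate_over_references("Paris", refs))  # 1.0
print(aggregate_over_references("Lyon", refs))   # 0.0
```

With a continuous metric (e.g. ROUGE) instead of exact match, the same `max` reduction keeps the score against the best-matching reference.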
We really need to work on updating and expanding the documentation... @IINemo
I will keep this issue open until we ship docs that explain this functionality clearly.
I am wondering: for questions with multiple correct answers (or many alternative answers), can the current generation metrics handle this?
Thank you for the great codebase!