Your understanding is correct. Currently, `evaluate` can either be configured to use a single user-provided dataset (via `data_config`) or configured to use all of the "built-in" datasets. Your feature request certainly makes sense; there isn't a particularly compelling reason I can think of for why we shouldn't be able to evaluate multiple "custom" (i.e. user-provided) datasets.
Today `EvalAlgorithmInterface.evaluate` is typed to return `List[EvalOutput]` ("for dataset(s)", per the docstring), but its `dataset_config` argument only accepts `Optional[DataConfig]`. It seems like most concrete eval algorithms (like `QAAccuracy` here) either take the user's `data_config` for a single dataset, or take all the pre-defined `DATASET_CONFIGS` relevant to the evaluator's problem type.

So the internal logic of evaluators is already set up to support providing multiple datasets and returning multiple results, but we seem to prevent users from calling `evaluate()` with multiple of their own datasets for no particular reason?
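One way the request could be addressed, purely as a sketch and not how fmeval is implemented today, would be to widen the `dataset_config` annotation to also accept a list of configs and normalize it internally (the class, the `_builtin_dataset_configs` and `_evaluate_dataset` helpers, and the exact parameter list are hypothetical):

```python
from typing import List, Optional, Union

from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms import EvalOutput


class MultiDatasetEvalAlgorithm:
    """Sketch only: illustrates a widened dataset_config type, not actual fmeval code."""

    def evaluate(
        self,
        model=None,
        dataset_config: Optional[Union[DataConfig, List[DataConfig]]] = None,
        prompt_template: Optional[str] = None,
        save: bool = False,
    ) -> List[EvalOutput]:
        # Normalize the argument so the rest of the method always iterates
        # over a list of DataConfig objects.
        if dataset_config is None:
            # Hypothetical helper standing in for today's "use all built-in datasets" fallback.
            dataset_configs = self._builtin_dataset_configs()
        elif isinstance(dataset_config, DataConfig):
            dataset_configs = [dataset_config]
        else:
            dataset_configs = list(dataset_config)

        results: List[EvalOutput] = []
        for config in dataset_configs:
            # Hypothetical per-dataset helper standing in for the existing
            # single-dataset logic inside each concrete algorithm.
            results.append(self._evaluate_dataset(model, config, prompt_template, save))
        return results
```

Since each algorithm apparently already loops over a list of configs internally when falling back to the built-in datasets, a change along these lines would mostly be a type-annotation and input-normalization change rather than new evaluation logic, and passing a single `DataConfig` would keep working unchanged.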