aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

[Feature] EvalAlgorithmInterface.evaluate should accept a list of DataConfigs for consistency #269

Open athewsey opened 2 months ago

athewsey commented 2 months ago

Today EvalAlgorithmInterface.evaluate is typed to return List[EvalOutput] ("for dataset(s)", per the docstring), but its dataset_config argument only accepts Optional[DataConfig].

It seems like most concrete eval algorithms (like QAAccuracy here) either take the user's data_config for a single dataset, or take all the pre-defined DATASET_CONFIGS relevant to the evaluator's problem type.

...So the internal logic of evaluators already supports running over multiple datasets and returning multiple results, but users seem to be prevented from calling evaluate() with more than one of their own datasets for no particular reason?
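
For now, the workaround seems to be calling evaluate() once per DataConfig and concatenating the results. A minimal sketch, assuming the current fmeval import paths and constructor fields (the dataset URIs, column names, and model_runner below are placeholders):

```python
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy

# Two hypothetical user-provided datasets; URIs and column names are placeholders.
configs = [
    DataConfig(
        dataset_name="my_qa_dataset_1",
        dataset_uri="s3://my-bucket/qa_dataset_1.jsonl",
        dataset_mime_type="application/jsonlines",
        model_input_location="question",
        target_output_location="answer",
    ),
    DataConfig(
        dataset_name="my_qa_dataset_2",
        dataset_uri="s3://my-bucket/qa_dataset_2.jsonl",
        dataset_mime_type="application/jsonlines",
        model_input_location="question",
        target_output_location="answer",
    ),
]

model_runner = ...  # construct a ModelRunner (e.g. Bedrock or JumpStart) as usual

eval_algo = QAAccuracy()

# Call evaluate() once per config and flatten the per-dataset result lists.
results = []
for config in configs:
    results.extend(
        eval_algo.evaluate(model=model_runner, dataset_config=config, save=True)
    )
```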

danielezhu commented 1 month ago

Your understanding is correct. Currently, evaluate can be configured to use either a single user-provided dataset (via data_config) or all of the "built-in" datasets. Your feature request certainly makes sense; I can't think of a particularly compelling reason why we shouldn't be able to evaluate multiple "custom" (i.e. user-provided) datasets.
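
One minimal sketch of what the widened interface could look like, not the actual fmeval implementation: dataset_config accepts either a single DataConfig or a list, the argument is normalized internally, and the return type stays List[EvalOutput]. The helpers _get_builtin_configs and _evaluate_single_dataset below are hypothetical, and the remaining parameters are abridged:

```python
from typing import List, Optional, Union

from fmeval.data_loaders.data_config import DataConfig

# Sketch of a method on a concrete eval algorithm (hence the `self` parameter).
def evaluate(
    self,
    model: Optional["ModelRunner"] = None,
    dataset_config: Optional[Union[DataConfig, List[DataConfig]]] = None,
    prompt_template: Optional[str] = None,
    save: bool = False,
) -> List["EvalOutput"]:
    # Normalize the argument so downstream logic always sees a list of configs.
    if dataset_config is None:
        configs = self._get_builtin_configs()  # hypothetical helper
    elif isinstance(dataset_config, DataConfig):
        configs = [dataset_config]
    else:
        configs = list(dataset_config)

    outputs: List["EvalOutput"] = []
    for config in configs:
        outputs.append(
            self._evaluate_single_dataset(  # hypothetical helper
                model=model,
                dataset_config=config,
                prompt_template=prompt_template,
                save=save,
            )
        )
    return outputs
```

Accepting Union[DataConfig, List[DataConfig]] would keep existing single-config callers working unchanged while matching the List[EvalOutput] return type.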