aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

refactor: move repeated code in evaluate method into util functions and simplify the EvalAlgorithmInterface method signatures #224

Closed danielezhu closed 3 months ago

danielezhu commented 3 months ago

Note: Apologies in advance for the massive diff. Although 25 files have been changed, most of them have just a couple of lines changed (see item 4 in the list of additional tasks below). A huge chunk of the diff also comes from the unit tests I wrote for the new util functions, and from deleting a bunch of unit tests that are now redundant.

Description of changes: This PR accomplishes two main tasks:

  1. Create util functions for boilerplate logic found in the evaluate method of nearly all evaluation algorithms. Source code diff, Unit test diff. Since these util functions will be called from the evaluate method of all eval algos as we move forward with the eval algo redesign, the amount of source code for eval algorithms will drastically shrink, as will the amount of unit test code. See the diff for the SummarizationAccuracy unit tests for how much we can get rid of per algorithm.
  2. Simplify and generalize the EvalAlgorithmInterface so that its role is reduced to enforcing the implementation of two methods, evaluate_sample and evaluate. Implementers of concrete subclasses of EvalAlgorithmInterface get full flexibility over the input arguments of both methods; the one constraint that we continue to enforce is the output signature. Diff.
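
To illustrate the second task, here is a minimal sketch of what an interface of this shape could look like. It is not the actual fmeval source: `EvalScore` and `EvalOutput` are simplified stand-ins for the library's output types, their fields are assumptions, and `ToyExactMatch` is a hypothetical algorithm invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class EvalScore:
    # Simplified stand-in for the library's per-score type (assumption).
    name: str
    value: float


@dataclass
class EvalOutput:
    # Simplified stand-in for the library's dataset-level output type (assumption).
    eval_name: str
    dataset_scores: List[EvalScore]


class EvalAlgorithmInterface(ABC):
    """Enforces only that both methods exist and what they return."""

    @abstractmethod
    def evaluate_sample(self, *args, **kwargs) -> List[EvalScore]:
        """Score a single sample; input arguments are up to the subclass."""

    @abstractmethod
    def evaluate(self, *args, **kwargs) -> List[EvalOutput]:
        """Score a whole dataset; input arguments are up to the subclass."""


class ToyExactMatch(EvalAlgorithmInterface):
    # Hypothetical algorithm: it picks its own input arguments.
    def evaluate_sample(self, target_output: str, model_output: str) -> List[EvalScore]:
        return [EvalScore("exact_match", float(target_output == model_output))]

    def evaluate(self, pairs: Sequence[Tuple[str, str]]) -> List[EvalOutput]:
        values = [self.evaluate_sample(t, m)[0].value for t, m in pairs]
        mean = sum(values) / len(values) if values else 0.0
        return [EvalOutput("toy_exact_match", [EvalScore("exact_match", mean)])]
```

Because the base class declares both methods abstract, a subclass that omits either one cannot be instantiated, while the argument lists remain free to differ per algorithm.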

This PR also accomplishes several additional tasks:

  1. Create util functions for semantic robustness algorithms and a base class, SemanticRobustnessConfig, that configs for the various robustness algorithms can inherit from. Diff
  2. Update SummarizationAccuracy to use the new util functions to shorten code. Source code diff, Unit test diff. Note that a ton of unit test code can be deleted, as all of the logic that was previously tested here is now being tested in the unit tests for the util functions I added.
  3. Update GeneralSemanticRobustness to use the new util functions to shorten code. Source code diff, Unit test diff
  4. Call get_eval_results_path when saving outputs so that the EVAL_RESULTS_PATH environment variable is respected even if it is set after the initialization of an EvalAlgorithmInterface object. This "bug"/unexpected behavior has been observed by @polaschwoebel.
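
The behavior described in item 4 can be sketched as follows. This is an illustration under assumptions, not the library's code: the `get_eval_results_path` shown here is a simplified stand-in, the default path is assumed, and `EagerAlgo` and `LazyAlgo` are hypothetical classes that contrast reading the environment variable at initialization with reading it when outputs are saved.

```python
import os

# Assumed default; the library's real default may differ.
DEFAULT_EVAL_RESULTS_PATH = "/tmp/eval_results/"


def get_eval_results_path() -> str:
    # Simplified stand-in: read the environment variable at call time.
    return os.environ.get("EVAL_RESULTS_PATH", DEFAULT_EVAL_RESULTS_PATH)


class EagerAlgo:
    # Hypothetical: captures the path once, at initialization.
    def __init__(self) -> None:
        self._eval_results_path = get_eval_results_path()

    def output_path(self) -> str:
        return self._eval_results_path


class LazyAlgo:
    # Hypothetical: looks the path up each time outputs are saved.
    def output_path(self) -> str:
        return get_eval_results_path()


if __name__ == "__main__":
    os.environ.pop("EVAL_RESULTS_PATH", None)
    eager, lazy = EagerAlgo(), LazyAlgo()
    # The variable is set only after both objects already exist.
    os.environ["EVAL_RESULTS_PATH"] = "/tmp/custom_results/"
    print(eager.output_path())  # still the default captured in __init__
    print(lazy.output_path())   # reflects the newly set variable
```

Resolving the path at save time is what allows a variable set after object construction to take effect.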

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.