Note
Apologies in advance for the massive diff. Although 25 files have been changed, the majority of them have just a couple of lines changed (see bullet point 4 in the list below). Also, a huge chunk of the diff comes from the unit tests I wrote for the new util functions, and from deleting a bunch of unit tests that are now redundant.
Description of changes:
This PR accomplishes two main tasks:
1. Create util functions for the boilerplate logic found in the `evaluate` method of nearly all evaluation algorithms. Source code diff, Unit test diff. Since these util functions will be called from the `evaluate` method of all eval algos as we move forward with the eval algo redesign, the amount of source code for eval algorithms will shrink drastically, as will the amount of unit test code. See the diff for the `SummarizationAccuracy` unit tests for how much we can get rid of per algorithm.
2. Simplify and generalize `EvalAlgorithmInterface` so that its role is reduced to enforcing the implementation of two methods, `evaluate_sample` and `evaluate`, while giving implementers of concrete subclasses full flexibility regarding the input arguments. The one constraint that we continue to enforce is the output signature. Diff.
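To illustrate the second point, here is a minimal sketch of an interface that only enforces the two method names and their return types. It is not the code from this PR: the `EvalScore` and `EvalOutput` dataclasses are simplified stand-ins whose fields are assumptions, and `WordCount` is a purely hypothetical subclass used to show that each implementer can choose its own input arguments.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class EvalScore:
    """Simplified stand-in for a single named score (assumed fields)."""
    name: str
    value: float


@dataclass
class EvalOutput:
    """Simplified stand-in for a dataset-level result (assumed fields)."""
    eval_name: str
    dataset_scores: List[EvalScore]


class EvalAlgorithmInterface(ABC):
    """Enforces only that both methods exist and what they return."""

    @abstractmethod
    def evaluate_sample(self, *args, **kwargs) -> List[EvalScore]:
        """Score a single sample; arguments are up to the subclass."""

    @abstractmethod
    def evaluate(self, *args, **kwargs) -> List[EvalOutput]:
        """Score a whole dataset; arguments are up to the subclass."""


class WordCount(EvalAlgorithmInterface):
    """Hypothetical algorithm: scores an output by its word count."""

    def evaluate_sample(self, model_output: str) -> List[EvalScore]:
        return [EvalScore("word_count", float(len(model_output.split())))]

    def evaluate(self, model_outputs: List[str]) -> List[EvalOutput]:
        # Reuse the per-sample logic, then aggregate with a plain mean.
        values = [self.evaluate_sample(o)[0].value for o in model_outputs]
        mean = sum(values) / len(values)
        return [EvalOutput("word_count", [EvalScore("word_count", mean)])]
```

A subclass that omits either method cannot be instantiated, because `abc` raises `TypeError` at construction time. Return annotations are not checked at runtime, so in this sketch the output signature would still have to be enforced by a type checker or by tests.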
It also accomplishes several additional tasks:
1. Create util functions for semantic robustness algorithms and a base class, `SemanticRobustnessConfig`, that configs for the various robustness algorithms can inherit from. Diff.
2. Update `SummarizationAccuracy` to use the new util functions to shorten code. Source code diff, Unit test diff. Note that a ton of unit test code can be deleted, as all of the logic that was previously tested here is now tested in the unit tests for the util functions I added.
3. Update `GeneralSemanticRobustness` to use the new util functions to shorten code. Source code diff, Unit test diff.
4. Call `get_eval_results_path` when saving outputs so that the `EVAL_RESULTS_PATH` environment variable is respected even if it is set after the initialization of an `EvalAlgorithmInterface` object. This "bug"/unexpected behavior was observed by @polaschwoebel.
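The behavior behind the last item can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `results_path_from_env` is a hypothetical stand-in for `get_eval_results_path`, the default path is invented for the example, and `EagerAlgo` and `LazyAlgo` are hypothetical classes that contrast reading the variable at initialization with reading it at save time.

```python
import os

# Assumed default for this sketch only; the library's real default may differ.
DEFAULT_RESULTS_PATH = "/tmp/eval_results/"


def results_path_from_env() -> str:
    """Hypothetical stand-in: read EVAL_RESULTS_PATH at call time."""
    return os.environ.get("EVAL_RESULTS_PATH", DEFAULT_RESULTS_PATH)


class EagerAlgo:
    """Old pattern: the path is frozen when the object is created."""

    def __init__(self) -> None:
        self._results_path = results_path_from_env()

    def save_path(self) -> str:
        return self._results_path


class LazyAlgo:
    """New pattern: the path is resolved each time outputs are saved."""

    def save_path(self) -> str:
        return results_path_from_env()
```

If `EVAL_RESULTS_PATH` is set after both objects exist, `EagerAlgo.save_path()` still returns the value seen at construction, while `LazyAlgo.save_path()` picks up the new value.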
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.