Description of changes:
This PR refactors the `evaluate_dataset` method to consume a dataset directly instead of a dataset config. This makes `evaluate_dataset` compatible with more eval algorithms.
The `verify_model_determinism` function has been updated in preparation for its use in Summarization Accuracy Semantic Robustness (SASR). Because it now takes a prompt template and a model input column, we can verify model determinism before executing the transforms for prompt generation and model invocation. This allows SASR's `evaluate` method to follow the same overall template as all other algorithms.
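For reviewers, the idea behind the updated determinism check can be sketched roughly as below. This is a minimal illustration only, not the code in this PR: the function name `verify_model_determinism_sketch`, its parameters, the `predict` callable, the `num_rows` default, and the `$model_input` placeholder are all assumptions made for the example.

```python
from string import Template
from typing import Callable, Dict, List


def verify_model_determinism_sketch(
    predict: Callable[[str], str],  # hypothetical stand-in for a model invocation
    rows: List[Dict[str, str]],     # hypothetical stand-in for dataset rows
    prompt_template: str,           # assumed to use a $model_input placeholder
    model_input_column: str,
    num_rows: int = 5,              # assumed sample size, chosen for illustration
) -> bool:
    """Return True if the model gives identical output twice for each sampled row."""
    template = Template(prompt_template)
    for row in rows[:num_rows]:
        # Build the prompt here, so the check does not rely on the
        # prompt-generation transform having already run on the dataset.
        prompt = template.substitute(model_input=row[model_input_column])
        if predict(prompt) != predict(prompt):
            return False
    return True


# Usage with a trivially deterministic stand-in model.
sample_rows = [{"model_input": "first document"}, {"model_input": "second document"}]
is_deterministic = verify_model_determinism_sketch(
    predict=lambda prompt: prompt.upper(),
    rows=sample_rows,
    prompt_template="Summarize: $model_input",
    model_input_column="model_input",
)
```

Composing the prompt inside the check is what lets it run before the prompt-generation and model-invocation transforms are applied.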
This PR also removes the `BertscoreHelperModel` class, as we now use `BertscoreModel`.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.