aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

refactor: update evaluate_dataset to take in a dataset instead of dataset config #232

Closed. danielezhu closed this 3 months ago

danielezhu commented 3 months ago

Description of changes: This PR refactors the evaluate_dataset method to consume a dataset directly, instead of a dataset config. This will allow evaluate_dataset to be compatible with more eval algorithms.
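As a rough illustration of the design change, the sketch below shows an evaluation helper that accepts an already-loaded dataset rather than a config describing where to load it from. This is not fmeval's code: `evaluate_dataset_sketch`, `exact_match`, and the column names are hypothetical, and a plain list of dicts stands in for whatever dataset type the library uses.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one row of an already-loaded dataset.
Record = Dict[str, str]


def evaluate_dataset_sketch(
    dataset: List[Record],
    score_fn: Callable[[Record], float],
) -> float:
    """Return the mean score over a dataset that the caller has already loaded.

    Because loading happens outside this function, the caller is free to
    preprocess or augment the records before evaluation.
    """
    if not dataset:
        raise ValueError("dataset must contain at least one record")
    scores = [score_fn(record) for record in dataset]
    return sum(scores) / len(scores)


def exact_match(record: Record) -> float:
    """Toy metric (an assumption): 1.0 when the output equals the target."""
    return 1.0 if record["model_output"] == record["target_output"] else 0.0


# The caller owns loading; here the dataset is built inline.
dataset = [
    {"model_output": "Paris", "target_output": "Paris"},
    {"model_output": "Rome", "target_output": "Madrid"},
]
mean_score = evaluate_dataset_sketch(dataset, exact_match)
```

Decoupling loading from evaluation in this way is what would let an algorithm that needs to transform its data first reuse the same helper.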

The verify_model_determinism function has been updated in preparation for its use in Summarization Accuracy Semantic Robustness (SASR). Because it now takes a prompt template and a model input column, we can verify model determinism before executing the transforms for prompt generation and model invocation. This will allow SASR's evaluate method to follow the same overall structure as all other algos.
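The idea of checking determinism up front can be sketched as follows. This is a minimal illustration under stated assumptions, not the function from this PR: `verify_model_determinism_sketch`, the `$model_input` placeholder, the `num_rows` parameter, and the callable model interface are all hypothetical.

```python
from itertools import count
from string import Template
from typing import Callable, Dict, List


def verify_model_determinism_sketch(
    model: Callable[[str], str],
    dataset: List[Dict[str, str]],
    prompt_template: str,
    model_input_column: str,
    num_rows: int = 5,
) -> bool:
    """Return True if the model gives identical outputs on repeated calls.

    Prompts are composed here from the template and the input column, so the
    check can run before any prompt-generation or model-invocation step has
    been applied to the dataset.
    """
    template = Template(prompt_template)
    for record in dataset[:num_rows]:
        prompt = template.substitute(model_input=record[model_input_column])
        # Invoke the model twice on the same prompt and compare the outputs.
        if model(prompt) != model(prompt):
            return False
    return True


def deterministic_model(prompt: str) -> str:
    """Toy model (an assumption): always maps a prompt to the same output."""
    return prompt.upper()


_call_counter = count()


def nondeterministic_model(prompt: str) -> str:
    """Toy model (an assumption): output changes on every call."""
    return f"{prompt} #{next(_call_counter)}"


dataset = [{"text": "first document"}, {"text": "second document"}]
is_deterministic = verify_model_determinism_sketch(
    deterministic_model, dataset, "Summarize: $model_input", "text"
)
is_nondeterministic_ok = verify_model_determinism_sketch(
    nondeterministic_model, dataset, "Summarize: $model_input", "text"
)
```

Only a few rows are sampled in this sketch, since each check costs two model calls per row.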

This PR additionally removes the BertscoreHelperModel class, as we now use BertscoreModel.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.