refactor: update evaluate_dataset to take in a dataset instead of dataset config

Description of changes: This PR refactors the evaluate_dataset method to consume a dataset directly, instead of a dataset config. This will allow evaluate_dataset to be compatible with more eval algorithms.
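As a rough illustration of the shape of this change (not the actual fmeval code), the sketch below contrasts a config-based entry point with a dataset-based one. All names here (`DatasetConfig`, `load_dataset`, `evaluate_dataset_old`, `evaluate_dataset_new`, the `model_input` column) are hypothetical stand-ins, and a plain list of dicts replaces the library's real dataset type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]
Dataset = List[Record]  # hypothetical stand-in for the real dataset type


@dataclass
class DatasetConfig:
    """Hypothetical config; `records` stands in for a dataset location."""
    name: str
    records: Dataset


def load_dataset(config: DatasetConfig) -> Dataset:
    # Stand-in for loading a dataset from the location named in the config.
    return list(config.records)


def evaluate_dataset_old(config: DatasetConfig, score: Callable[[Record], float]) -> float:
    # Before: loading is hidden inside the function, so callers cannot
    # transform the dataset before it is scored.
    dataset = load_dataset(config)
    return sum(score(r) for r in dataset) / len(dataset)


def evaluate_dataset_new(dataset: Dataset, score: Callable[[Record], float]) -> float:
    # After: the caller owns loading and any preprocessing.
    return sum(score(r) for r in dataset) / len(dataset)


config = DatasetConfig(
    name="toy",
    records=[{"model_input": "abc"}, {"model_input": ""}, {"model_input": "hello"}],
)

# The caller loads the dataset and can apply algorithm-specific steps first,
# for example dropping records with an empty input.
dataset = [r for r in load_dataset(config) if r["model_input"]]
mean_score = evaluate_dataset_new(dataset, score=lambda r: float(len(r["model_input"])))
```

With the dataset-based form, each algorithm can run its own preprocessing before calling the shared function, which is the compatibility benefit described above.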

The verify_model_determinism function has been updated in preparation for its use in Summarization Accuracy Semantic Robustness (SASR). Because it now takes a prompt template and a model input column, we can verify model determinism before executing the prompt-generation and model-invocation transforms. This allows SASR's evaluate method to follow the same overall template as the other algorithms.
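A minimal sketch of what a determinism check with these inputs could look like is shown below. It is not the implementation in this PR: the function name, the `$model_input` placeholder, the `num_records` parameter, and the use of a plain callable as the model are all assumptions made for illustration.

```python
import itertools
from string import Template
from typing import Callable, Dict, List


def verify_model_determinism_sketch(
    model: Callable[[str], str],
    dataset: List[Dict[str, str]],
    prompt_template: str,
    model_input_column_name: str,
    num_records: int = 5,
) -> bool:
    """Return True if the model gives identical outputs when invoked twice
    on prompts built from the first `num_records` records (hypothetical sketch)."""
    template = Template(prompt_template)
    for record in dataset[:num_records]:
        # Build the prompt here, before any prompt-generation transform runs.
        prompt = template.substitute(model_input=record[model_input_column_name])
        if model(prompt) != model(prompt):
            return False
    return True


toy_dataset = [{"model_input": "a long article"}, {"model_input": "another article"}]
toy_template = "Summarize: $model_input"


def echo_model(prompt: str) -> str:
    # Deterministic stand-in model: same prompt, same output.
    return prompt.upper()


_counter = itertools.count()


def flaky_model(prompt: str) -> str:
    # Non-deterministic stand-in model: output changes on every call.
    return f"{prompt} #{next(_counter)}"


echo_is_deterministic = verify_model_determinism_sketch(
    echo_model, toy_dataset, toy_template, "model_input"
)
flaky_is_deterministic = verify_model_determinism_sketch(
    flaky_model, toy_dataset, toy_template, "model_input"
)
```

Because the check composes prompts itself from the raw input column, it can run on the untransformed dataset, before the prompt-generation and model-invocation transforms are applied.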

This PR additionally removes the BertscoreHelperModel class, as we now use BertscoreModel.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

aws/fmeval #232