aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

chore: remove XSUM dataset from example notebook and integration tests #192

Closed · danielezhu closed 4 months ago

danielezhu commented 4 months ago

Description of changes: This PR is a follow-up of #191, where the last traces of the XSUM dataset are removed from the codebase. The integration tests that used XSUM now use Gigaword, and have had their expected values updated.

This PR also updates all of the integration tests so that ray.shutdown() is called in between the tests for each evaluation algorithm. This cleans up resources between tests and has reduced the peak disk usage during testing from ~18 GB to ~6 GB.
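A minimal sketch of how per-test cleanup like this is commonly wired up with a pytest fixture. The fixture name, the use of `autouse`, and `fake_shutdown` are assumptions for illustration; the real suite calls `ray.shutdown()` between the evaluation-algorithm tests, which may be wired up differently:

```python
import pytest

def fake_shutdown(log):
    # Hypothetical stand-in for ray.shutdown(); in the real integration
    # suite this step releases Ray's object store, actors, and temp
    # files so disk usage does not accumulate across tests.
    log.append("shutdown")

@pytest.fixture(autouse=True)
def cleanup_between_tests():
    log = []
    yield log
    # Teardown: runs after each test, analogous to ray.shutdown().
    fake_shutdown(log)
```

With `autouse=True`, the teardown runs after every test in scope without each test having to request the fixture explicitly.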

Lastly, this PR moves the initialization of the SummarizationAccuracy object in test_summarization_accuracy.py from the top of the file into the test method. This is required because top-level code in every test file runs at the very start of the test session, before any tests execute, so the BertscoreHelperModel actor created by the SummarizationAccuracy object is also created up front. The first call to ray.shutdown() then tears that actor down, so by the time the summarization accuracy integration test runs, the actor no longer exists as expected and the test fails.
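The failure mode above can be illustrated with a small self-contained sketch. `FakeCluster` is a hypothetical stand-in for the Ray runtime (it is not part of fmeval or Ray); the point is only the ordering: anything created at import time exists before any test runs, so an earlier test's shutdown destroys it, while creating the resource inside the test method avoids the problem:

```python
class FakeCluster:
    """Hypothetical stand-in for a Ray runtime: shutdown() kills
    every actor it owns, like ray.shutdown() does."""

    def __init__(self):
        self.actors = []

    def create_actor(self):
        # Stand-in for creating the BertscoreHelperModel actor.
        actor = {"alive": True}
        self.actors.append(actor)
        return actor

    def shutdown(self):
        for actor in self.actors:
            actor["alive"] = False
        self.actors.clear()

cluster = FakeCluster()

# Eager (broken) pattern: actor created "at import time", before any
# test has run.
eager_actor = cluster.create_actor()

# An earlier test's cleanup calls shutdown, destroying the actor.
cluster.shutdown()

# Lazy (fixed) pattern: the test creates its own actor after any
# earlier shutdowns have already happened.
lazy_actor = cluster.create_actor()
```

After the shutdown, `eager_actor` is dead while `lazy_actor` is alive, which mirrors why the initialization had to move into the test method.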

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

review-notebook-app[bot] commented 4 months ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.
