Model Selection Option
Provides users with an option to choose the model used for evaluation. By default, the GPT models are loaded, since some models may not be well instruction-tuned and can yield suboptimal evaluation results.
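A minimal sketch of how such a default-with-override option can work (the names `DEFAULT_MODELS` and `select_models` are illustrative, not the actual API):

```python
# Hypothetical sketch of a model-selection option; names are illustrative.
DEFAULT_MODELS = ["gpt-3.5-turbo"]  # assumed GPT default used for evaluation

def select_models(user_models=None):
    """Return the user's chosen models, falling back to the GPT defaults."""
    if user_models:
        return list(user_models)
    return list(DEFAULT_MODELS)
```

Callers that pass nothing get the GPT default; passing an explicit list overrides it.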
Save harness.run() Results
Implements functionality to save the results returned by harness.run(). Users can import the saved results and adjust the evaluation without rerunning the model, making the evaluation process more efficient and flexible.
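As a rough illustration of the save/reload flow (the helper names and JSON format here are hypothetical; the actual harness may persist results differently):

```python
import json
from pathlib import Path

# Hypothetical helpers illustrating the save/reload idea; not the harness API.
def save_results(results, path="harness_results.json"):
    """Persist evaluation results to disk as JSON."""
    Path(path).write_text(json.dumps(results))

def load_results(path="harness_results.json"):
    """Reload previously saved results without rerunning the model."""
    return json.loads(Path(path).read_text())

# Example: save the results once, then reload them for further analysis.
results = [{"test_type": "robustness", "pass": True}]
save_results(results)
reloaded = load_results()
```

This keeps the expensive model run separate from downstream analysis, so evaluation settings can be tweaked against the stored results.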
Fixes #895
Type of change
[x] Bug fix (non-breaking change which fixes an issue)
[x] This change requires a documentation update
Usage
Checklist:
[ ] I've added Google style docstrings to my code.
[ ] I've used pydantic for typing when/where necessary.
Screenshots (if appropriate):