aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0
187 stars, 42 forks

fix: change prompt_template with summarize instruction #182

Closed taturabe closed 7 months ago

taturabe commented 8 months ago

Issue #, if available:

In this example of summarization accuracy evaluation, the prompt template "Human: $feature\n\nAssistant:\n" is used. However, because this template contains no instruction to summarize the input in one sentence, the model's output is far from a summary.

As a way to exercise the evaluation module, this prompt is not a problem, and the evaluation itself succeeds. However, it could mislead readers into thinking that the SummarizationAccuracy class applies a preset summarization instruction internally.

Description of changes:

This pull request makes it clear that when a built-in dataset is not used, the user must put an instruction that matches the task into prompt_template. Alternatively, the user can leave prompt_template unset so that fmeval uses its default template. (It is not clear to me whether the default template is suitable for every task.)

With this change, the model also returns an appropriate one-sentence summary and the evaluation score improves, which makes the example more relevant to actual use cases.
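To illustrate the difference between the two templates, here is a minimal sketch. It assumes the `$feature` placeholder is filled by `$`-style substitution comparable to Python's `string.Template`. The wording of the added instruction and the `compose_prompt` helper are hypothetical and are not part of the fmeval API.

```python
from string import Template

# Template from the original example: chat scaffolding only, no task instruction.
OLD_TEMPLATE = "Human: $feature\n\nAssistant:\n"

# Hypothetical revised template: the instruction wording is an assumption,
# chosen only to show where a summarization instruction would go.
NEW_TEMPLATE = (
    "Human: Summarize the following text in one sentence: $feature"
    "\n\nAssistant:\n"
)


def compose_prompt(template: str, feature: str) -> str:
    """Fill the $feature placeholder (illustrative helper, not fmeval code)."""
    return Template(template).substitute(feature=feature)


if __name__ == "__main__":
    article = "The city council voted on Tuesday to extend the tram line."
    # With the old template the model only sees the raw text.
    print(repr(compose_prompt(OLD_TEMPLATE, article)))
    # With the new template the model is told which task to perform.
    print(repr(compose_prompt(NEW_TEMPLATE, article)))
```

Only the second prompt tells the model what to do with the input, so it is the one expected to yield output that a summarization metric can score meaningfully.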


[Images: old_prompt, new_prompt]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

review-notebook-app[bot] commented 8 months ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



taturabe commented 7 months ago

Hi, @danielezhu

This PR does not appear to have been merged yet, presumably because it still needs to be reviewed. Could you confirm?

danielezhu commented 7 months ago

Hi @taturabe, I've included your suggested change as part of my PR #192. That PR will be merged before we do a release (soon).

taturabe commented 7 months ago

Understood! Thank you so much! @danielezhu

danielezhu commented 7 months ago

PR #192 has been merged, so I will close this one.