taturabe closed this pull request 7 months ago.
Hi, @danielezhu
It looks like this PR has not been merged yet because it is still awaiting review. Could you confirm its status?
Hi @taturabe, I've included your suggested change as part of my PR #192, which will be merged before we do a release (soon).
Understood! Thank you so much! @danielezhu
PR #192 has been merged, so I am closing this one.
Issue #, if available:
In this summarization accuracy evaluation example, the prompt template "Human: $feature\n\nAssistant:\n" is used. However, since this prompt contains no instruction to generate a one-sentence summary, the model's output is far from an actual summary.
As a way to exercise the evaluation module, this prompt is not problematic; in fact, the evaluation succeeds. However, it could be misinterpreted as implying that the SummarizationAccuracy class has an internally preset summarization instruction.
Description of changes:
This pull request makes it clear that when a built-in dataset is not used, the user must include an instruction in the prompt_template that matches the task. Alternatively, the user can leave prompt_template unset so that fmeval falls back to its default template (it is not clear to me whether the default is suitable for every task).
With this change, the model output becomes an appropriate one-sentence summary and the evaluation score improves, making the example more representative of actual use cases.
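The difference between the two templates can be sketched with plain string templating, since fmeval-style templates use the `$feature` placeholder shown above. Note that the instruction wording below is illustrative, not the notebook's exact text:

```python
from string import Template

# Template from the example: no summarization instruction, so the model
# may simply continue the text instead of summarizing it.
original = Template("Human: $feature\n\nAssistant:\n")

# Template with an explicit task instruction (hypothetical wording),
# asking the model for a one-sentence summary of the input.
with_instruction = Template(
    "Human: Summarize the following text in one sentence.\n\n"
    "$feature\n\nAssistant:\n"
)

article = "The quick brown fox jumps over the lazy dog near the river."

# Both templates substitute the dataset record into $feature; only the
# second tells the model what task to perform on it.
prompt = with_instruction.substitute(feature=article)
print(prompt)
```

When this instruction-bearing string is passed as the prompt_template, the rendered prompt carries the task description alongside each dataset record, rather than relying on the model to infer that a summary is wanted.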
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.