Closed by lpfennigschmidt 9 months ago
I like the idea and the contribution. Could you open a PR? I guess we can have this merged quite soon, as it looks backwards compatible. The code looks cleaner, and the main loop is easier to understand now that the functionality is pushed into self.prompt().
We might want to mark the reprompting in the logs. So maybe provide an optional argument to change the event action type, e.g. "send message (reprompt)". Could you test if this works for the transcripts?
Yeah, this should definitely be in the logs and be accessible for scoring. Models 'getting it right' on the first try is very impactful for end users, and to my knowledge there is no other benchmark that tracks this concisely (at least at the intricate level clembench allows for).
Yup, will do a PR on Monday, just need to figure out how to :) I have to open one from my fork, right?
Yes, and if the fork is set up properly, it should be straightforward: just click the button on GH.
When using the DialogueGameMaster there is currently no easy way to reprompt a model after an invalid response. The fix is to extract the prompting logic into its own function and to introduce two more hooks into the framework. Both hooks are optional, so they do not break any existing games:
The prompting mechanism is extracted into a separate function:
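The original snippet is not preserved here, so the following is only a minimal sketch of what such an extracted prompt function could look like. `PromptingSketch`, `log_event`, the `respond` callable and the `is_reprompt` argument are all hypothetical stand-ins and not the actual clembench API; the optional flag switches the logged action type to "send message (reprompt)" as suggested in the comments above.

```python
from typing import Callable, Dict, List


class PromptingSketch:
    """Hypothetical stand-in for a game master; NOT the clembench class."""

    def __init__(self) -> None:
        # Stand-in for the interaction log that transcripts are built from.
        self.events: List[Dict[str, str]] = []

    def log_event(self, from_: str, to: str, action_type: str, content: str) -> None:
        self.events.append(
            {"from": from_, "to": to, "type": action_type, "content": content}
        )

    def prompt(
        self,
        player_name: str,
        respond: Callable[[str], str],  # assumed: maps a message to the model's reply
        message: str,
        is_reprompt: bool = False,
    ) -> str:
        # Mark reprompts in the log so they stay visible for scoring.
        action_type = "send message (reprompt)" if is_reprompt else "send message"
        self.log_event("GM", player_name, action_type, message)
        response = respond(message)
        self.log_event(player_name, "GM", "get message", response)
        return response


gm = PromptingSketch()
gm.prompt("Player 1", lambda m: "four", "What is 2 + 2? Answer with a digit.")
gm.prompt("Player 1", lambda m: "4", "Please answer with a digit.", is_reprompt=True)
```

Because `is_reprompt` defaults to `False`, callers that never reprompt would log exactly the same events as before.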
We add two hooks: one that decides whether reprompting should be done, and one that allows a message to be added before reprompting:
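A rough sketch of how two such hooks might look, assuming the hook names `_should_reprompt` and `_on_before_reprompt` (both are guesses, as are the classes and the `pending_messages` list). The defaults do nothing, which is what would keep existing games unaffected; only a game that overrides them opts in to reprompting.

```python
from typing import List, Tuple


class HookDefaultsSketch:
    """Hypothetical base class; hook names are assumptions, not the real API."""

    def __init__(self) -> None:
        # Messages queued for a player before the next (re)prompt.
        self.pending_messages: List[Tuple[str, str]] = []

    def _should_reprompt(self, player_name: str) -> bool:
        # Default: never reprompt, so games that ignore the hook behave as before.
        return False

    def _on_before_reprompt(self, player_name: str) -> None:
        # Default: add nothing before a reprompt.
        pass


class StrictFormatGame(HookDefaultsSketch):
    """Example game that opts in, with a cap to avoid endless reprompting."""

    MAX_REPROMPTS = 2

    def __init__(self) -> None:
        super().__init__()
        self.last_response_valid = True
        self.reprompts_used = 0

    def _should_reprompt(self, player_name: str) -> bool:
        return (not self.last_response_valid
                and self.reprompts_used < self.MAX_REPROMPTS)

    def _on_before_reprompt(self, player_name: str) -> None:
        self.reprompts_used += 1
        self.pending_messages.append(
            (player_name, "Your answer did not follow the format. Please try again.")
        )
```

The cap on reprompts is a design choice of this sketch: without some bound, a model that never produces a valid response would stall the game.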
Then the play-function becomes this:
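The resulting loop is not preserved here either. As a hedged illustration only, a toy game master with the same shape could look like the following; `LoopSketch`, `DigitsOnly`, the player callables and all hook names are assumptions made for this sketch, and the real DialogueGameMaster has more hooks and logging than shown.

```python
from typing import Callable, Dict, List


class LoopSketch:
    """Toy game master illustrating the loop shape; NOT the clembench class."""

    def __init__(self, players: Dict[str, Callable[[str], str]], max_turns: int = 1):
        self.players = players  # assumed: name -> callable returning a reply
        self.max_turns = max_turns
        self.current_turn = 0
        self.last_response = ""
        self.transcript: List[str] = []

    def _does_game_proceed(self) -> bool:
        return self.current_turn < self.max_turns

    def _should_reprompt(self, name: str) -> bool:
        return False  # default keeps existing games unchanged

    def _on_before_reprompt(self, name: str) -> None:
        pass

    def prompt(self, name: str, is_reprompt: bool = False) -> None:
        tag = "reprompt" if is_reprompt else "prompt"
        self.last_response = self.players[name](tag)
        self.transcript.append(f"{name}:{tag}:{self.last_response}")

    def play(self) -> None:
        while self._does_game_proceed():
            for name in self.players:
                self.prompt(name)
                # Reprompt until the game accepts the response or gives up.
                while self._should_reprompt(name):
                    self._on_before_reprompt(name)
                    self.prompt(name, is_reprompt=True)
            self.current_turn += 1


class DigitsOnly(LoopSketch):
    """Example game: reprompt on non-digit answers, at most twice."""

    def __init__(self, players: Dict[str, Callable[[str], str]]):
        super().__init__(players)
        self.attempts_left = 2

    def _should_reprompt(self, name: str) -> bool:
        return not self.last_response.isdigit() and self.attempts_left > 0

    def _on_before_reprompt(self, name: str) -> None:
        self.attempts_left -= 1


# A player that answers in words first and with a digit when reprompted.
game = DigitsOnly({"A": lambda tag: "4" if tag == "reprompt" else "four"})
game.play()
print(game.transcript)  # ['A:prompt:four', 'A:reprompt:4']
```

The inner `while` is the only addition to the main loop; everything game-specific lives in the two hooks.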
🎉