Closed Lovkush-A closed 3 weeks ago
Thanks so much for this! Docs are definitely inadequate here and, as you point out, we have some issues to patch up (further responses inline below). One further note: we do development on a separate internal repo and then sync it here every week or two, so when I say below that something is "fixed", it's fixed on the internal repo; we'll be doing another sync here in a day or two.
- What is the difference between `self._input` and `self.messages`? My understanding is:
  - `self._input` ought to be the original input without any modifications
  - `self.messages` is the message history of the sample, which at initialization will have the same information as `self._input` but will likely get modified by the solvers.
This is exactly correct.
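To make the distinction concrete, here is a minimal sketch. `Message` and `SketchState` are hypothetical stand-ins written for this illustration, not the actual inspect_ai classes, and the copying at construction time is an assumption about one way to keep the original input untouched.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (not the inspect_ai class)."""
    role: str
    content: str


class SketchState:
    """Hypothetical stand-in for TaskState, reduced to the two fields discussed."""

    def __init__(self, input: list[Message]) -> None:
        # Private copy of the original input; nothing below mutates it.
        self._input = deepcopy(input)
        # The working history starts out with the same information.
        self.messages = deepcopy(input)


state = SketchState([Message("user", "What is 2 + 2?")])

# A solver is free to rewrite or extend the history...
state.messages[0].content = "Answer with a single number. What is 2 + 2?"
state.messages.append(Message("assistant", "4"))

# ...while the original input stays as it was.
print(state._input[0].content)  # What is 2 + 2?
print(len(state.messages))      # 2
```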
- In `self.input_text`, why does it use `self.messages` instead of `self._input` when the input is not `str`?
That's a bug! We fixed this a few days ago and it will roll out here soon.
- Are the generators used in each method equivalent? More precisely, is `message.role == 'user'` equivalent to `isinstance(m, ChatMessageUser)`? I think they are the same, and if so, would prefer to use the same syntax for both generators to make it easier to read.
You can use either formulation (it's a tagged union, see https://mypy.readthedocs.io/en/stable/literal_types.html#tagged-unions). I just made the change to make this consistent in this context (both now using `message.role == 'user'`).
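For readers unfamiliar with tagged unions, a small sketch of why the two checks select the same messages. `UserMsg` and `AssistantMsg` are hypothetical stand-ins, not the inspect_ai message classes; the equivalence relies on each class fixing its `role` tag to a single literal value.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class UserMsg:
    """Hypothetical stand-in for a user message."""
    content: str
    role: Literal["user"] = "user"  # the tag that identifies this variant


@dataclass
class AssistantMsg:
    """Hypothetical stand-in for an assistant message."""
    content: str
    role: Literal["assistant"] = "assistant"


# The union is "tagged" because `role` tells the variants apart.
Msg = Union[UserMsg, AssistantMsg]

history: list[Msg] = [
    UserMsg("Hi"),
    AssistantMsg("Hello! How can I help?"),
    UserMsg("What is 2 + 2?"),
]

# Filtering on the tag and filtering on the class pick the same items.
by_role = [m for m in history if m.role == "user"]
by_type = [m for m in history if isinstance(m, UserMsg)]

print(by_role == by_type)  # True
```

A type checker such as mypy can narrow `m` to `UserMsg` after either check, so the choice between them is mostly about readability.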
- `self.user_prompt` can raise a `ValueError` but `self.input_text` cannot. It is not clear why there is this difference.
They are both trying to save the developer from having to check for `None`, which is technically possible but would be a malformed eval. I've changed it so they both now raise `ValueError`.
As I mentioned, the docs on `TaskState` are not quite thorough or clear enough. Here are some notes we've put together to further clarify (we'll refine these into a formal addition to the docs soon):
- `input` - this is the original `Sample` input. May be a simple string for simple evals, or a conversation which has already occurred and which we’re evaluating with more complex solvers. Immutable.
- `choices` - Specific to multiple-choice samples; holds the different options we expect the model to pick from when we call `generate`.
- `messages` - The conversation we have with the model based off our initial input. May or may not initially be the same as the `input`, depending on the solvers in use. For simple evals, the default `generate` call will add a new message as we go back and forth, but solvers may append to this list or delete from it and can essentially do whatever they want to it to build up a curated history of what’s been going on. Where complex conversations have gone on, complex scorers will likely use this to evaluate the performance of the model.
- `output` - The “final” model output after we’ve done all our solving. Will likely get updated at each step as we go through solvers in our plan. For simple evals this may just be the last `message`, but it is flexible, as the “output” of a model may not just be a single text answer. For example, it may be multiple choices selected, or a flag captured by a tool.
- `input_text` - returns the first user “input” from the original Sample’s `input`. If that input is just a string it returns that; if it was a chat history it returns the first message made by the user. Should be considered immutable in the same way as `input`.
- `user_prompt` - returns the first message the user made in `messages`. Given it’s in `messages`, it’s completely mutable and fair game for solvers to fiddle with as they want (e.g. the `multiple_choice` solver does this to hide shuffling activity).
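The last two notes can be sketched roughly as follows. This is an illustration only: `Message` and `SketchState` are hypothetical stand-ins rather than the inspect_ai implementation, and turning a string input into a single user message is an assumption made to keep the example small. Both accessors raise `ValueError` when no user message exists, matching the fix described above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    role: str
    content: str


class SketchState:
    """Hypothetical stand-in for TaskState with the two accessors discussed."""

    def __init__(self, input: Union[str, list[Message]]) -> None:
        self._input = input
        if isinstance(input, str):
            # Assumption: a plain string becomes a single user message.
            self.messages = [Message("user", input)]
        else:
            # Copy each message so edits to the history leave the input alone.
            self.messages = [Message(m.role, m.content) for m in input]

    @property
    def input_text(self) -> str:
        """First user text taken from the original input."""
        if isinstance(self._input, str):
            return self._input
        for m in self._input:
            if m.role == "user":
                return m.content
        raise ValueError("input contains no user message")

    @property
    def user_prompt(self) -> Message:
        """First user message in the (mutable) message history."""
        for m in self.messages:
            if m.role == "user":
                return m
        raise ValueError("messages contains no user message")


state = SketchState("Which option is correct?")

# A solver rewrites the prompt in the history, e.g. to add answer choices.
state.user_prompt.content = "A) 1  B) 2\nWhich option is correct?"

print(state.input_text)           # still the original question
print(state.user_prompt.content)  # the rewritten prompt
```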
thanks for detailed answers. very helpful! totally understand that docs are still in progress - surprisingly hard to write good docs. and always a tradeoff between spending time perfecting things versus making the tool available for people to use and experiment with.
I realize this could be a big effort, but may I suggest renaming the two methods, something like:
- `input_text` to `first_user_text_input`
- `user_prompt` to `first_user_message`

Benefits:
- `first` makes explicit that it only gets the first item from the list
- `user` makes explicit that it only gets items that correspond to user messages, not other kinds of messages
- replacing `prompt` with `message` makes variable names more consistent
- `first_user_message` matches closely the Returns docstring: `First user 'ChatMessage' if the current state...`.

Thanks, we'll definitely consider this! We've also contemplated making available a set of functions that operate on `TaskState` (which would have longer and more descriptive names like your suggestions here).
Updated docs on TaskState are available here: https://ukgovernmentbeis.github.io/inspect_ai/solvers.html#task-states-1
(For convenience, I copy the methods below. Also, apologies in advance if these are not useful questions and I am missing something straightforward.)

I am finding it tricky to understand the differences between the two methods `input_text` versus `user_prompt`. I can see there are intended differences, but I'm not sure.

- What is the difference between `self._input` and `self.messages`? My understanding is:
  - `self._input` ought to be the original input without any modifications
  - `self.messages` is the message history of the sample, which at initialization will have the same information as `self._input` but will likely get modified by the solvers.
- In `self.input_text`, why does it use `self.messages` instead of `self._input` when the input is not `str`?
- Are the generators used in each method equivalent? More precisely, is `message.role == 'user'` equivalent to `isinstance(m, ChatMessageUser)`? I think they are the same, and if so, would prefer to use the same syntax for both generators to make it easier to read.
- `self.user_prompt` can raise a `ValueError` but `self.input_text` cannot. It is not clear why there is this difference.