understanding `input_text` versus `user_prompt`

Lovkush-A commented 1 month ago

(For convenience, I copy the methods below. Also, apologies in advance if these are not useful questions and I am missing something straightforward.)

I am finding it tricky to understand the differences between the two methods input_text and user_prompt. I can see there are intended differences, but I'm not sure what they are.

What is the difference between self._input and self.messages? My understanding is:
- self._input ought to be the original input without any modifications
- self.messages is the message history of the sample, which at initialization will have the same information as self._input but will likely get modified by the solvers.

In self.input_text, why does it use self.messages instead of self._input when the input is not a str?

Are the generators used in each method equivalent? More precisely, is message.role == 'user' equivalent to isinstance(m, ChatMessageUser)? I think they are the same, and if so, I would prefer to use the same syntax for both generators to make them easier to read.

self.user_prompt can raise a ValueError but self.input_text cannot. It is not clear why there is this difference.

    @property
    def input_text(self) -> str:
        """Sample input as text."""
        if isinstance(self._input, str):
            return self._input
        else:
            return next(
                (message.text for message in self.messages if message.role == "user"),
                "",
            )

    @property
    def user_prompt(self) -> ChatMessageUser:
        """User prompt for this state.

        Tasks are very general and can have many types of inputs.
        However, in many cases solvers assume they can interact with
        the state as a "chat" in a predictable fashion (e.g. prompt
        engineering solvers). This property enables easy read and
        write access to the user chat prompt. Raises an
        exception if there is no user prompt

        Returns:
           First user `ChatMessage` if the current state has one, else `None`
        """
        prompt = next(
            (m for m in self.messages if isinstance(m, ChatMessageUser)), None
        )
        if prompt:
            return prompt
        else:
            raise ValueError("User prompt requested from TaskState but none available")

aisi-inspect commented 1 month ago

Thanks so much for this! The docs are definitely inadequate here and, as you point out, we have some issues to patch up (further responses inline below). One further note: we do development on a separate internal repo and sync it here every week or two, so when I say below that something is "fixed", it is fixed in the internal repo. We'll be doing another sync here in a day or two.

> What is the difference between self._input and self.messages? My understanding is:
>
> - self._input ought to be the original input without any modifications
> - self.messages is the message history of the sample, which at initialization will have the same information as self._input but will likely get modified by the solvers.

This is exactly correct.

> In self.input_text, why does it use self.messages instead of self._input when the input is not a str?

That's a bug! We fixed this a few days ago and it will roll out here soon.

> Are the generators used in each method equivalent? More precisely, is message.role == 'user' equivalent to isinstance(m, ChatMessageUser)? I think they are the same, and if so, I would prefer to use the same syntax for both generators to make them easier to read.

You can use either formulation (it's a tagged union; see https://mypy.readthedocs.io/en/stable/literal_types.html#tagged-unions). I just made a change so that both are consistent in this context (both now use message.role == 'user').
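To illustrate why the two checks agree, here is a minimal sketch that uses plain dataclasses as stand-ins for the chat message types. The class definitions are simplified assumptions for illustration, not the actual inspect_ai definitions; only the name ChatMessageUser and the role field come from the code quoted above.

```python
from dataclasses import dataclass
from typing import Literal, Union


# Simplified stand-ins (assumed for illustration; not the inspect_ai classes).
@dataclass
class ChatMessageSystem:
    content: str
    role: Literal["system"] = "system"


@dataclass
class ChatMessageUser:
    content: str
    role: Literal["user"] = "user"


@dataclass
class ChatMessageAssistant:
    content: str
    role: Literal["assistant"] = "assistant"


# The `role` literal is the tag that distinguishes the members of the union.
ChatMessage = Union[ChatMessageSystem, ChatMessageUser, ChatMessageAssistant]

messages: list[ChatMessage] = [
    ChatMessageSystem("You are a helpful assistant."),
    ChatMessageUser("What is 2 + 2?"),
    ChatMessageAssistant("4"),
    ChatMessageUser("And 3 + 3?"),
]

# Both filters select the same messages, because each class fixes its own tag.
by_role = [m for m in messages if m.role == "user"]
by_type = [m for m in messages if isinstance(m, ChatMessageUser)]
```

The equivalence relies on every class keeping its default role value. If a message were constructed with a mismatched role, the two filters could disagree.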

> self.user_prompt can raise a ValueError but self.input_text cannot. It is not clear why there is this difference.

Both are trying to save the developer from having to check for None. A missing user message is technically possible, but it would indicate a malformed eval. I've changed it so that both now raise ValueError.

As I mentioned, the docs on TaskState are not quite thorough or clear enough. Here are some notes we've put together to clarify further (we'll refine these into a formal addition to the docs soon):

- input - The original Sample input. May be a simple string for simple evals, or a conversation which has already occurred and which we’re evaluating with more complex solvers. Immutable.
- choices - Specific to multiple-choice samples. Holds the different options we expect the model to pick from when we call generate.
- messages - The conversation we have with the model, based on our initial input. May or may not initially be the same as the input, depending on the solvers in use. For simple evals, the default generate call will add a new message as we go back and forth, but solvers may append to this list, delete from it, and essentially do whatever they want to it to build up a curated history of what’s been going on. Where complex conversations have taken place, complex scorers will likely use this to evaluate the performance of the model.
- output - The “final” model output after we’ve done all our solving. Will likely get updated at each step as we go through the solvers in our plan. For simple evals this may just be the last message, but it is flexible, as the “output” of a model may not be a single text answer. For example, it may be the multiple choices selected, or a flag captured by a tool.
- input_text - Returns the first user “input” from the original Sample’s input. If that input is just a string it returns that; if it was a chat history it returns the first message made by the user. Should be considered immutable in the same way as input.
- user_prompt - Returns the first message the user made in messages. Given that it’s in messages, it’s completely mutable and fair game for solvers to fiddle with as they want (e.g. the multiple_choice solver does this to hide shuffling activity).
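The distinction between input_text and user_prompt described in these notes can be sketched with a toy class. This is a simplified model under stated assumptions, not the real TaskState: the Message and SketchState classes, the constructor, and the error messages are all hypothetical and exist only to show which property reads from the original input and which reads from the mutable message history.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    """Simplified chat message (an assumed stand-in, not the inspect_ai type)."""
    role: str
    text: str


class SketchState:
    """Toy model of the notes above; not the real TaskState."""

    def __init__(self, sample_input: Union[str, list[Message]]) -> None:
        # Original sample input: stored once and never modified afterwards.
        self._input = sample_input
        # Working conversation: starts from the input; solvers may edit it.
        if isinstance(sample_input, str):
            self.messages = [Message("user", sample_input)]
        else:
            self.messages = [Message(m.role, m.text) for m in sample_input]

    @property
    def input_text(self) -> str:
        """First user text taken from the ORIGINAL input."""
        if isinstance(self._input, str):
            return self._input
        text = next((m.text for m in self._input if m.role == "user"), None)
        if text is None:
            raise ValueError("input_text requested but input has no user message")
        return text

    @property
    def user_prompt(self) -> Message:
        """First user message in the CURRENT (mutable) message history."""
        prompt = next((m for m in self.messages if m.role == "user"), None)
        if prompt is None:
            raise ValueError("user_prompt requested but no user message available")
        return prompt


state = SketchState("Which option is correct?")
# A solver rewrites the prompt in place, e.g. to prepend instructions.
state.user_prompt.text = "Answer with a single letter. " + state.user_prompt.text
```

After the in-place edit, state.user_prompt.text carries the added instruction while state.input_text still returns the original string, which is the separation the notes describe.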

Lovkush-A commented 1 month ago

thanks for detailed answers. very helpful! totally understand that docs are still in progress - surprisingly hard to write good docs. and always a tradeoff between spending time perfecting things versus making the tool available for people to use and experiment with.

I realize this could be a big effort, but may I suggest renaming the two methods, something like:

- input_text to first_user_text_input
- user_prompt to first_user_message

Benefits:

- using first makes explicit that it only gets the first item from the list
- using user makes explicit that it only gets items that correspond to user messages, not other kinds of messages
- replacing prompt with message makes variable names more consistent
- first_user_message closely matches the Returns docstring: First user 'ChatMessage' if the current state....

jjallaire commented 1 month ago

Thanks, we'll definitely consider this! We've also contemplated making available a set of functions that operate on TaskState (which would have longer and more descriptive names like your suggestions here).
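Purely as an illustration of what such standalone functions might look like, here is a minimal sketch. Everything in it is hypothetical: the Message and State classes are simplified stand-ins, and the function names follow the suggestions above rather than any existing or planned inspect_ai API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Simplified chat message (assumed stand-in for illustration)."""
    role: str
    text: str


@dataclass
class State:
    """Simplified stand-in for TaskState holding only the message history."""
    messages: list[Message] = field(default_factory=list)


def first_user_message(state: State) -> Optional[Message]:
    """Return the first message with role 'user', or None if there is none."""
    return next((m for m in state.messages if m.role == "user"), None)


def first_user_text(state: State) -> str:
    """Return the text of the first user message, raising if there is none."""
    message = first_user_message(state)
    if message is None:
        raise ValueError("no user message in state")
    return message.text


state = State([Message("system", "Be concise."), Message("user", "Hello")])
```

Keeping such helpers as free functions lets their names be as descriptive as needed without widening the class interface.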

jjallaire commented 3 weeks ago

Updated docs on TaskState are available here: https://ukgovernmentbeis.github.io/inspect_ai/solvers.html#task-states-1

UKGovernmentBEIS / inspect_ai

understanding `input_text` versus `user_prompt` #22