UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://UKGovernmentBEIS.github.io/inspect_ai/
MIT License

understanding `input_text` versus `user_prompt` #22

Closed Lovkush-A closed 3 weeks ago

Lovkush-A commented 1 month ago

(For convenience, I copy the methods below. Also, apologies in advance if these are not useful questions and I am missing something straightforward.)

I am finding it tricky to understand the differences between the two methods input_text and user_prompt. I can see that they are intended to differ, but I'm not sure how.

  1. What is the difference between self._input and self.messages? My understanding is:

    • self._input ought to be the original input without any modifications
    • self.messages is the message history of the sample, which at initialization will have the same information as self._input but will likely get modified by the solvers.
  2. In self.input_text, why does it use self.messages instead of self._input when the input is not str?

  3. Are the generators used in each method equivalent? More precisely, is message.role == 'user' equivalent to isinstance(m, ChatMessageUser). I think they are the same, and if so, would prefer to use same syntax for both generators to make it easier to read.

  4. self.user_prompt can raise a ValueError but self.input_text cannot. It is not clear why there is this difference.


    @property
    def input_text(self) -> str:
        """Sample input as text."""
        if isinstance(self._input, str):
            return self._input
        else:
            return next(
                (message.text for message in self.messages if message.role == "user"),
                "",
            )

    @property
    def user_prompt(self) -> ChatMessageUser:
        """User prompt for this state.

        Tasks are very general and can have many types of inputs.
        However, in many cases solvers assume they can interact with
        the state as a "chat" in a predictable fashion (e.g. prompt
        engineering solvers). This property enables easy read and
        write access to the user chat prompt. Raises an
        exception if there is no user prompt

        Returns:
           First user `ChatMessage` (a `ValueError` is raised if there is none)
        """
        prompt = next(
            (m for m in self.messages if isinstance(m, ChatMessageUser)), None
        )
        if prompt:
            return prompt
        else:
            raise ValueError("User prompt requested from TaskState but none available")

aisi-inspect commented 1 month ago

Thanks so much for this! The docs are definitely inadequate here and, as you point out, we have some issues to patch up (further responses inline below). One further note: we do development on a separate internal repo and then sync it here every week or two, so when I say below that something is "fixed", it is fixed on the internal repo; we will be doing another sync here in a day or two.

  1. What is the difference between self._input and self.messages? My understanding is:
    • self._input ought to be the original input without any modifications
    • self.messages is the message history of the sample, which at initialization will have the same information as self._input but will likely get modified by the solvers.

This is exactly correct.
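
A minimal sketch of that distinction, using simplified stand-in classes rather than the actual inspect_ai implementation (`Message` and `MiniTaskState` are hypothetical names introduced only for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    role: str
    text: str


@dataclass
class MiniTaskState:
    """Hypothetical, simplified stand-in for TaskState."""
    _input: str
    messages: list[Message] = field(default_factory=list)

    def __post_init__(self) -> None:
        # At initialization the history carries the same information as the input.
        if not self.messages:
            self.messages = [Message(role="user", text=self._input)]


state = MiniTaskState("What is 2 + 2?")

# A prompt-engineering solver rewrites the history...
state.messages[0].text = "Think step by step. " + state.messages[0].text
state.messages.insert(0, Message(role="system", text="You are a careful assistant."))

# ...while the original input stays untouched.
original_input = state._input
current_user_text = next(m.text for m in state.messages if m.role == "user")
```

After the solver step, `original_input` still holds the unmodified sample input, whereas `current_user_text` reflects the edited user message.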

  2. In self.input_text, why does it use self.messages instead of self._input when the input is not str?

That's a bug! We fixed this a few days ago and it will roll out here soon.

  3. Are the generators used in each method equivalent? More precisely, is message.role == 'user' equivalent to isinstance(m, ChatMessageUser). I think they are the same, and if so, would prefer to use same syntax for both generators to make it easier to read.

You can use either formulation (it's a tagged union, see https://mypy.readthedocs.io/en/stable/literal_types.html#tagged-unions). I just made the change to make this consistent in this context (both now using message.role == 'user')
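
To illustrate why the two filters select the same messages, here is a small sketch of a tagged union. The classes `SystemMsg` and `UserMsg` are hypothetical stand-ins, not the inspect_ai message types, and the equivalence assumes that each class keeps its default `role` tag:

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class SystemMsg:
    text: str
    # The literal tag identifies the variant.
    role: Literal["system"] = "system"


@dataclass
class UserMsg:
    text: str
    role: Literal["user"] = "user"


# The union is "tagged" because every member carries a distinct literal role.
Msg = Union[SystemMsg, UserMsg]

messages: list[Msg] = [SystemMsg("Be concise."), UserMsg("Hello"), UserMsg("Again")]

# Filtering on the tag and filtering on the class pick out the same messages.
by_role = [m for m in messages if m.role == "user"]
by_type = [m for m in messages if isinstance(m, UserMsg)]
same_selection = by_role == by_type
```

A type checker such as mypy can narrow `m` to `UserMsg` in either form, so the choice between them is mostly a matter of consistency.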

  4. self.user_prompt can raise a ValueError but self.input_text cannot. It is not clear why there is this difference.

Both are trying to save the developer from having to check for None. A missing user message is technically possible, but it would indicate a malformed eval. I've changed it so they both now raise ValueError.
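
A rough sketch of what that consistent behavior could look like. This is not the code in the repository: `ChatMsg` and `SketchState` are hypothetical stand-ins, the error messages are made up, and it assumes `input_text` reads from the original input rather than from the message history:

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ChatMsg:
    """Hypothetical stand-in for a chat message."""
    role: str
    text: str


@dataclass
class SketchState:
    """Hypothetical stand-in for TaskState."""
    _input: Union[str, list[ChatMsg]]
    messages: list[ChatMsg] = field(default_factory=list)

    @property
    def input_text(self) -> str:
        # Assumption: read from the original input, not the message history.
        if isinstance(self._input, str):
            return self._input
        text = next((m.text for m in self._input if m.role == "user"), None)
        if text is None:
            raise ValueError("input_text requested but input has no user message")
        return text

    @property
    def user_prompt(self) -> ChatMsg:
        # Same role-based filter, but over the (possibly modified) history.
        prompt = next((m for m in self.messages if m.role == "user"), None)
        if prompt is None:
            raise ValueError("user_prompt requested but no user message available")
        return prompt


ok = SketchState(_input="hi", messages=[ChatMsg("user", "hi")])
malformed = SketchState(_input=[ChatMsg("system", "Only a system message")])
```

With this shape, callers never need to check for None: `ok.input_text` and `ok.user_prompt` return values directly, while accessing either property on `malformed` raises `ValueError`.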

As I mentioned, the docs on TaskState are not quite thorough or clear enough. Here are some notes we've put together to further clarify (we'll refine these into a formal addition to the docs soon):

Lovkush-A commented 1 month ago

thanks for detailed answers. very helpful! totally understand that docs are still in progress - surprisingly hard to write good docs. and always a tradeoff between spending time perfecting things versus making the tool available for people to use and experiment with.

I realize this could be a big effort, but may I suggest renaming the two methods, something like:

Benefits:

jjallaire commented 1 month ago

Thanks, we'll definitely consider this! We've also contemplated making available a set of functions that operate on TaskState (which would have longer and more descriptive names like your suggestions here).

jjallaire commented 3 weeks ago

Updated docs on TaskState are available here: https://ukgovernmentbeis.github.io/inspect_ai/solvers.html#task-states-1