UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

[bug] Inspect viewer does not show inputs for `fork` subtasks #490

Closed rusheb closed 2 days ago

rusheb commented 3 days ago

Reproducing example

from inspect_ai import Task, task
from inspect_ai.dataset import MemoryDataset, Sample
from inspect_ai.solver import Generate, Solver, TaskState, fork, solver
from inspect_ai.util import subtask

### SUBTASK ###
@subtask
async def mysubtask(id: str) -> str:
    return ""

@solver
def subtask_solver() -> Solver:
    async def solve(state: TaskState, generate: Generate):
        await mysubtask(id="my_subtask")
        return state

    return solve

### FORK ###
@solver
def fork_solver_parent() -> Solver:
    async def solve(state: TaskState, generate: Generate):
        await fork(state, [fork_solver_child(id="my_fork_child")])
        return state

    return solve

@solver
def fork_solver_child(id: str) -> Solver:
    async def solve(state: TaskState, generate: Generate):
        return state

    return solve

### TASK ###
@task
def mytask():
    return Task(
        dataset=MemoryDataset([Sample(input="")]),
        plan=[
            subtask_solver(),
            fork_solver_parent(),
        ],
    )

Actual behaviour

The viewer shows no inputs for the fork subtask node, as in the following screenshot: [image]

Expected behaviour

I'd expect the SUBTASK: FORK_SOLVER_CHILD node to show INPUT id="my_fork_child"


jjallaire-aisi commented 3 days ago

The input for a fork() subtask is just the current task state (arguments passed to the solver aren't plucked out and treated as "inputs"). I think the right solution here may be to teach the viewer about the fork() variety of subtask and have it provide special UI treatment for that.
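To illustrate the point above, here is a minimal stdlib-only sketch of the forking idea, assuming fork() gives each solver its own copy of the state and runs them concurrently. `State`, `child`, and `fork_sketch` are hypothetical stand-ins, not inspect_ai's classes or its actual implementation. Note that `id` lives in the factory's closure, so the only thing the forked solver receives is the state.

```python
import asyncio
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical stand-in for TaskState; not the inspect_ai class."""
    messages: list[str] = field(default_factory=list)


def child(id: str):
    """Solver factory: `id` is captured by the closure, not passed to solve()."""
    async def solve(state: State) -> State:
        state.messages.append(f"ran {id}")
        return state
    return solve


async def fork_sketch(state: State, solvers):
    """Run each solver concurrently on its own deep copy of the state."""
    return await asyncio.gather(*(s(deepcopy(state)) for s in solvers))


parent = State(messages=["start"])
results = asyncio.run(fork_sketch(parent, [child("a"), child("b")]))
```

Each result carries only its own child's changes and the parent state is untouched, which is why the state is the only natural "input" a viewer could show without extra bookkeeping.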

rusheb commented 3 days ago

That makes sense. Thanks for the clarification!

jjallaire commented 2 days ago

Resolved with https://github.com/UKGovernmentBEIS/inspect_ai/pull/499

rusheb commented 2 days ago

Thank you. Note that for my current use-case it's quite key to see the arguments to the outer solver of the fork, because they tell us which scenario is being run. For example, we might do something like the following:

@solver
def run_scenarios() -> Solver:
  async def solve(state, generate):
    # each scenario runs against its own forked copy of the state
    oversight_result, non_oversight_result = await fork(state, [
      run_scenario("oversight"),
      run_scenario("non_oversight")
    ])
    # a solver must return a TaskState
    return state
  return solve

@solver
def run_scenario(name: str) -> Solver:
  async def solve(state, generate):
    ...
  return solve

and so we really want to see in the viewer whether we are looking at the "oversight" or "non_oversight" scenario.

But from your previous message and implementation it seems incorrect to think of these as inputs? Perhaps in this case it just makes more sense to use subtask directly rather than using fork? But then it's slightly inconvenient as I end up basically forking the state object manually.

jjallaire-aisi commented 2 days ago

Interesting, that does make sense! I think we could indeed probably capture the solver input params here. Will take a look later today!
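One way such capturing could work in principle is sketched below using only the standard library. This is an assumption for illustration, not how inspect_ai or the linked pull request implements it. `record_params` and the `params` attribute are hypothetical names.

```python
import functools
import inspect


def record_params(factory):
    """Hypothetical decorator: attach the factory's bound arguments
    to the solver function it returns."""
    sig = inspect.signature(factory)

    @functools.wraps(factory)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        solve = factory(*args, **kwargs)
        solve.params = dict(bound.arguments)  # hypothetical attribute name
        return solve

    return wrapper


@record_params
def run_scenario(name: str):
    async def solve(state, generate):
        return state
    return solve


solver = run_scenario("oversight")
```

With the arguments recorded on the returned solver, a fork-like helper could log `solver.params` next to each forked subtask so a viewer can tell the scenarios apart.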

jjallaire-aisi commented 2 days ago

Resolved with https://github.com/UKGovernmentBEIS/inspect_ai/pull/510