UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License
540 stars 82 forks source link

[feature request] run commands in SandboxEnvironment as different users #272

Closed rusheb closed 2 days ago

rusheb commented 3 weeks ago

When building evals, I'd like to exec in the sandbox as different users. This would enable multi-agent evals where an agent and overseer can hide files from each other, or evals where the task can hide files from the agent.

I think a good solution here would be to add a user param to SandboxEnvironment.exec.

I'd be happy to contribute this myself if you agree that it makes sense.

jjallaire-aisi commented 2 weeks ago

Yes, that would be awesome if you wanted to contribute this! A couple of things to note:

  1. The SandboxEnvironment class would need user added to its exec() instance method and then this would need to be added to the Docker and Local implementations (as well as the shim for the Podman environment in the test suite). Note they'll be a no-op for the local implementation because don't assume we have sudo privileges there.

  2. In the Docker implementation it looks like docker compose exec takes a --user parameter. IDK if that is a user id or user name (or perhaps takes both?). Not sure which of these would be ideal?

cc @art-dsit @craigwalton-dsit for any additional thoughts here. It looks like docker compose cp doesn't take a --user argument so the read_file() and write_file() methods wouldn't be able to trivially support this as well (although perhaps w/ your pending work on the file methods they could?). @craigwalton-dsit Do you know if this is something we'd also be able to support for k8s?

rusheb commented 2 weeks ago

I guess it behaves the same as docker exec --user, which allows either. But anyway since we are calling docker compose exec under the hood I think it seems fine to stick with whatever behaviour is given. \

Okay, if it takes both all the better!

rusheb commented 2 weeks ago

Do you want tests for this? As far as I can tell, the sandbox envs aren't currently tested. If we add tests then we'd need to either ensure docker is installed on the CI runner or skip the test in CI (which I'm happy to set up either way).

It seems hard to instantiate the SandboxEnvironment directly, but I could do something like this:

test_docker_exec ```python import textwrap from typing import Callable, NamedTuple from inspect_ai import Task, eval, task from inspect_ai.dataset import Sample from inspect_ai.solver import Generate, Solver, TaskState, solver from inspect_ai.util import sandbox class CommandAndUser(NamedTuple): command: list[str] user: str | None = None # edit: I now realise the outer function isn't needed. Will fix later def create_exec_solver(command_user_pairs: list[CommandAndUser]) -> Callable[..., Solver]: @solver def exec_solver() -> Solver: async def solve(state: TaskState, generate: Generate) -> TaskState: results: list[str] = [] for command, user in command_user_pairs: result = await sandbox().exec(command, user=user) results.append(result.stdout.strip()) state.output.completion = "\n".join(results) return state return solve return exec_solver def create_task(solver: Solver, setup: str | None = None) -> Callable[..., Task]: @task def custom_task() -> Task: return Task( dataset=[Sample(input="irrelevant", setup=setup)], plan=[solver], sandbox="docker", ) return custom_task def run_eval(task: Callable[..., Task]) -> str: result = eval(task, model="mockllm/model")[0] assert result.samples is not None content = result.samples[0].output.choices[0].message.content assert isinstance(content, str) return content def test_docker_sandbox_env() -> None: echo_solver = create_exec_solver( [CommandAndUser(["echo", "Hello, World!"], None)], ) echo_task = create_task(echo_solver()) message = run_eval(echo_task) assert message == "Hello, World!" def test_docker_sandbox_users() -> None: whoami_solver = create_exec_solver( [ CommandAndUser(["whoami"], "root"), CommandAndUser(["whoami"], "myuser"), ] ) add_user_script = textwrap.dedent( """#!/bin/bash useradd -m myuser""" ) whoami_task = create_task(whoami_solver(), setup=add_user_script) message = run_eval(whoami_task) expected_output = "root\nmyuser" assert message == expected_output, f"Expected '{expected_output}', got '{message}'" def test_docker_sandbox_nonexistent_user() -> None: nonexistent_solver = create_exec_solver( [ CommandAndUser(["whoami"], "nonexistent"), ] ) nonexistent_task = create_task(nonexistent_solver()) message = run_eval(nonexistent_task) expected_error = ( "unable to find user nonexistent: no matching entries in passwd file" ) assert expected_error in message ```

Is there a better way that doesn't involve getting a whole task?

I imagine the first test could take ~10s due to pulling the image and then subsequent tests are about 2s each on my mac -- possibly slower on GH actions runners:

durations ```python ======================================================================================== slowest durations ========================================================================================= 2.12s call tests/util/sandbox/test_docker.py::test_docker_sandbox_users 2.01s call tests/util/sandbox/test_docker.py::test_docker_sandbox_env 1.74s call tests/util/sandbox/test_docker.py::test_docker_sandbox_nonexistent_user ```

Possibly would be worth adding a runslow flag

Let me know what you think!

jjallaire-aisi commented 2 weeks ago

We have another PR incoming from @art-dsit that introduces a bunch of tests for sandboxes and has a way to shim them up outside of tasks. So I'd say wait for that PR to get posted (should be Mon/Tues) then add your tests accordingly. I'd be inclined to not run these tests on CI just to keep total test compute times reasonable (the --runslow flag idea seems like a better idea than outright skipping though, as perhaps we might at some point want a scheduled action that does run all of the tests).

craigwalton-dsit commented 2 weeks ago

In Kubernetes, a pod has a security context (which includes which user any exec commands within any containers on the pod are run as) set in stone at pod definition time. So unfortunately, running different commands as a different users will require some sort of workaround like running the pod as root, then prepending a runuser -u <user> -- <command> to the commands the agent wishes to run.

When it comes to multi-agent setups where you wish to conceal files etc., one option is to use multiple sandboxes (you just define multiple services in your compose.yaml) and they can share volumes or talk over the network.

jjallaire-aisi commented 2 weeks ago

@craigwalton-dsit You are essentially saying that this could work in k8s but the default user would need to be root? That seems acceptable on the surface as the --user parameter would in theory only be used in sandbox environments that are specifically crafted for it.

If however you are saying that --user is extremely difficult to implement we might consider changing the contract from an explicit user argument to exec to an extra_args parameter that is understood to be sandbox environment specific (so that people don't build tools assuming that every sandbox can implement user for exec. i.e. we want to be careful not to introduce things to the abstract method definitions that can only be narrowly realised by a small number of implementations.

@art-dsit Would be keen to get your thoughts as well (will re-open the issue in the case that we decide to make changes).

craigwalton-dsit commented 2 weeks ago

Yep, the default user would need to be root (or some user which has permission to execute runuser for other users).

I think that we can do something like the following

def exec(command: list[str], user: str | None):
    if user:
        command = ["runuser", "-u", user, "--"] + command
    # run command using k8s client

I tried the following very simple examples on the command line to prove this out.

$ kubectl exec -it agent-env-9wfypwub-default-0 -- runuser -u foo -- ls /home/bar
ls: cannot open directory '/home/bar': Permission denied
command terminated with exit code 2
$ kubectl exec -it agent-env-9wfypwub-default-0 -- runuser -u foo -- ls /home/foo
hi.txt

Not sure if there are limitations to runuser I'm not aware of, or other issues we might run into, but in principle it looks doable.

We'd have to do something similar for the read_file() and write_file() commands, but I think this is doable as I'm using a tar command with exec() under the hood so can use the same runuser trick.

art-dsit commented 2 weeks ago

I'm not sure --user will pull its weight in terms of cost/benefit. If you can just add runuser to exec, then a tool can also just do that, it doesn't need to be part of the sandbox implementation. I think it's coupling the sandbox to the tools and making it hard to implement a sandbox. For example, --user isn't going to work for the ssh sandbox, nor probably the METR task bridge one.

art-dsit commented 2 weeks ago

We also need to think about what the relation between --user and cwd should be. If cwd is relative (which is usually will be) then what would it be relative to, if user is specified? The existing docker implementation has _project.working_dir which can be derived from the Dockerfile's working-dir, but if that happens to be /home/$DEFAULT_USER, it doesn't make a lot of sense for that to persist when you change --user.

jjallaire commented 2 days ago

This is now merged + test so closing out the issue.