Closed rusheb closed 2 days ago
Yes, that would be awesome if you wanted to contribute this! A couple of things to note:
The SandboxEnvironment
class would need user
added to its exec()
instance method and then this would need to be added to the Docker and Local implementations (as well as the shim for the Podman environment in the test suite). Note they'll be a no-op for the local implementation because don't assume we have sudo privileges there.
In the Docker implementation it looks like docker compose exec
takes a --user
parameter. IDK if that is a user id or user name (or perhaps takes both?). Not sure which of these would be ideal?
cc @art-dsit @craigwalton-dsit for any additional thoughts here. It looks like docker compose cp
doesn't take a --user
argument so the read_file()
and write_file()
methods wouldn't be able to trivially support this as well (although perhaps w/ your pending work on the file methods they could?). @craigwalton-dsit Do you know if this is something we'd also be able to support for k8s?
I guess it behaves the same as
docker exec --user
, which allows either. But anyway since we are callingdocker compose exec
under the hood I think it seems fine to stick with whatever behaviour is given. \
Okay, if it takes both all the better!
Do you want tests for this? As far as I can tell, the sandbox envs aren't currently tested. If we add tests then we'd need to either ensure docker is installed on the CI runner or skip the test in CI (which I'm happy to set up either way).
It seems hard to instantiate the SandboxEnvironment directly, but I could do something like this:
Is there a better way that doesn't involve getting a whole task?
I imagine the first test could take ~10s due to pulling the image and then subsequent tests are about 2s each on my mac -- possibly slower on GH actions runners:
Possibly would be worth adding a runslow flag
Let me know what you think!
We have another PR incoming from @art-dsit that introduces a bunch of tests for sandboxes and has a way to shim them up outside of tasks. So I'd say wait for that PR to get posted (should be Mon/Tues) then add your tests accordingly. I'd be inclined to not run these tests on CI just to keep total test compute times reasonable (the --runslow
flag idea seems like a better idea than outright skipping though, as perhaps we might at some point want a scheduled action that does run all of the tests).
In Kubernetes, a pod has a security context (which includes which user any exec commands within any containers on the pod are run as) set in stone at pod definition time. So unfortunately, running different commands as a different users will require some sort of workaround like running the pod as root, then prepending a runuser -u <user> -- <command>
to the commands the agent wishes to run.
When it comes to multi-agent setups where you wish to conceal files etc., one option is to use multiple sandboxes (you just define multiple services in your compose.yaml
) and they can share volumes or talk over the network.
@craigwalton-dsit You are essentially saying that this could work in k8s but the default user would need to be root? That seems acceptable on the surface as the --user
parameter would in theory only be used in sandbox environments that are specifically crafted for it.
If however you are saying that --user
is extremely difficult to implement we might consider changing the contract from an explicit user
argument to exec
to an extra_args
parameter that is understood to be sandbox environment specific (so that people don't build tools assuming that every sandbox can implement user
for exec
. i.e. we want to be careful not to introduce things to the abstract method definitions that can only be narrowly realised by a small number of implementations.
@art-dsit Would be keen to get your thoughts as well (will re-open the issue in the case that we decide to make changes).
Yep, the default user would need to be root (or some user which has permission to execute runuser
for other users).
I think that we can do something like the following
def exec(command: list[str], user: str | None):
if user:
command = ["runuser", "-u", user, "--"] + command
# run command using k8s client
I tried the following very simple examples on the command line to prove this out.
$ kubectl exec -it agent-env-9wfypwub-default-0 -- runuser -u foo -- ls /home/bar
ls: cannot open directory '/home/bar': Permission denied
command terminated with exit code 2
$ kubectl exec -it agent-env-9wfypwub-default-0 -- runuser -u foo -- ls /home/foo
hi.txt
Not sure if there are limitations to runuser
I'm not aware of, or other issues we might run into, but in principle it looks doable.
We'd have to do something similar for the read_file()
and write_file()
commands, but I think this is doable as I'm using a tar
command with exec()
under the hood so can use the same runuser
trick.
I'm not sure --user
will pull its weight in terms of cost/benefit. If you can just add runuser
to exec, then a tool can also just do that, it doesn't need to be part of the sandbox implementation. I think it's coupling the sandbox to the tools and making it hard to implement a sandbox. For example, --user
isn't going to work for the ssh sandbox, nor probably the METR task bridge one.
We also need to think about what the relation between --user
and cwd
should be. If cwd
is relative (which is usually will be) then what would it be relative to, if user is specified? The existing docker implementation has _project.working_dir
which can be derived from the Dockerfile's working-dir, but if that happens to be /home/$DEFAULT_USER
, it doesn't make a lot of sense for that to persist when you change --user
.
This is now merged + test so closing out the issue.
When building evals, I'd like to exec in the sandbox as different users. This would enable multi-agent evals where an
agent
andoverseer
can hide files from each other, or evals where the task can hide files from the agent.I think a good solution here would be to add a
user
param toSandboxEnvironment.exec
.I'd be happy to contribute this myself if you agree that it makes sense.