kaifronsdal closed this issue 1 day ago.
Thanks for reporting this!
Do you know what your `max_connections` is? That determines the default for `max_samples`, which in turn determines how many Docker containers are created. I've seen 20 containers work fine, but that probably varies by system. Note also that there is a cleanup phase which, if interrupted (e.g. by Ctrl+C), can leave containers and networks orphaned. You can clean these up manually as described here: https://ukgovernmentbeis.github.io/inspect_ai/agents.html#environment-cleanup
In the most recent version of Inspect (published just a couple of days ago), we disable Internet access by default in our auto-generated `compose.yaml`:
```yaml
services:
  default:
    ...
    networks:
      - no-internet
networks:
  no-internet:
    driver: bridge
    internal: true
```
It strikes me that your `network_mode: none` might be a more elegant way to do this. @craigwalton-dsit, do you have an opinion about this?
Thanks a lot for your input!
I think `network_mode: none` looks much better! I'm happy to create a PR if you'd like, @jjallaire.
On the docker-py (Docker SDK for Python) front, as far as I know it does not support the equivalent of `docker compose` commands. You could run a subset of commands like `exec` and `cp` if you knew the container names, but since we need to use `subprocess` for `up`, `down`, etc., it seemed sensible to use it consistently. Let me know if you know otherwise.
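For readers less familiar with the subprocess approach, here is a minimal sketch of driving one sample's sandbox as its own Compose project. The helper names (`compose_argv`, `compose_up`, `compose_down`) and the one-project-per-sample naming are hypothetical illustrations, not Inspect's actual implementation; the flags used (`--project-name`, `--file`, `up --detach`, `down --volumes --remove-orphans`) are standard Compose v2 CLI options.

```python
import subprocess
from pathlib import Path
from typing import List


def compose_argv(project: str, compose_file: Path, *args: str) -> List[str]:
    """Build a `docker compose` command line for one sample's project.

    Hypothetical helper: a distinct project name per sample namespaces that
    sample's containers so they can be started and torn down independently.
    """
    return [
        "docker", "compose",
        "--project-name", project,
        "--file", str(compose_file),
        *args,
    ]


def compose_up(project: str, compose_file: Path) -> None:
    # --detach returns once the containers have been started.
    subprocess.run(compose_argv(project, compose_file, "up", "--detach"), check=True)


def compose_down(project: str, compose_file: Path) -> None:
    # --volumes also removes named volumes; --remove-orphans removes
    # containers for services no longer defined in the compose file.
    subprocess.run(
        compose_argv(project, compose_file, "down", "--volumes", "--remove-orphans"),
        check=True,
    )
```

Building the argument list separately from running it keeps each command inspectable (and testable) without a Docker daemon.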
Yes, @craigwalton-dsit a PR for that would be awesome!
> Do you know what your `max_connections` is? (That will determine the default for `max_samples`, which in turn determines how many Docker containers are created.) I've seen 20 containers work fine, but that probably varies by system.
Currently it's at 32, but I might go higher. I've written a vLLM backend to speed up local models, and the bottleneck became the amount of parallelization rather than model generation.
> On the docker-py / docker SDK for Python front, afaik it does not support the equivalent of docker compose commands. So you could do some subset of commands like exec and cp if you knew the container names, but seeing as we need to use subprocess for up and down etc., it seemed sensible to use that consistently. Let me know if you know otherwise.
I didn't realize there were no compose commands; I've only used it for very simple things before. Subprocesses are probably the way to go. A quick search found python-on-whales, but it might not be worth the effort of adding a third-party package (if it ain't broke, don't fix it).
Okay, I noticed that the recently published Docker implementation for SWE-bench recommends going no higher than 28 parallel Docker containers, no matter how many cores you have (likely for the reasons you are seeing here!). You might consider setting `max_samples` a bit lower if you run into problems with Docker.
We've also made the network mode change you suggested, which may help (that commit is on `main` but not yet on PyPI).
So we've got two changes that should address this:
1) As mentioned before, we now use `network_mode: none` as per your example.
2) When using Docker, the default `max_samples` is capped at 25. You can still set it higher explicitly, but if you haven't specified it, we won't let the value derived from `max_connections` exceed 25. This change is here: https://github.com/UKGovernmentBEIS/inspect_ai/commit/2d7561f02213781d603ed4ff1826f8dfe0963da4
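The capping rule in item 2 can be sketched as a small resolution function. This assumes, for illustration only, that the uncapped default for `max_samples` is simply `max_connections`; `default_max_samples` and its parameters are hypothetical and not Inspect's API.

```python
from typing import Optional

# Cap described in item 2 above.
DOCKER_MAX_SAMPLES_CAP = 25


def default_max_samples(
    max_connections: int,
    max_samples: Optional[int] = None,
    using_docker: bool = True,
) -> int:
    """Resolve how many samples run concurrently (hypothetical sketch)."""
    if max_samples is not None:
        # An explicit setting always wins, even above the cap.
        return max_samples
    if using_docker:
        # Assumed default of max_connections, clamped to the Docker cap.
        return min(max_connections, DOCKER_MAX_SAMPLES_CAP)
    return max_connections
```

With `max_connections` at 32, as in the setup discussed above, the sketch resolves to 25 concurrent samples unless `max_samples` is set explicitly.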
I like this solution! I've currently settled on an approach where I create a network manually and then assign all containers to it. The reason is that some of the setup code in the CTF example downloads files from the internet (calls to `wget`), even though the model itself never needs internet access.
```yaml
services:
  default:
    build: .
    command: tail -f /dev/null
    cpus: 1.0
    mem_limit: 0.5gb
    networks:
      - inspect_network
# If the network does not exist, create it first:
# docker network create inspect_network --subnet 172.20.0.0/16
networks:
  inspect_network:
    external: true
```
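The comment in that compose file (create the network if it does not exist) can be automated as a pre-flight step. This is a sketch assuming the network name and subnet from the file above; `ensure_network` and its helpers are hypothetical names. It relies on `docker network inspect` exiting non-zero when the network is missing.

```python
import subprocess
from typing import Callable, List, Sequence

# Values taken from the compose file above.
NETWORK = "inspect_network"
SUBNET = "172.20.0.0/16"


def inspect_argv(name: str) -> List[str]:
    # `docker network inspect` exits non-zero if the network does not exist.
    return ["docker", "network", "inspect", name]


def create_argv(name: str, subnet: str) -> List[str]:
    return ["docker", "network", "create", "--subnet", subnet, name]


def run_quietly(argv: Sequence[str]) -> int:
    """Run a command, discard its output, and return the exit code."""
    return subprocess.run(
        list(argv), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def ensure_network(
    name: str = NETWORK,
    subnet: str = SUBNET,
    run: Callable[[Sequence[str]], int] = run_quietly,
) -> bool:
    """Create the shared network if needed; return True if it was created."""
    if run(inspect_argv(name)) == 0:
        return False  # already exists, nothing to do
    if run(create_argv(name, subnet)) != 0:
        raise RuntimeError(f"could not create Docker network {name!r}")
    return True
```

Calling `ensure_network()` once before starting the eval, rather than from each sample, avoids concurrent `create` calls racing each other. One trade-off of a shared bridge is that containers from different samples can reach one another, which per-sample networks would prevent.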
When I try to run the intercode-ctf example, the list of available networks gets saturated very quickly and the eval crashes.
`ip route` shows:
There are two potential fixes I've found. The first is to assign every Docker container to the same network rather than the default of creating a new network each time. To do this, create a `compose.yaml` file in the same directory as the `Dockerfile`.
Alternatively, you can turn off networking entirely for each container by setting `network_mode: none`.
(As an aside, I would recommend using docker-py rather than executing bash commands via `subprocess`, to make the container code more maintainable.)
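A plausible reason the networks run out so quickly: each Compose project gets its own bridge network by default, and Docker allocates those from a small set of predefined address pools. The sketch below counts them, assuming Docker's stock local-scope defaults (172.17.0.0/16 through 172.31.0.0/16 as /16 networks, plus 192.168.0.0/16 split into /20s) and no `default-address-pools` override in `daemon.json`.

```python
import ipaddress
from typing import Iterable, Tuple

# Assumption: Docker's stock default address pools for local bridge networks.
# Each entry is (base range, prefix length of each network carved from it).
DEFAULT_POOLS = [
    ("172.17.0.0/16", 16),
    ("172.18.0.0/16", 16),
    ("172.19.0.0/16", 16),
    ("172.20.0.0/14", 16),
    ("172.24.0.0/14", 16),
    ("172.28.0.0/14", 16),
    ("192.168.0.0/16", 20),
]


def pool_capacity(pools: Iterable[Tuple[str, int]]) -> int:
    """Count how many bridge networks the pools can hand out."""
    total = 0
    for base, size in pools:
        # Split the base range into subnets of the given size, one per network.
        total += sum(1 for _ in ipaddress.ip_network(base).subnets(new_prefix=size))
    return total


capacity = pool_capacity(DEFAULT_POOLS)
# The default docker0 bridge usually holds 172.17.0.0/16, leaving one fewer
# for user-defined (e.g. Compose-created) networks.
available = capacity - 1
print(capacity, available)  # prints: 31 30
```

Under these assumptions, roughly 30 networks are free on a stock install, so 32 concurrent samples that each create their own network would exhaust the pool, while the limits of 28 and 25 mentioned in this thread stay under it.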