UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://UKGovernmentBEIS.github.io/inspect_ai/
MIT License

Creating too many docker containers when running intercode-ctf example throws error #60

Status: Closed (kaifronsdal closed this 1 day ago)

kaifronsdal commented 1 week ago

When I try to run the intercode-ctf example, the list of available networks gets saturated very quickly and the eval crashes.

Error response from daemon: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network.

ip route shows

172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.19.0.0/16 dev br-1ee4565dd28b proto kernel scope link src 172.19.0.1 
172.20.0.0/16 dev br-7855a0b7cda3 proto kernel scope link src 172.20.0.1 
172.21.0.0/16 dev br-a02d57fa4370 proto kernel scope link src 172.21.0.1 
172.22.0.0/16 dev br-d3046edb8105 proto kernel scope link src 172.22.0.1 
172.23.0.0/16 dev br-98f968358000 proto kernel scope link src 172.23.0.1 
172.25.0.0/16 dev br-38f8a21c937f proto kernel scope link src 172.25.0.1 
172.26.0.0/16 dev br-8f20f51e8c38 proto kernel scope link src 172.26.0.1 
172.27.0.0/16 dev br-a199316d2e47 proto kernel scope link src 172.27.0.1 
172.28.0.0/16 dev br-7c8b0bcc509d proto kernel scope link src 172.28.0.1 
172.29.0.0/16 dev br-2c4f6625aa3d proto kernel scope link src 172.29.0.1 
172.30.0.0/16 dev br-8ec4077c5d86 proto kernel scope link src 172.30.0.1 
172.31.0.0/16 dev br-1d0f67e51d69 proto kernel scope link src 172.31.0.1 
192.168.0.0/20 dev br-3cdf0de4e823 proto kernel scope link src 192.168.0.1 
192.168.16.0/20 dev br-b9d7325a20bc proto kernel scope link src 192.168.16.1 
192.168.32.0/20 dev br-c21b6a459314 proto kernel scope link src 192.168.32.1 
192.168.48.0/20 dev br-f1ac22d32b23 proto kernel scope link src 192.168.48.1 
192.168.64.0/20 dev br-460b7562ab86 proto kernel scope link src 192.168.64.1 
192.168.80.0/20 dev br-0ce4e2ec1a6c proto kernel scope link src 192.168.80.1 
192.168.96.0/20 dev br-773c4ae36a8d proto kernel scope link src 192.168.96.1 
192.168.112.0/20 dev br-8851d95be861 proto kernel scope link src 192.168.112.1 
192.168.128.0/20 dev br-a64a1dbcf694 proto kernel scope link src 192.168.128.1 
192.168.144.0/20 dev br-a7a134858c40 proto kernel scope link src 192.168.144.1 
192.168.160.0/20 dev br-b0626ef3ac6a proto kernel scope link src 192.168.160.1 
192.168.176.0/20 dev br-e790188f5c7a proto kernel scope link src 192.168.176.1 
192.168.192.0/20 dev br-efb48b949aa4 proto kernel scope link src 192.168.192.1 
192.168.208.0/20 dev br-dcff0c4a0d17 proto kernel scope link src 192.168.208.1 
192.168.224.0/20 dev br-636dad1fcc1d proto kernel scope link src 192.168.224.1 
192.168.240.0/20 dev br-749bce2bd1d5 proto kernel scope link src 192.168.240.1 
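
A rough way to see why the list fills up so quickly: assuming the daemon still uses Docker's built-in address pools (the values below are my understanding of the documented defaults, not something read from this machine's daemon.json), the number of bridge networks it can hand out is small. A minimal Python sketch of that arithmetic:

```python
import ipaddress

# ASSUMPTION: Docker's built-in default-address-pools as (base pool, subnet prefix length).
# A custom "default-address-pools" entry in daemon.json would change these values.
DEFAULT_POOLS = [
    ("172.17.0.0/16", 16),
    ("172.18.0.0/16", 16),
    ("172.19.0.0/16", 16),
    ("172.20.0.0/14", 16),
    ("172.24.0.0/14", 16),
    ("172.28.0.0/14", 16),
    ("192.168.0.0/16", 20),
]


def count_subnets(pools):
    """Return how many networks can be carved out of the given pools."""
    total = 0
    for base, size in pools:
        network = ipaddress.ip_network(base)
        # A pool with prefix /p split into /size subnets yields 2 ** (size - p) networks.
        total += 2 ** (size - network.prefixlen)
    return total


available = count_subnets(DEFAULT_POOLS)
print(available)
```

Under that assumption the total is 31, including the subnet docker0 already holds. That is in line with the 29 routes listed above, and it means that creating one network per sample runs out at around 30 concurrent samples.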

There are two potential fixes I've found. The first is to assign every Docker container to the same network rather than the default of creating a new network each time. To do this, create a compose.yaml file in the same directory as the Dockerfile:

services:
  default:
    build: .
    command: tail -f /dev/null
    cpus: 1.0
    mem_limit: 0.5gb
    networks:
      - inspect_network

networks:
  inspect_network:
    driver: bridge
    ipam:
      config:
          - subnet: 172.20.0.0/16

Alternatively, you can turn off networking entirely for each container, like so:

services:
  default:
    build: .
    command: tail -f /dev/null
    cpus: 1.0
    mem_limit: 0.5gb
    network_mode: none

(As an aside, I would recommend using docker-py rather than executing bash commands via subprocess, to make the container code more maintainable.)

jjallaire commented 6 days ago

Thanks for reporting this!

Do you know what your max_connections is? That determines the default for max_samples, which in turn determines how many Docker containers are created. I've seen 20 containers work fine, but that probably varies by system. Note also that there is a cleanup phase which, if interrupted (e.g. by Ctrl+C), can leave containers and networks orphaned. You can clean these up manually as described here: https://ukgovernmentbeis.github.io/inspect_ai/agents.html#environment-cleanup
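
If you'd rather script the manual cleanup with the Docker CLI directly, here is a minimal sketch. It assumes Compose v2 (where `docker compose ls` and `docker compose --project-name NAME down` are available) and that the leftover projects share a common name prefix. The `inspect-` prefix and the helper names are hypothetical placeholders, so check the output of `docker compose ls --all` for the real project names first.

```python
import json

# HYPOTHETICAL prefix: verify against `docker compose ls --all` before using.
PROJECT_PREFIX = "inspect-"


def list_projects_cmd():
    """Command that lists all compose projects (running or stopped) as JSON."""
    return ["docker", "compose", "ls", "--all", "--format", "json"]


def down_cmd(project):
    """Command that stops a project and removes its containers and networks."""
    return ["docker", "compose", "--project-name", project, "down"]


def stale_projects(ls_json, prefix=PROJECT_PREFIX):
    """Pick the project names that start with the prefix from the JSON listing."""
    return [p["Name"] for p in json.loads(ls_json) if p["Name"].startswith(prefix)]


def cleanup_commands(ls_json, prefix=PROJECT_PREFIX):
    """Build one `down` command per matching project."""
    return [down_cmd(name) for name in stale_projects(ls_json, prefix)]
```

To use it, run `list_projects_cmd()` with `subprocess.run(..., capture_output=True, text=True)`, pass the captured stdout to `cleanup_commands`, and run each resulting command. Taking the project down removes the per-project network along with the containers.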

In the most recent version of Inspect (published just a couple of days ago) we disable Internet access by default in our auto-generated compose.yaml:

services:
  default:
    ...
    networks:
      - no-internet
networks:
  no-internet:
    driver: bridge
    internal: true

It strikes me that your network_mode: none might be a more elegant way to do this. @craigwalton-dsit do you have an opinion about this?

craigwalton-dsit commented 6 days ago

Thanks a lot for your input!

I think network_mode: none looks much better! I'm happy to create a PR if you'd like @jjallaire?

On the docker-py / Docker SDK for Python front, afaik it does not support the equivalent of docker compose commands. So you could do some subset of commands like exec and cp if you knew the container names, but seeing as we need to use subprocess for up and down etc., it seemed sensible to use that consistently. Let me know if you know otherwise.
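
To illustrate the subprocess approach, here is a minimal sketch of an up / exec / down cycle driven through the `docker compose` CLI. This is not Inspect's actual implementation: the function names and the project name are made up for the example, and `default` is simply the service name used in the compose files in this thread.

```python
import subprocess


def compose(project, compose_file, *args):
    """Build a `docker compose` command line for one project and compose file."""
    return ["docker", "compose", "--project-name", project, "--file", compose_file, *args]


def run_lifecycle(project, compose_file, command):
    """Start the services, run one command in the `default` service, then tear down."""
    try:
        subprocess.run(compose(project, compose_file, "up", "--detach"), check=True)
        # -T disables pseudo-TTY allocation, which is what you want when not interactive.
        subprocess.run(
            compose(project, compose_file, "exec", "-T", "default", *command),
            check=True,
        )
    finally:
        # Always attempt teardown so containers and networks are not left behind.
        subprocess.run(compose(project, compose_file, "down"), check=False)
```

For example, `run_lifecycle("demo-project", "compose.yaml", ["ls", "/"])` would bring the stack up, list the container's root directory, and take the stack down again even if the exec step fails.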

jjallaire commented 5 days ago

Yes, @craigwalton-dsit a PR for that would be awesome!

kaifronsdal commented 5 days ago

> Do you know what your max_connections is? That determines the default for max_samples, which in turn determines how many Docker containers are created. I've seen 20 containers work fine, but that probably varies by system.

Currently it's at 32, but I might go higher. I've written a vLLM backend to speed up local models, and the bottleneck became the amount of parallelization rather than model generation.

> On the docker-py / Docker SDK for Python front, afaik it does not support the equivalent of docker compose commands. So you could do some subset of commands like exec and cp if you knew the container names, but seeing as we need to use subprocess for up and down etc., it seemed sensible to use that consistently. Let me know if you know otherwise.

I didn't realize there were no compose commands; I've only used it for very simple things before. Subprocesses are probably the way to go. A quick search turned up python-on-whales, but it might not be worth the effort of adding a third-party package (if it ain't broke, don't fix it).

jjallaire commented 4 days ago

Okay, I noticed in the recently published Docker implementation for SWE Bench that they recommend going no higher than 28 parallel Docker containers (no matter how many cores you have, likely for the reasons you are seeing here!). You might consider setting max_samples a bit lower if you run into problems with Docker.

We've also made the network mode change you suggested, which may help (that commit is on main but not yet on PyPI).

jjallaire commented 1 day ago

So we've got two changes that should address this:

1) As mentioned before, we now use network_mode: none as per your example.

2) When using Docker, the default max_samples is capped at 25 (i.e. you can still set it higher than that, but if you haven't specified a value then we won't let the default derived from max_connections go above 25). This change is here: https://github.com/UKGovernmentBEIS/inspect_ai/commit/2d7561f02213781d603ed4ff1826f8dfe0963da4
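
The capping rule in 2) can be sketched roughly as follows. This is an illustration of the behaviour described, not the code from the linked commit; the function and constant names are hypothetical.

```python
DOCKER_DEFAULT_CAP = 25  # the cap described above


def resolve_max_samples(max_connections, max_samples=None, uses_docker=True):
    """Pick how many samples to run in parallel (hypothetical helper)."""
    if max_samples is not None:
        # An explicit setting always wins, even if it is above the cap.
        return max_samples
    if uses_docker:
        # No explicit value: default to max_connections, but never above the cap.
        return min(max_connections, DOCKER_DEFAULT_CAP)
    return max_connections


print(resolve_max_samples(32))                  # 25
print(resolve_max_samples(32, max_samples=40))  # 40
print(resolve_max_samples(10))                  # 10
```

So with max_connections at 32 and nothing else specified, 25 containers would be created rather than 32.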

kaifronsdal commented 1 day ago

I like this solution! For now, though, I've settled on a different approach: I create a network manually and then assign all containers to it. The reason is that some of the setup code in the CTF example needs to download files from the internet (i.e. it calls wget), even though the model itself will never need internet access.

services:
  default:
    build: .
    command: tail -f /dev/null
    cpus: 1.0
    mem_limit: 0.5gb
    networks:
      - inspect_network

# if network does not exist, create it
# docker network create inspect_network --subnet 172.20.0.0/16
networks:
  inspect_network:
    external: true
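
The "create it if it does not exist" step from the comment in the file above could be automated along these lines. This is a minimal sketch: the helper names are made up, and it relies on `docker network inspect` exiting with a non-zero status when the network is missing.

```python
import subprocess

NETWORK = "inspect_network"
SUBNET = "172.20.0.0/16"


def inspect_cmd(name):
    """Command that succeeds only if the named network exists."""
    return ["docker", "network", "inspect", name]


def create_cmd(name, subnet):
    """Command that creates a bridge network on the given subnet."""
    return ["docker", "network", "create", "--subnet", subnet, name]


def ensure_network(name=NETWORK, subnet=SUBNET, run=subprocess.run):
    """Create the network unless it already exists; return True if it was created."""
    exists = run(inspect_cmd(name), capture_output=True).returncode == 0
    if exists:
        return False
    # This fails if the subnet overlaps one that another network already uses.
    run(create_cmd(name, subnet), check=True)
    return True
```

Calling `ensure_network()` once before starting the eval should be enough, since every container then joins the same pre-existing network. Note that in the ip route output earlier in this thread 172.20.0.0/16 was already taken by another bridge, so that network would need to be removed first or a different subnet chosen.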