I/O Delay since ever on Apple M1 when using many Containers

ghost commented 3 years ago

Hey Folks,

Can anybody imagine where the following behaviour comes from? When I run my whole stack (around 8 containers, including Redis, Elasticsearch, MariaDB (not MySQL), and Python), I always seem to run into some kind of I/O delay. The project comes up fine, but when I try to execute some functions of my program I very often have to wait 3-5 seconds before I get any response, even if the function has very low complexity, like a simple filter statement.

Since the M1 version was updated to 3.3.1 I also see some improvements in RAM management; Docker seems to do less swapping. Thanks for that fix! But sadly that hasn't solved my issue.

I tried both virtualization types Docker Desktop offers under the Experimental tab, but with the same result :(

Can somebody help? I'm running a 16 GiB Mac mini (M1).

Kind regards and thanks in advance

SwenVanZanten commented 3 years ago

Hi, I have the same issue, running 3.3.3 on a MacBook Air M1. When I make a curl request from one container to another, it is sometimes delayed. The delay is always ~5 seconds and it occurs randomly.

docker-compose exec php curl http://statics:4000/manifest.json -w %{time_connect}:%{time_starttransfer}:%{time_total}
{
  "_hash": "v1619773217126",
  "js/activate-js.js": "js/activate-js.js",
  "js/m.js": "js/m.js"
} 5.118593:5.118996:5.119123

docker-compose exec php curl http://statics:4000/manifest.json -w %{time_connect}:%{time_starttransfer}:%{time_total}
{
  "_hash": "v1619773217126",
  "js/activate-js.js": "js/activate-js.js",
  "js/m.js": "js/m.js"
} 0.004253:0.004689:0.004720

SwenVanZanten commented 3 years ago

Okay, it looks like this is caused by DNS resolution. After assigning dedicated IPs and adding them to my images' hosts files, it was resolved and I now get a quick response on every request.

ruudk commented 3 years ago

I notice the same. Connecting to Redis from PHP sometimes takes a few seconds because resolving the redis hostname is slow. When I change the hostname to the IP of the container, it's fast again.
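
To check whether name resolution is really where the time goes, a small timing loop run inside the affected container can help. This is only a sketch: `time_lookups` is a helper name made up for this example, and `redis` in the note below stands in for whatever service name is slow in your stack.

```python
import socket
import time


def time_lookups(hostname, attempts=5):
    """Return the wall-clock seconds each lookup of `hostname` took."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Same resolver path most clients use when connecting by name.
            socket.getaddrinfo(hostname, None)
        except socket.gaierror:
            # A failed lookup still shows how long the resolver waited.
            pass
        durations.append(time.perf_counter() - start)
    return durations
```

Calling `time_lookups("redis")` from a Python shell inside the container should show an occasional lookup taking seconds if DNS is the culprit, while passing the container's IP instead should stay fast because no DNS query is needed for an address literal.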

docker-robott commented 3 years ago

Issues go stale after 90 days of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

SwenVanZanten commented 3 years ago

/remove-lifecycle stale

markshust commented 3 years ago

@SwenVanZanten what are the exact steps you did to resolve this issue? I'm noticing the same thing happening on the new M1X chip.

markshust commented 3 years ago

Never mind, figured it out. This is definitely a confirmed issue with Docker for Mac on Apple Silicon.

❯ docker-compose exec phpfpm curl http://elasticsearch:9200/manifest.json -w %{time_connect}:%{time_starttransfer}:%{time_total}

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [manifest.json]","resource.type":"index_or_alias","resource.id":"manifest.json","index_uuid":"_na_","index":"manifest.json"}],"type":"index_not_found_exception","reason":"no such index [manifest.json]","resource.type":"index_or_alias","resource.id":"manifest.json","index_uuid":"_na_","index":"manifest.json"},"status":404}5.101758:5.105394:5.105483%                                                                                      

❯ docker-compose exec phpfpm curl http://172.17.0.1:9200/manifest.json -w %{time_connect}:%{time_starttransfer}:%{time_total}

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [manifest.json]","resource.type":"index_or_alias","resource.id":"manifest.json","index_uuid":"_na_","index":"manifest.json"}],"type":"index_not_found_exception","reason":"no such index [manifest.json]","resource.type":"index_or_alias","resource.id":"manifest.json","index_uuid":"_na_","index":"manifest.json"},"status":404}0.000886:0.008343:0.008373%

markshust commented 3 years ago

A potential workaround, instead of using the ip, is to use:

host.docker.internal

The services defined in the docker-compose.yml file don't seem to be resolving to this internal hostname.

markshust commented 3 years ago

I can also confirm that the host.docker.internal hostname also fails lookups randomly 😅

The only way to get rid of the lag is to add an entry to the container's /etc/hosts file:

172.17.0.1 host.docker.internal

which can also be done with:

  phpfpm:
    ...
    extra_hosts:
      - "host.docker.internal:172.17.0.1"

Then use host.docker.internal everywhere rather than the service name.

jayfk commented 2 years ago

Maybe I'm missing something, but how is hardcoding an IP address to host.docker.internal helping to resolve other services? Do you do this for all your services?

markshust commented 2 years ago

@jayfk I actually went with the names of the services instead. This just force-resolves the DNS for that name to the IP of Docker, which definitely fixes this I/O issue:

https://github.com/markshust/docker-magento/blob/master/compose/docker-compose.yml#L28-L37

SwenVanZanten commented 2 years ago

I solved this by doing the following:

docker-compose.yml

version: "3.8"

x-extra_hosts: &extra_hosts
    extra_hosts:
        - "service-a:172.20.0.4"
        - "service-b:172.20.0.5"
        - "service-c:172.20.0.6"

services:
    service-a:
        <<: *extra_hosts
        networks:
            default:
                ipv4_address: 172.20.0.4

    service-b:
        <<: *extra_hosts
        networks:
            default:
                ipv4_address: 172.20.0.5

    service-c:
        <<: *extra_hosts
        networks:
            default:
                ipv4_address: 172.20.0.6

networks:
    default:
        ipam:
            driver: default
            config:
                - subnet: "172.20.0.0/24"

Define a network so you can fix the subnet. Assign IPs to the different services. Then define a hosts list that maps every service name to its IP, and assign that list to every service.
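
The address bookkeeping described above can be sketched in a few lines of Python, for example to generate the `extra_hosts` list instead of typing it by hand. The function names and the starting offset of 4 are assumptions chosen to match the example file, not anything Docker or Compose prescribes.

```python
import ipaddress


def assign_static_ips(services, subnet="172.20.0.0/24", first_offset=4):
    """Map each service name to a fixed address inside `subnet`."""
    network = ipaddress.ip_network(subnet)
    mapping = {}
    for index, name in enumerate(services):
        address = network.network_address + first_offset + index
        # Refuse addresses outside the subnet or equal to its broadcast address.
        if address not in network or address == network.broadcast_address:
            raise ValueError(f"subnet {subnet} has no room for service {name!r}")
        mapping[name] = str(address)
    return mapping


def extra_hosts_entries(mapping):
    """Render the mapping as "name:ip" strings, the form used under extra_hosts."""
    return [f"{name}:{address}" for name, address in mapping.items()]
```

With the three service names from the file above, `assign_static_ips(["service-a", "service-b", "service-c"])` yields 172.20.0.4 through 172.20.0.6, and `extra_hosts_entries` turns that mapping into the list shared by every service.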

pinktig commented 2 years ago

Thanks for this fix, @SwenVanZanten. I've been having the exact same issue with a 16" MacBook Pro with M1 Pro.

In my case, I was running a Django server with docker-compose, and responses from the Django server were delayed by approximately 5 seconds for no apparent reason. I wasted quite some time until I figured out that it wasn't an issue with my codebase.

I've applied your fix and now it works fine. In my case I had to unquote the subnet setting as follows:

networks:
    default:
        ipam:
            driver: default
            config:
                - subnet: 172.20.0.0/24 # unquote IP address

SwenVanZanten commented 2 years ago

Awesome! Now let's hope this issue finally gets fixed so this workaround isn't needed.

bweinzierl commented 2 years ago

Ran into the same issue, took me quite some time to figure it out. Hope this will get fixed soon. The described workaround worked for me too...

pinktig commented 2 years ago

Hi @SwenVanZanten, just in case you're still following this issue, I found another solution that worked for me. @annemarietannengrund suggested this in #5548, and it seems to be working normally so far. If your base image is buster(-slim), upgrading to bullseye(-slim) might solve this issue.

At least it worked for me!
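
To see which Debian release a container is based on before trying the upgrade, the `VERSION_CODENAME` field of its `/etc/os-release` file can be inspected. The snippet below is a minimal sketch of that check; `parse_os_release` and `is_buster` are names invented for this example, and it assumes the image ships an os-release file that carries that field, as Debian's does.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_buster(text):
    """Return True if the os-release text names the buster codename."""
    return parse_os_release(text).get("VERSION_CODENAME") == "buster"
```

Inside a container this could be called as `is_buster(open("/etc/os-release").read())`; a `True` result suggests the bullseye upgrade mentioned above is worth a try.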

SwenVanZanten commented 2 years ago

Yup, that solved the problem 👍

jtraulle commented 2 years ago

/remove-lifecycle stale

folkvir commented 2 years ago

Any news on this? I still have to use fixed IP addresses in my configuration, otherwise I always get a 5-second latency between my containers. Something like this is not convenient to use at all:

version: "3.7"

networks:
  default:
    ipam:
      config:
        - subnet: "172.25.0.0/24"
          gateway: "172.25.0.1"

x-extra_hosts: &extra_hosts
  extra_hosts:
    - "mysql:172.25.0.2"

services:
  server:
    depends_on:
      - mysql
    <<: *extra_hosts

  mysql:
    <<: *extra_hosts
    networks:
      default:
        ipv4_address: 172.25.0.2

My setup works correctly on older MacBooks without M1 chips, without the IP addressing. Is it related to my containers, or to Docker's internal name resolution on M1?

bweinzierl commented 2 years ago

In our scenario, upgrading to bullseye(-slim) solves the issue. But there are reasons why we cannot do that (yet). We also cannot use a static configuration because we work in a team. So yes, I would really love a solution as well.

As a stopgap until then, I have written this little bash script to put the dynamic IP addresses into the /etc/hosts of each container. This also solves the latency problem:

case $( uname -m ) in
arm64)
    while read -r CONTAINER_NAME || [ -n "${CONTAINER_NAME}" ]; do
        echo "In container: $CONTAINER_NAME"

        case $( docker exec -t -u root "$CONTAINER_NAME" /bin/sh -l -c "if grep -Fxq '#ARM-NETWORKING-FIX:' /etc/hosts; then echo 'setup-already-done'; fi" | tr -d '\r' ) in
            setup-already-done)
                echo "Setup already done. Skipping..."
                ;;
            *)
                echo "Setting up /etc/hosts file..."
                docker exec -t -u root "$CONTAINER_NAME" /bin/sh -l -c "echo '#ARM-NETWORKING-FIX:' >> /etc/hosts"
                while read -r LINE || [ -n "${LINE}" ]; do
                    echo "adding line to /etc/hosts: $LINE"
                    docker exec -t -u root "$CONTAINER_NAME" /bin/sh -l -c "echo ${LINE} >> /etc/hosts"
                done < <(docker-compose ps -q | xargs -n 1 docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}} {{index  .Config.Labels "com.docker.compose.service" }}' | sed 's/ \// /')
                ;;
        esac
    done < <(docker-compose ps -q | xargs -n 1 docker inspect --format '{{ .Name }}' | sed 's/\// /')
    ;;
*)
    echo "Not on ARM architecture. Not needed!"
    ;;
esac
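
The core of the script, appending entries once and skipping containers that already carry the marker, can be sketched as a pure function. This is a Python illustration of the same idea, not a replacement for the script; `append_host_entries` and the sample address in the note below are made up for the example, and only the marker string comes from the script above.

```python
MARKER = "#ARM-NETWORKING-FIX:"


def append_host_entries(hosts_text, entries, marker=MARKER):
    """Append "ip name" lines below a marker, unless the marker is already there."""
    # Exact whole-line match, like `grep -Fx` in the script.
    if marker in hosts_text.splitlines():
        return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    block = [marker] + [f"{ip} {name}" for ip, name in entries]
    return hosts_text + "\n".join(block) + "\n"
```

Running it twice on the same text, for instance with `[("172.18.0.2", "mysql")]`, leaves the second result identical to the first, which is the property that makes the script safe to re-run after every `docker-compose up`.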

tehmaestro commented 2 years ago

It is still slow for me. I tried both the ipv4_address version and the script above (thanks for the script, pretty cool). Still, nothing seems to be fixing it.

dgastudio commented 2 years ago

finally found a solution

/etc/hosts

127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost mysite.local // all your existing hosts

works like a charm, no more delays

msert29 commented 1 year ago

I've been hitting the same issue with a Django-based compose service running alongside Celery and an NGINX proxy. I first tried switching to ARM-based images but still had the same issue. Others suggested using bullseye images, again with no luck. Following @dgastudio's solution above, I managed to get it working by simply adding the ::1 localhost mysite entry to my /etc/hosts file.

docker-robott commented 1 year ago

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 30 days.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

docker-robot[bot] commented 1 year ago

Closed issues are locked after 30 days of inactivity. This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

/lifecycle locked

docker / for-mac

I/O Delay since ever on Apple M1 when using many Containers #5626