docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac

After a short time, Docker starts returning internal server errors and becomes unusable #7288

Open jrnorth opened 6 months ago

jrnorth commented 6 months ago

Description

About ten minutes after I start Docker Desktop and several containers, docker commands start returning messages like the following: request returned Internal Server Error for API route and version http://%2FUsers%2Fjoe%2F.docker%2Frun%2Fdocker.sock/v1.45/containers/json, check if the server supports the requested API version

Likewise, the Docker Desktop application is unable to load any data on any of the tabs.

Either restarting Docker Desktop or quitting and launching it again will resolve the issue, but only for a short time before it happens again.
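The URL in that error is not a network address. In the messages quoted in this thread, its host part is the percent-encoded path of the Unix socket the CLI talked to, followed by the API version and endpoint. Decoding it shows the CLI used ~/.docker/run/docker.sock and requested API v1.45. The `docker version` output in this report lists API 1.45 on both the client and the server (minimum 1.24), so the "check if the server supports the requested API version" hint looks like a red herring: the request reached the socket and came back as HTTP 500. The small decoder below is a sketch for reading these URLs; `parse_error_url` and `DockerRequest` are hypothetical helper names, not part of any Docker tooling.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class DockerRequest(NamedTuple):
    socket_path: str  # decoded Unix socket the CLI connected to
    api_version: str  # e.g. "v1.45"
    endpoint: str     # e.g. "/containers/json"
    query: str        # e.g. "all=1" for `docker ps -a`


def parse_error_url(url: str) -> DockerRequest:
    """Split the URL from a 'request returned ... for API route and version' error."""
    parts = urlsplit(url)
    # In the errors quoted in this thread, the host part of the URL is the
    # percent-encoded socket path (%2F is "/"); unquote() restores it.
    socket_path = unquote(parts.netloc)
    # The path has the form /<version>/<endpoint...>.
    _, version, endpoint = parts.path.split("/", 2)
    return DockerRequest(socket_path, version, "/" + endpoint, parts.query)


req = parse_error_url(
    "http://%2FUsers%2Fjoe%2F.docker%2Frun%2Fdocker.sock/v1.45/containers/json"
)
print(req.socket_path)  # /Users/joe/.docker/run/docker.sock
print(req.api_version)  # v1.45
```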

Reproduce

  1. Start several containers
  2. Wait an unspecified amount of time (at least ten minutes), then run a docker command
  3. The command hangs for a while, then fails with the error in the description
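Step 2 leaves the time to failure open. One way to put a number on it is to poll the Docker Engine API's `GET /_ping` endpoint over the same Unix socket the CLI uses and record when it first stops answering with HTTP 200. This is a minimal sketch using only the Python standard library. It assumes the socket lives at ~/.docker/run/docker.sock, as in the error messages in this thread. `UnixHTTPConnection`, `ping`, and `seconds_until_failure` are hypothetical names, and any non-200 reply (including the Internal Server Error reported here), timeout, or connection error counts as a failure.

```python
import http.client
import os
import socket
import time
from typing import Optional

# Socket path taken from the error messages in this thread (assumption: adjust
# it if your Docker context points somewhere else).
DEFAULT_SOCKET = os.path.expanduser("~/.docker/run/docker.sock")


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        # "localhost" is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ping(socket_path: str = DEFAULT_SOCKET, timeout: float = 5.0) -> bool:
    """Return True if GET /_ping answers with HTTP 200 within `timeout` seconds."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Missing or refused socket, timeouts, and malformed replies all count as down.
        return False
    finally:
        conn.close()


def seconds_until_failure(
    socket_path: str = DEFAULT_SOCKET,
    interval: float = 30.0,
    timeout: float = 5.0,
    max_checks: Optional[int] = None,
) -> Optional[float]:
    """Poll the daemon; return elapsed seconds at the first failed ping, or None."""
    start = time.monotonic()
    checks = 0
    while max_checks is None or checks < max_checks:
        if not ping(socket_path, timeout):
            return time.monotonic() - start
        checks += 1
        time.sleep(interval)
    return None
```

For example, calling `print(seconds_until_failure(interval=60))` in a spare terminal right after starting the containers blocks until the daemon stops answering, then prints how long that took. The result is a concrete duration for step 2 that can go into a report alongside a diagnostics ID.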

Expected behavior

Docker Desktop and the docker commands should continue to work as expected.

docker version

Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.1.1
 API version:       1.45
 Go version:        go1.21.9
 Git commit:        4cf5afa
 Built:             Tue Apr 30 11:44:56 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.30.0 (149282)
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:04 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    26.1.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     /Users/joe/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     /Users/joe/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     /Users/joe/.docker/cli-plugins/docker-debug
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/joe/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     /Users/joe/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     /Users/joe/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     /Users/joe/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/joe/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     /Users/joe/.docker/cli-plugins/docker-scout

Server:
 Containers: 17
  Running: 17
  Paused: 0
  Stopped: 0
 Images: 108
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.6.26-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 10
 Total Memory: 31.3GiB
 Name: docker-desktop
 ID: e4963b97-992b-43d4-a832-7ce9c03d69f7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/joe/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Diagnostics ID

4BA00B44-3ABD-4F0A-B1AD-081C38AB5075/20240521011143

Additional Info

I'm on Ventura 13.6.7. I was on 13.6.6 last week with the same Docker Desktop version and did not have this issue.

jeremy-london commented 6 months ago

Experiencing the same issue. I am downgrading back to 4.29.0 for the time being and will monitor this issue for patches.

docker ps -a

request returned Internal Server Error for API route and version http://%2FUsers%2Fjeremylondon%2F.docker%2Frun%2Fdocker.sock/v1.45/containers/json?all=1, check if the server supports the requested API version

Running v4.30.0 on Sonoma 14.5 with an Apple M3 Pro. I have now seen this happen twice, after roughly 1-3 hours of a container service running: it freezes and Docker Desktop becomes unresponsive. No logs, no exec.

docker version

Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.1.1
 API version:       1.45
 Go version:        go1.21.9
 Git commit:        4cf5afa
 Built:             Tue Apr 30 11:44:56 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.30.0 (149282)
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:04 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    26.1.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-debug
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     /Users/jeremylondon/.docker/cli-plugins/docker-scout

Server:
 Containers: 2
  Running: 1
  Paused: 0
  Stopped: 1
 Images: 26
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.6.26-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 11
 Total Memory: 7.754GiB
 Name: docker-desktop
 ID: 82224e3c-a63c-4296-a131-9d9f9dc914db
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/jeremylondon/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Diagnostic ID 5B33B463-4C86-4594-87CA-F627D88F8EE5/20240529135858

jeremy-london commented 6 months ago

Rats, I downgraded to v4.29.0 and got the same error. Docker ran fine for just under 4 hours, then crashed again and the process hung.

New Diagnostic ID: 5B33B463-4C86-4594-87CA-F627D88F8EE5/20240529235916

jpbriend commented 6 months ago

@jeremy-london do you have a docker-compose (which you can share) which could help us reproduce the issue?

jeremy-london commented 6 months ago

@jpbriend Sure! In an effort to try something new, I reduced the resource limits to use only 6 of the 11 CPU cores and tweaked my Python code to use 5 max workers. Previously I was giving Docker all 11 CPU cores in the resource limit (the max on my machine), and my Python script was running 15 max worker threads.

This worked for around 15 hours before crashing like the previous examples.

New Diagnostic ID: 5B33B463-4C86-4594-87CA-F627D88F8EE5/20240530155458

Here is a sample project that simulates what I've got running (Selenium Grid with node-docker):

README for Selenium Grid

The goal was to use node-docker to dynamically create a chromedriver container per session, then use the Python script to process multiple URLs at a time. The core Selenium drivers are not thread-safe, so this moves the problem to the Docker runtime and allows one driver per process. It works great, but sometimes crashes Docker Desktop.

NOTE: If you are on x86 (linux/amd64), you can replace seleniarm/ with selenium/ in the compose and config files. I am on Apple silicon (M3 Pro), so I am using Arm-based linux/arm64 containers.

docker-compose.yml

name: web-scraper-grid
services:
  selenium-hub:
    image: seleniarm/hub:4.20
    container_name: selenium-hub
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"

  node-docker:
    image: seleniarm/node-docker:4.20
    container_name: node-docker
    shm_size: '2gb'
    volumes:
      # - ./assets:/opt/selenium/assets # Uncomment if you want to use assets to track sessionCapabilities.json
      - ./config.toml:/opt/bin/config.toml
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_START_XVFB=false
      - SE_START_VNC=false

config.toml

[docker]
configs = [
  "seleniarm/standalone-chromium:124.0", '{"browserName": "chrome", "browserVersion": "124.0"}'
]

# URL for connecting to the docker daemon
url = "http://127.0.0.1:2375"

# Assets path (optional mount)
assets-path = "/opt/selenium/assets"

example test.py

# pip install selenium

import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List

from selenium import webdriver


def setup_selenium_driver(command_executor: str = "http://localhost:4444") -> webdriver.Remote:
    """Open a new remote Chrome session on the Selenium Grid hub."""
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Remote(command_executor=command_executor, options=options)


def analyze_single_url(url: str, index: int, start_times: List[float], end_times: List[float]) -> None:
    """Load one URL in its own browser session, then tear the session down."""
    start_times[index] = time.time()
    driver = None
    try:
        # Create the driver inside the worker so that max_workers actually bounds
        # the number of concurrent Grid sessions (and therefore the number of
        # browser containers node-docker runs at once).
        driver = setup_selenium_driver()
        driver.get(url)

        # Simulate processing time
        time.sleep(4)

        logging.info(f"Item {index + 1}: {driver.title}")
    except Exception as e:
        logging.error(f"Error processing URL {index + 1}: {url} - {e}")
    finally:
        end_times[index] = time.time()
        if driver is not None:
            driver.quit()


def analyze_urls(urls: List[str], max_workers: int) -> None:
    logging.info(f"Analyzing {len(urls)} URLs...")
    start_times = [0.0] * len(urls)
    end_times = [0.0] * len(urls)
    start_time = time.time()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(analyze_single_url, url, idx, start_times, end_times)
            for idx, url in enumerate(urls)
        ]
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as e:
                logging.error(f"Error occurred: {e}", exc_info=True)
    total_time = time.time() - start_time
    logging.info(f"Total Time: {total_time:.2f} seconds for {len(urls)} URLs.")
    logging.info(f"Processing rate: {len(urls) / total_time:.2f} URLs per second.")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

    max_workers = 5
    test_urls = ["https://dbad-license.org/"] * 100000

    analyze_urls(test_urls, max_workers)
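test.py opens a fresh Grid session for every URL. With node-docker, each session gets its own browser container, so a 100,000-URL run means 100,000 container create/destroy cycles through the mounted Docker socket. This thread does not establish that this churn is what makes Docker Desktop hang. If you want to test that hypothesis, one option is to keep one driver per worker thread and reuse it across URLs. The sketch below shows that thread-confinement pattern. `PerThreadResource` is a hypothetical helper, not a Selenium API. It also assumes a reused browser is acceptable for your workload, because a session that dies stays cached for that thread.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class PerThreadResource(Generic[T]):
    """Creates at most one resource per thread, on first use, and tracks them for cleanup."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()
        self._lock = threading.Lock()
        self._created: List[T] = []

    def get(self) -> T:
        # Each thread sees its own attributes on self._local, so only the
        # calling thread can create or read its resource here.
        if not hasattr(self._local, "resource"):
            self._local.resource = self._factory()
            with self._lock:
                self._created.append(self._local.resource)
        return self._local.resource

    def close_all(self, closer: Callable[[T], None]) -> None:
        """Call `closer` on every resource created so far (e.g. a driver's quit)."""
        with self._lock:
            resources, self._created = self._created, []
        for resource in resources:
            closer(resource)
```

In test.py, this would mean building `drivers = PerThreadResource(setup_selenium_driver)` once. Each worker would call `drivers.get().get(url)` instead of creating and quitting its own driver, and after the `with ThreadPoolExecutor(...)` block exits you would call `drivers.close_all(lambda d: d.quit())`. At most `max_workers` browser containers would then be started for the whole run.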

nothing2obvi commented 6 months ago

I have the same problem on an Intel Mac Mini 2018 with Sonoma 14.5 with Docker Desktop 4.30.0.

I don't have the problem on an M1 Mac Mini with Sonoma 14.5 with Docker Desktop 4.30.0. In fact it runs pretty flawlessly.

nothing2obvi commented 6 months ago

I commented this earlier and deleted it, but can now confirm that downgrading to 4.24.0 makes Docker Desktop last a few hours longer before it all becomes unresponsive. Obviously not a fix.

nothing2obvi commented 5 months ago

This problem seems to still exist for 4.31.0.

nothing2obvi commented 5 months ago

@jeremy-london @jrnorth

Have any of you found a workaround to this? I've tried using Rancher Desktop, Colima, and OrbStack but they all introduce their own set of problems.

Sdedeugd commented 4 months ago

I also have this exact issue. I tried Podman as a substitute and it has the same problem: the Podman UI keeps working, but the containers break down.

Switching from VirtioFS to gRPC FUSE in the Docker Desktop settings seems to have made it somewhat more stable, but after a day Docker Desktop still becomes unresponsive and I can't reach my containers. It seems to happen when more CPU-intensive tasks run in the container and the system clogs up. I'm currently auto-restarting Docker Desktop early in the morning to see whether that helps.
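The scheduled-restart workaround can be scripted and run from cron or launchd. This is a sketch that assumes the app is installed as Docker.app. It uses two standard macOS commands: AppleScript's `quit app` through `osascript`, and `open -a`. `restart_docker_desktop`, the command constants, and the 15-second settle delay are hypothetical choices, not Docker-provided tooling. The command runner and the sleep function are passed in as parameters so the sequence can be checked without touching a real Docker install. It only papers over the hang and is not a fix.

```python
import subprocess
import time
from typing import Callable, List

# macOS commands that quit and relaunch Docker Desktop (assumption: the app
# bundle is named "Docker", i.e. /Applications/Docker.app).
QUIT_DOCKER = ["osascript", "-e", 'quit app "Docker"']
OPEN_DOCKER = ["open", "-a", "Docker"]


def restart_docker_desktop(
    run: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
    settle_seconds: float = 15.0,
) -> List[List[str]]:
    """Quit Docker Desktop, wait, then relaunch it. Returns the commands issued."""
    run(QUIT_DOCKER, check=False)
    # Give Docker Desktop and its VM time to shut down before relaunching.
    sleep(settle_seconds)
    run(OPEN_DOCKER, check=False)
    return [QUIT_DOCKER, OPEN_DOCKER]
```

Calling `restart_docker_desktop()` from a small script on a daily schedule reproduces the early-morning restart described above.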

dionjwa commented 4 months ago

Also seeing this in 4.29.0 and 4.31.0

bdeo commented 1 month ago

My team and I are having this problem nearly daily on Apple silicon Macs, ranging from a 2021 M1 Pro MacBook Pro to a 2024 M3 Max. It looks like the same error as https://github.com/docker/for-mac/issues/6956, https://github.com/docker/for-mac/issues/7240, and https://github.com/docker/for-mac/issues/6933

We've tried disabling Resource Saver as described in other issues, but I think the problem is getting worse over time.