docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

[BUG] `docker compose` creates zombie layers not cleaned by `docker system prune` #10481

Closed qianguih closed 5 days ago

qianguih commented 1 year ago

Description

I noticed that /var/lib/docker/overlay2 was slowly eating all the disk space on my machine. After digging into it, I found that the issue comes from docker compose (or from the layer management for images built by docker compose). Here are the steps to illustrate the issue:

  1. setup:
    • Dockerfile:
      FROM ubuntu:22.04
      WORKDIR /test
      # test.txt is an empty file in the current directory, created for this test
      COPY test.txt /test
    • docker-compose.yaml:
      version: "2"
      services:
        base_service:
          build:
            context: .
            dockerfile: Dockerfile
        producer:
          build:
            context: .
            dockerfile: Dockerfile
  2. steps to reproduce the issue
    • empty /var/lib/docker/ to start fresh: sudo rm -rf /var/lib/docker/* && sudo systemctl restart docker
    • build the image: docker compose build
    • remove the images: docker system prune -af --volumes
    • check what's remaining inside /var/lib/docker/overlay2:
      du -sh /var/lib/docker/overlay2/*
      84M   965beae5f7717138f1692aebfdf217410c8979a3d9a230fa98af73377c87b59a
      28K   aqwgmdcpaqx8iegitwt91osnb    (contains diff/test/test.txt)
      8.0K  l
      28K   onomfnueji94ninjm03iibrzz    (contains diff/test)
    • As we can see, docker system prune didn't clean up these layers:
    • 965beae5f7717138f1692aebfdf217410c8979a3d9a230fa98af73377c87b59a is the layer containing system libraries. I don't fully understand why it's created, but tests showed this layer is reused across multiple builds, so it is not a concern.
    • aqwgmdcpaqx8iegitwt91osnb is the layer containing the diff created by the COPY instruction in the Dockerfile. onomfnueji94ninjm03iibrzz is also created by that step, but it's just an empty folder, so it's less concerning.
    • now, repeat the build and cleanup steps to simulate multiple builds in a normal dev workflow: docker compose build && docker system prune -af --volumes
    • as we can see, the new build created two more layers, again due to the COPY instruction:
      84M   965beae5f7717138f1692aebfdf217410c8979a3d9a230fa98af73377c87b59a
      28K   aqwgmdcpaqx8iegitwt91osnb
      8.0K  l
      28K   onomfnueji94ninjm03iibrzz
      28K   vhv6224u6je733pbvx7ut7zz9    <-- this is new (contains diff/test/test.txt)
      28K   ycn82ph962p8pawbxv8qkt2vw    <-- this is new (contains diff/test)
    • Basically, every time the same image is rebuilt, new layers are created for any filesystem change, and docker system prune does not remove the old ones. In my use case, the Dockerfile copies the source code and compiles it, so each extra layer can be much larger (~3GB). As I iterate on the image and rebuild over time, this eventually eats up all my disk space. Is this a known issue? I would love to hear how to avoid it.
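To make the before/after comparison of /var/lib/docker/overlay2 repeatable, here is a minimal Python sketch that records each top-level entry with its apparent size and reports which entries a build-and-prune cycle left behind. The function names are hypothetical, sizes are summed file sizes in bytes (so they will not match du -sh, which reports allocated blocks), and reading Docker's data root normally requires root.

```python
import os
from pathlib import Path


def snapshot_layers(root: str = "/var/lib/docker/overlay2") -> dict[str, int]:
    """Map each top-level entry under `root` to the apparent size of its files in bytes.

    Rough counterpart of `du -s root/*`; symlinks are neither followed nor counted.
    """
    sizes: dict[str, int] = {}
    for entry in Path(root).iterdir():
        total = 0
        if entry.is_dir() and not entry.is_symlink():
            # Walk the layer directory (e.g. <id>/diff/...) and sum regular file sizes.
            for dirpath, _dirnames, filenames in os.walk(entry):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
        elif entry.is_file():
            total = entry.stat().st_size
        sizes[entry.name] = total
    return sizes


def new_layers(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return entries present in `after` that did not exist in `before`."""
    return {name: size for name, size in after.items() if name not in before}
```

Usage: take `before = snapshot_layers()`, run `docker compose build && docker system prune -af --volumes`, take `after = snapshot_layers()`, then print `new_layers(before, after)`. If the behavior described in this report reproduces, the COPY-related directories from the latest build should be the ones that appear in the result.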

Steps To Reproduce

No response

Compose Version

`Docker Compose version v2.17.2`

Docker Environment

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.21.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 23.0.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.14.0-1054-oem
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.09GiB
 Name: coram-whale-b71m
 ID: H6JC:JH4E:7ZHO:5HGF:APRE:CJDS:G6BS:MBQ6:PESU:TV5W:26K4:AQLJ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false


Anything else?

No response
laurazard commented 1 year ago

Hiya @qianguih, thanks for the report.

I wouldn't think this is a Compose-specific issue, but it's possible. Could you test this with a plain build (docker build ..., no Compose) of your Dockerfile and see if it still happens? In addition, you could narrow down whether this happens with BuildKit or the classic builder by testing again with DOCKER_BUILDKIT=0 docker build ....

With that information we'll be able to direct you to a more appropriate repo to report your issue in.
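The isolation suggested above (Compose vs. a plain docker build vs. the legacy builder) could be scripted roughly as follows. This is a sketch, not an official procedure: the variant names and the `repro-test` tag are hypothetical, it assumes it runs from the directory holding the Dockerfile and docker-compose.yaml from the report, and it sets DOCKER_BUILDKIT=1 or DOCKER_BUILDKIT=0 to pick the builder. Note that `docker system prune -af --volumes` deletes all unused images, containers, networks, and volumes, so only run it on a disposable machine.

```python
import os
import subprocess

# Cleanup step used in the original report.
PRUNE = ["docker", "system", "prune", "-af", "--volumes"]


def variant_commands(variant: str, tag: str = "repro-test") -> tuple[list[str], dict[str, str]]:
    """Return (build command, extra environment variables) for one build path."""
    if variant == "compose":
        return ["docker", "compose", "build"], {}
    if variant == "buildkit":
        return ["docker", "build", "-t", tag, "."], {"DOCKER_BUILDKIT": "1"}
    if variant == "legacy":
        # Classic builder; prints a deprecation warning but still builds.
        return ["docker", "build", "-t", tag, "."], {"DOCKER_BUILDKIT": "0"}
    raise ValueError(f"unknown variant: {variant!r}")


def run_cycle(variant: str, workdir: str = ".") -> None:
    """Build once with the chosen path, then prune, failing loudly on errors."""
    build_cmd, extra_env = variant_commands(variant)
    env = {**os.environ, **extra_env}
    subprocess.run(build_cmd, cwd=workdir, env=env, check=True)
    subprocess.run(PRUNE, check=True)
```

Running `run_cycle("compose")`, `run_cycle("buildkit")`, and `run_cycle("legacy")` a few times each, and checking du -sh /var/lib/docker/overlay2/* after every cycle as in the report, should show which build path leaves layers behind.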

qianguih commented 1 year ago

Hi @laurazard, docker build is fine. Here is what I did:

DOCKER_BUILDKIT doesn't work for my current docker setup due to the following:

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.
laurazard commented 1 year ago

@qianguih

DOCKER_BUILDKIT doesn't work for my current docker setup due to the following:

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

This is just a warning, it should still build. Could you please try that way and let us know if you can replicate the issue with the legacy builder?

jhrotko commented 1 month ago

@qianguih I believe this is no longer reproducible. Could you please check on your side?

ndeloof commented 5 days ago

Closing as the user didn't answer. Please open a follow-up issue if this is still relevant, with detailed context.