dushaoshuai / dushaoshuai.github.io

https://www.shuai.host

Some notes on understanding Docker container images #135

Open dushaoshuai opened 1 year ago

dushaoshuai commented 1 year ago

Software versions:

$ uname -srmo
Linux 6.1.51-1-MANJARO x86_64 GNU/Linux
$ pacman -Qo mount
/usr/bin/mount is owned by util-linux 2.39.2-1
$ docker --version
Docker version 24.0.5, build ced0996600

How the container filesystem is implemented

As we know, a container process sees its own isolated filesystem rather than inheriting the host's filesystem.

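When Docker creates a container process, it enables a mount namespace for that process and mounts the filesystem of a complete operating system (its userland files; the kernel is shared with the host) as the process's root directory, giving the process an isolated execution environment. This mounted filesystem is what we call the container image. In general, an image packages the application, the application's dependencies, and the operating system's filesystem.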

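Docker's image design introduces the concept of layers. Each layer is a separate filesystem, and a modification to a layer is recorded incrementally as a new layer. How, then, are these layers presented to the container process as a single filesystem? The answer is a union filesystem, which union-mounts the different layers onto the same directory so that they appear as one filesystem.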
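To make the union-mount idea concrete, here is a minimal sketch in Python. It models each layer as a dictionary from path to file content, which is an assumption made purely for illustration: the names `base_layer`, `config_layer` and `union_view` are hypothetical, and a real union filesystem such as OverlayFS works on directories inside the kernel, not on dictionaries.

```python
from collections import ChainMap

# Each layer is modeled as {path: content}. This is an illustrative
# assumption, not how OverlayFS stores data.
base_layer = {
    "/etc/os-release": "debian",
    "/usr/sbin/nginx": "nginx binary",
}
config_layer = {
    "/etc/nginx/nginx.conf": "worker_processes auto;",
    "/etc/os-release": "debian (patched)",  # shadows the copy in base_layer
}


def union_view(*layers_top_to_bottom):
    """Return a merged view; for a duplicate path the topmost layer wins."""
    return ChainMap(*layers_top_to_bottom)


merged = union_view(config_layer, base_layer)

print(sorted(merged))             # every path from both layers, once each
print(merged["/etc/os-release"])  # served from the upper layer
```

The merged view lists each path once, and a path that exists in several layers resolves to the copy in the highest layer, which is the behavior the container process relies on.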

For OverlayFS, one implementation of a union filesystem, see overlay2.

Inspecting the image's layers, with overlay2 as the storage driver

Use the docker info command to check the storage driver (if it is not overlay2, Docker can be configured to use overlay2). Some output is omitted here:

$ docker info
Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: true
  Native Overlay Diff: false
  userxattr: false

Pull the latest nginx image and observe that it consists of 7 layers (each Pull complete line means one layer was pulled).

$ docker pull nginx
Using default tag: latest
latest: Pulling from library/nginx
360eba32fa65: Pull complete 
c5903f3678a7: Pull complete 
27e923fb52d3: Pull complete 
72de7d1ce3a4: Pull complete 
94f34d60e454: Pull complete 
e42dcfe1730b: Pull complete 
907d1bb4e931: Pull complete 
Digest: sha256:6926dd802f40e5e7257fded83e0d8030039642e4e10c4a98a6478e9c6fe06153
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest

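These layers are stored under the /var/lib/docker/overlay2 directory, one directory per layer (note that the layer IDs shown before Pull complete above do not correspond to the directory names).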

Use the docker image inspect command to print the image's details. Some output is omitted here:

$ docker image inspect nginx
[
    {
        "DockerVersion": "20.10.23",
        "Architecture": "amd64",
        "Os": "linux",
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/b9205bc33291fecf44a30e4531ca0a7157d6068d352fd8e3391d046cb6645bfe/diff:/var/lib/docker/overlay2/e7c79e02c83bb32ec060bd1147973faa2a810755df21ab7f62e31e832d0750bf/diff:/var/lib/docker/overlay2/044b7ab0547c2fb79e0c24275369d55387161f06e4466fe5cd350f0ecd5a629f/diff:/var/lib/docker/overlay2/e7ea0d7accdf1bc411d19d71eb8efcd779259b6827f6340167038cbeca364d3e/diff:/var/lib/docker/overlay2/f4809c376485831a510e711f7938165ccfbae7409b865057be5948e151e9c009/diff:/var/lib/docker/overlay2/17b1a173f69b634d6fdc1b0b56e18bc860dff585aef08765e3b25c80ed6f8f72/diff",
                "MergedDir": "/var/lib/docker/overlay2/19edbcc8e275db9caa98a0f60706e173bb683bc132493db65d88c601d7e853fa/merged",
                "UpperDir": "/var/lib/docker/overlay2/19edbcc8e275db9caa98a0f60706e173bb683bc132493db65d88c601d7e853fa/diff",
                "WorkDir": "/var/lib/docker/overlay2/19edbcc8e275db9caa98a0f60706e173bb683bc132493db65d88c601d7e853fa/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:a2d7501dfb3541f3d736125cdfd748618194f60cbb5c63f4de67a92530899628",
                "sha256:c74e4ebd28441929705c16a498f1b08a6adfb01beaa327e1c60dbffb2b87587b",
                "sha256:1b34d645672f25f7f3ea55ce5d49567482e3ad8b10739284f79fc2ecdc9b2f9a",
                "sha256:bf4045499bea273f25a8f7f936de5ae264cb605434a1686805199331ddef04c2",
                "sha256:3bfd54ea739a052cb3a7ce8532e99606ecdc03b30caccf8a2853978d19da06c8",
                "sha256:e48f2ce44b27d81fec6f48d4ea0e87824f4c4b95385328e240a526a4e9b17917",
                "sha256:aae231785348cfc880f1100c7cf02a28dd229f614e00ecca40fed5c329e23cb6"
            ]
        }
    }
]

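The image's RootFS (root filesystem) contains 7 layers. GraphDriver is an abstraction layer responsible for managing the storage of Docker images, and the GraphDriver implementation in use here is overlay2.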

image

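As the figure above shows, through the overlay2 mechanism the 1 layer in UpperDir and the 6 layers in LowerDir are stacked on top of one another and union-mounted at MergedDir, which ultimately appears as the image's filesystem.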
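The stacking order can be read directly from the GraphDriver data. The following Python sketch parses a trimmed copy of the inspect output shown earlier and lists the layer directories from top to bottom. The helper name `layer_stack` is hypothetical, and the directory IDs are truncated to 12 characters here only to keep the example readable. In an overlay mount, the entries of LowerDir are separated by colons and the leftmost entry is the topmost lower layer.

```python
import json

# GraphDriver data shaped like the `docker image inspect` output above.
# Directory IDs are truncated to 12 characters for readability; the real
# ones are 64 hexadecimal characters long.
inspect_json = """
{
  "GraphDriver": {
    "Data": {
      "LowerDir": "/var/lib/docker/overlay2/b9205bc33291/diff:/var/lib/docker/overlay2/e7c79e02c83b/diff:/var/lib/docker/overlay2/044b7ab0547c/diff:/var/lib/docker/overlay2/e7ea0d7accdf/diff:/var/lib/docker/overlay2/f4809c376485/diff:/var/lib/docker/overlay2/17b1a173f69b/diff",
      "UpperDir": "/var/lib/docker/overlay2/19edbcc8e275/diff"
    },
    "Name": "overlay2"
  }
}
"""


def layer_stack(graph_driver_data):
    """List layer directories from top to bottom: UpperDir, then LowerDir."""
    lower = graph_driver_data["LowerDir"].split(":")
    return [graph_driver_data["UpperDir"], *lower]


data = json.loads(inspect_json)["GraphDriver"]["Data"]
stack = layer_stack(data)

# Depth 0 is the top of the stack; the last entry is the base layer.
for depth, path in enumerate(stack):
    print(depth, path)

print(len(stack))
```

One UpperDir entry plus six LowerDir entries gives the same count of 7 that RootFS.Layers reports.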

Look at the lowest layer in LowerDir. It is the filesystem of a complete operating system:

$ sudo ls /var/lib/docker/overlay2/17b1a173f69b634d6fdc1b0b56e18bc860dff585aef08765e3b25c80ed6f8f72/diff
bin   dev  home  lib32  libx32  mnt  proc  run   srv  tmp  var
boot  etc  lib   lib64  media   opt  root  sbin  sys  usr

Look at the second-lowest layer in LowerDir. It makes some modifications on top of the lowest layer:

$ sudo ls /var/lib/docker/overlay2/f4809c376485831a510e711f7938165ccfbae7409b865057be5948e151e9c009/diff
docker-entrypoint.d  etc  tmp  usr  var

The remaining layers follow the same pattern.

Inspecting the container's layers, with overlay2 as the storage driver

Start a container process: docker run nginx:latest

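Use the docker container inspect command to print the container's details. Some output is omitted here, but the composition of the container's filesystem is visible: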

$ docker container inspect goofy_wright
[
    {
        "Driver": "overlay2",
        "Platform": "linux",
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3-init/diff:/var/lib/docker/overlay2/19edbcc8e275db9caa98a0f60706e173bb683bc132493db65d88c601d7e853fa/diff:/var/lib/docker/overlay2/b9205bc33291fecf44a30e4531ca0a7157d6068d352fd8e3391d046cb6645bfe/diff:/var/lib/docker/overlay2/e7c79e02c83bb32ec060bd1147973faa2a810755df21ab7f62e31e832d0750bf/diff:/var/lib/docker/overlay2/044b7ab0547c2fb79e0c24275369d55387161f06e4466fe5cd350f0ecd5a629f/diff:/var/lib/docker/overlay2/e7ea0d7accdf1bc411d19d71eb8efcd779259b6827f6340167038cbeca364d3e/diff:/var/lib/docker/overlay2/f4809c376485831a510e711f7938165ccfbae7409b865057be5948e151e9c009/diff:/var/lib/docker/overlay2/17b1a173f69b634d6fdc1b0b56e18bc860dff585aef08765e3b25c80ed6f8f72/diff",
                "MergedDir": "/var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/merged",
                "UpperDir": "/var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/diff",
                "WorkDir": "/var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/work"
            },
            "Name": "overlay2"
        }
    }
]

image

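As the figure above shows, the container's root filesystem is made up of three groups of layers. From bottom to top, they are: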

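  1. Image layers: the layers of the nginx image, read-only.
  2. Init layer: some modifications should apply only to the current container (setting the hostname, for example) and should not be committed by docker commit. These modifications are therefore mounted as a separate layer. docker commit commits only the container layer and does not include the contents of the Init layer.

    • Contents of the Init layer: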

      $ tree 7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3-init/diff/
      7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3-init/diff/
      ├── dev
      │   ├── console
      │   ├── pts
      │   └── shm
      └── etc
          ├── hostname
          ├── hosts
          ├── mtab -> /proc/mounts
          └── resolv.conf
      
      5 directories, 5 files
  3. Container layer: the read-write layer of the nginx container. All writes made inside the container happen in this layer.

These layers are union-mounted at MergedDir and ultimately appear as one complete filesystem for the container to use.

Perform some writes inside the container:

$ docker exec goofy_wright mkdir -p /home/shaouai/
$ docker exec goofy_wright bash -c "echo shaouai > /home/shaouai/me.md"
$ docker exec goofy_wright cat /home/shaouai/me.md
shaouai

The newly created directory and file appear in MergedDir:

$ tree 7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/merged/home/
7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/merged/home/
└── shaouai
    └── me.md

2 directories, 1 file

In fact, the modification took place in the container's read-write layer:

$ tree 7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/diff/home/
7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/diff/home/
└── shaouai
    └── me.md

2 directories, 1 file
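The same behavior can be modeled in a few lines of Python. This is only a sketch under the assumption that a layer can be represented as a dictionary from path to content; `image_layers` and `container_layer` are hypothetical names. Python's `ChainMap` happens to share the relevant property with an overlay mount: reads search all layers from the top down, while writes land only in the first (topmost) map.

```python
from collections import ChainMap

# Read-only image content, modeled as {path: content} (illustrative only).
image_layers = {"/usr/sbin/nginx": "nginx binary"}
# The container's read-write layer starts out empty.
container_layer = {}

# ChainMap searches the maps left to right for reads, but sends every
# write to the first map, which plays the role of UpperDir here.
merged = ChainMap(container_layer, image_layers)

merged["/home/shaouai/me.md"] = "shaouai\n"

print(merged["/home/shaouai/me.md"])             # visible through the merged view
print("/home/shaouai/me.md" in container_layer)  # stored in the read-write layer
print("/home/shaouai/me.md" in image_layers)     # the image content is untouched
```

The model does not cover deletions or modifications of files that live in a lower layer, which OverlayFS handles with whiteouts and copy-up.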

Check the mounts. MergedDir is indeed a mount point, and rw indicates that this is a read-write overlay mount:

$ mount | grep overlay
overlay on /var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/A6QIFTTF6JQTCT2L3MWQQNDBCK:/var/lib/docker/overlay2/l/LH7FUBMUZZVHN5HUZFDV23WMCL:/var/lib/docker/overlay2/l/7D6E6RA26IDIG47XCBPKG7HACU:/var/lib/docker/overlay2/l/FD3DAQAT5BXLFMP7VSHGHUPZK7:/var/lib/docker/overlay2/l/WA34QSLFVQV5OBKKT3FUJVWLU4:/var/lib/docker/overlay2/l/ST2N4672GXMMIM4LFAZMBZH4J4:/var/lib/docker/overlay2/l/CBURYREZFE6ZRV4L7MUROH6VG5:/var/lib/docker/overlay2/l/PNAYDCABIMTEG74LTJSSTE3IZ4,upperdir=/var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/diff,workdir=/var/lib/docker/overlay2/7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/work,index=off)

On closer inspection, however, the layers in LowerDir look different from what we saw earlier. Inspect the topmost layer in LowerDir:

$ file /var/lib/docker/overlay2/l/A6QIFTTF6JQTCT2L3MWQQNDBCK
/var/lib/docker/overlay2/l/A6QIFTTF6JQTCT2L3MWQQNDBCK: symbolic link to ../7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3-init/diff

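This is a symbolic link to the layer. In fact, the /var/lib/docker/overlay2/l directory holds many shortened layer identifiers, all of which are symbolic links pointing to the real layers: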

$ ls -l /var/lib/docker/overlay2/l
total 36
lrwxrwxrwx 1 root root 72 Sep 21 08:38 4CLBKLJWHLVX624WASNL4N3FNS -> ../7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 7D6E6RA26IDIG47XCBPKG7HACU -> ../b9205bc33291fecf44a30e4531ca0a7157d6068d352fd8e3391d046cb6645bfe/diff
lrwxrwxrwx 1 root root 77 Sep 21 08:38 A6QIFTTF6JQTCT2L3MWQQNDBCK -> ../7ce68438d02a22573086d039dbc2401b4a71a9a4409952fa7e2e7a0c13bdfff3-init/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 CBURYREZFE6ZRV4L7MUROH6VG5 -> ../f4809c376485831a510e711f7938165ccfbae7409b865057be5948e151e9c009/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 FD3DAQAT5BXLFMP7VSHGHUPZK7 -> ../e7c79e02c83bb32ec060bd1147973faa2a810755df21ab7f62e31e832d0750bf/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 LH7FUBMUZZVHN5HUZFDV23WMCL -> ../19edbcc8e275db9caa98a0f60706e173bb683bc132493db65d88c601d7e853fa/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 PNAYDCABIMTEG74LTJSSTE3IZ4 -> ../17b1a173f69b634d6fdc1b0b56e18bc860dff585aef08765e3b25c80ed6f8f72/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 ST2N4672GXMMIM4LFAZMBZH4J4 -> ../e7ea0d7accdf1bc411d19d71eb8efcd779259b6827f6340167038cbeca364d3e/diff
lrwxrwxrwx 1 root root 72 Sep 17 16:32 WA34QSLFVQV5OBKKT3FUJVWLU4 -> ../044b7ab0547c2fb79e0c24275369d55387161f06e4466fe5cd350f0ecd5a629f/diff

The shortened identifiers are used to avoid hitting the page size limitation on the arguments to the mount command.
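A rough calculation shows how much the short links save. The Python sketch below compares the length of the lowerdir option when every layer is written as a full path and when it is written as a short link. The path shapes come from the mount output above; the helper name `lowerdir_option_length` is hypothetical, and the model assumes all paths have the same length (the Init layer's directory name is in fact a few characters longer).

```python
# Path shapes taken from the mount output above; "0" * 64 and "A" * 26 are
# stand-ins with the same lengths as a real layer ID and a real short link.
full_path = "/var/lib/docker/overlay2/" + "0" * 64 + "/diff"
short_path = "/var/lib/docker/overlay2/l/" + "A" * 26


def lowerdir_option_length(path, layer_count):
    """Length of 'lowerdir=<path>:<path>:...' for layer_count equal-length paths."""
    separators = layer_count - 1
    return len("lowerdir=") + layer_count * len(path) + separators


# Columns: layer count, length with full paths, length with short links.
for count in (8, 40, 128):
    print(count,
          lowerdir_option_length(full_path, count),
          lowerdir_option_length(short_path, count))
```

Each layer costs 94 characters as a full path and 53 as a short link, so the option string grows roughly half as fast and more layers fit under the same limit.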

Benefits of image layering

(Generated by Bard)

See also