nx2804 opened this issue 1 month ago
I think this may be a limitation of the "graphdriver" image store in the Docker Engine. The graphdriver store was designed to be optimized for local disk consumption. As part of that, images pulled from a registry are extracted after they are pulled, after which the compressed layers are discarded; only the extracted layers, as well as information about the pulled layers, are preserved.
When saving an image, or pushing it to the same registry, these layers and the related "image manifests" are reconstructed, but this step is not reproducible (due to both compression artifacts and timestamps included in the image manifest metadata).
That said, I tried to see what the differences between the saved files are, and... honestly, couldn't immediately find any. A possible reason could be the order in which files are included in the tar header, but the archives seem to be identical in every other way:
docker pull alpine
Using default tag: latest
latest: Pulling from library/alpine
Digest: sha256:beefdbd8a1da6d2915566fde36db9db0b524eb737fc57cd1367effd16dc0d06d
Status: Downloaded newer image for alpine:latest
docker.io/library/alpine:latest
docker image save -o one.tar
docker image save -o two.tar
shasum one.tar
05ee3ff4ae600438a025ab12339395bdc94dfa85 one.tar
shasum two.tar
1b74f13ee5f67bc8345d0d4cd1e70119c3990feb two.tar
tar --xattrs -tvf one.tar
drwxr-xr-x 0/0 0 2024-09-06 22:20 blobs/
drwxr-xr-x 0/0 0 2024-10-08 11:48 blobs/sha256/
-rw-r--r-- 0/0 401 1970-01-01 00:00 blobs/sha256/309ff318b44b4f2af442a37a269a93ce6907d277d2c168d3160f36cc802f8838
-rw-r--r-- 0/0 8081920 2024-09-06 22:20 blobs/sha256/63ca1fbb43ae5034640e5e6cb3e083e05c290072c5366fcaa9d62435a4cced85
-rw-r--r-- 0/0 1143 2024-09-06 22:20 blobs/sha256/6ad8fd5c38430e1ab05f033c689994934a216c1a7481aeb44de1239d7ca82f77
-rw-r--r-- 0/0 1471 2024-09-06 22:20 blobs/sha256/91ef0af61f39ece4d6710e465df5ed6ca12112358344fd51ae6a3b886634148b
-rw-r--r-- 0/0 362 2024-10-08 11:48 index.json
-rw-r--r-- 0/0 457 1970-01-01 00:00 manifest.json
-rw-r--r-- 0/0 31 1970-01-01 00:00 oci-layout
-rw-r--r-- 0/0 89 1970-01-01 00:00 repositories
tar --xattrs -tvf two.tar
drwxr-xr-x 0/0 0 2024-09-06 22:20 blobs/
drwxr-xr-x 0/0 0 2024-10-08 11:48 blobs/sha256/
-rw-r--r-- 0/0 401 1970-01-01 00:00 blobs/sha256/309ff318b44b4f2af442a37a269a93ce6907d277d2c168d3160f36cc802f8838
-rw-r--r-- 0/0 8081920 2024-09-06 22:20 blobs/sha256/63ca1fbb43ae5034640e5e6cb3e083e05c290072c5366fcaa9d62435a4cced85
-rw-r--r-- 0/0 1143 2024-09-06 22:20 blobs/sha256/6ad8fd5c38430e1ab05f033c689994934a216c1a7481aeb44de1239d7ca82f77
-rw-r--r-- 0/0 1471 2024-09-06 22:20 blobs/sha256/91ef0af61f39ece4d6710e465df5ed6ca12112358344fd51ae6a3b886634148b
-rw-r--r-- 0/0 362 2024-10-08 11:48 index.json
-rw-r--r-- 0/0 457 1970-01-01 00:00 manifest.json
-rw-r--r-- 0/0 31 1970-01-01 00:00 oci-layout
-rw-r--r-- 0/0 89 1970-01-01 00:00 repositories
I think switching to the containerd image store may help here. When using the containerd image store ("snapshotters"), pulled images, including their compressed layers, are kept, and the exported tar looks to be fully reproducible:
docker pull alpine
Using default tag: latest
latest: Pulling from library/alpine
Digest: sha256:beefdbd8a1da6d2915566fde36db9db0b524eb737fc57cd1367effd16dc0d06d
Status: Downloaded newer image for alpine:latest
docker.io/library/alpine:latest
docker save -o c8d-one.tar alpine:latest
docker save -o c8d-two.tar alpine:latest
shasum c8d-one.tar
b4d8c4f578be934ad2c0a82f7efd184cf027d27f c8d-one.tar
shasum c8d-two.tar
b4d8c4f578be934ad2c0a82f7efd184cf027d27f c8d-two.tar
If you have an environment to test on, it's worth switching to the containerd image store (which also provides support for storing multi-arch images).
Be aware, though, that switching the store also switches to a different location for storing images and containers; your existing images won't be deleted, but they won't be accessible either (while still consuming space). If possible, my recommendation is to remove content (containers, images) before switching.
Thanks for your response. What is the default storage driver used in Docker?
Can I switch the containerd configuration to use the same storage driver that Docker uses?
Docker (without the containerd image store) selects the default storage driver based on the underlying filesystem. In most cases that is overlay2.
When using the containerd image store, no detection is currently done, but the default will be the overlayfs snapshotter (storage driver), which is the equivalent of overlay2 (both use the kernel's "OverlayFS").
I've reproduced the issue:
❯ docker save -o one.tar.gz debian:latest
❯ docker save -o two.tar.gz debian:latest
❯ wc -c one.tar.gz two.tar.gz
143606272 one.tar.gz
143606272 two.tar.gz
287212544 total
❯ shasum one.tar.gz two.tar.gz
d068d04161345aa5693859dbfc6015913fdd8af7 one.tar.gz
930e62e8f0cd9a24af709107f0b199ff87e570be two.tar.gz
On first pass, the metadata looks the same:
❯ shasum <(tar tvf one.tar.gz) <(tar tvf two.tar.gz)
cf795d491009e668091a1a13d83b949d00a80073 /dev/fd/14
cf795d491009e668091a1a13d83b949d00a80073 /dev/fd/15
However, if we look at the binary records, there is a clear difference:
❯ diff -ru <(hexdump -C one.tar.gz) <(hexdump -C two.tar.gz)
--- /dev/fd/14 2024-10-08 12:55:38
+++ /dev/fd/15 2024-10-08 12:55:56
@@ -20,7 +20,7 @@
00000260 00 00 00 00 30 30 30 30 37 35 35 00 30 30 30 30 |....0000755.0000|
00000270 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 |000.0000000.0000|
00000280 30 30 30 30 30 30 30 00 31 34 37 30 31 33 30 36 |0000000.14701306|
-00000290 30 31 36 00 30 31 31 33 35 33 00 20 35 00 00 00 |016.011353. 5...|
+00000290 30 32 34 00 30 31 31 33 35 32 00 20 35 00 00 00 |024.011352. 5...|
000002a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000300 00 75 73 74 61 72 00 30 30 00 00 00 00 00 00 00 |.ustar.00.......|
@@ -6857398,7 +6857398,7 @@
088f2e60 00 00 00 00 30 30 30 30 36 34 34 00 30 30 30 30 |....0000644.0000|
088f2e70 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 |000.0000000.0000|
088f2e80 30 30 30 30 35 35 32 00 31 34 37 30 31 33 30 36 |0000552.14701306|
-088f2e90 30 31 36 00 30 31 31 32 34 36 00 20 30 00 00 00 |016.011246. 0...|
+088f2e90 30 32 34 00 30 31 31 32 34 35 00 20 30 00 00 00 |024.011245. 0...|
088f2ea0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
088f2f00 00 75 73 74 61 72 00 30 30 00 00 00 00 00 00 00 |.ustar.00.......|
The difference seems to map to the timestamp field and the checksum that follows it. In both cases the timestamp changes from 14701306016 to 14701306024. Read as decimal numbers, these would be timestamps far in the future. However, because tar is a fun format, these fields are stored as octal ASCII. Decoded, they map to an Oct 8th date. Let's have a look at the tar listing:
❯ tar tvf one.tar.gz
drwxr-xr-x 0 0 0 0 Jul 1 17:39 blobs/
drwxr-xr-x 0 0 0 0 Oct 8 12:46 blobs/sha256/
-rw-r--r-- 0 0 0 403 Dec 31 1969 blobs/sha256/c89edf5050f4db4a7ac20a64bdb77f7ddca76dfc2c87a39fddca419084dca080
-rw-r--r-- 0 0 0 143594496 Jul 1 17:39 blobs/sha256/d1660adccd2b42ad0160cba9a291ef75a87223577240a585a7f1cb90676ec3b8
-rw-r--r-- 0 0 0 1152 Jul 1 17:39 blobs/sha256/d5156a0989b7b62fd13b9f28e7e1864554ae6b47657a2efc503b097818653cad
-rw-r--r-- 0 0 0 1477 Jul 1 17:39 blobs/sha256/f753e4d18c7075845e84d759f49d57529f268aa7a262b517fd9f3d62749890eb
-rw-r--r-- 0 0 0 362 Oct 8 12:46 index.json
-rw-r--r-- 0 0 0 459 Dec 31 1969 manifest.json
-rw-r--r-- 0 0 0 31 Dec 31 1969 oci-layout
-rw-r--r-- 0 0 0 89 Dec 31 1969 repositories
From here, we can see that index.json and blobs/sha256 are generated with the current time as the timestamp on their tar header records. There can be a few causes of that, but we should be able to track it down.
Here's some of my info:
Server: Docker Desktop 4.33.0 (159291)
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:44 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.7.18
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Note that I do not have the containerd snapshotter enabled (I should, though ;) ), so this is just from the overlay2 graphdriver. I don't remember whether this is in the graphdriver code or not, but it likely is.
As a matter of course, this really isn't a cli bug, but we can likely fix it in moby. We should declare whether or not the docker save command is hash stable.
Ok, breaking this down to make the fix easier. We have two bugs:
1. index.json in https://github.com/moby/moby/blob/master/image/tarexport/save.go#L385. This needs to have a system.Chtimes call that follows it. Fairly straightforward fix.
2. os.MkdirAll: https://github.com/moby/moby/blob/master/image/tarexport/save.go#L263. That will naively create intermediary path components with the local machine's current time as their timestamps. We need to walk back up to the index root and set those timestamps correctly.
Let me know once this feature is merged and available.
Description
Scenario: docker save -o tarfilename
Every time we save the same image, docker produces a file with a different shasum; instead, the SHA values should be identical.
Reproduce
docker save -o tarfilename imagename:tagname
docker save -o tarfilename1 imagename:tagname (save the same image again)
shasum tarfilename
shasum tarfilename1
The SHA values will be different.
Expected behavior
No response
docker version
docker version 1.24.6
docker info
docker version 1.24.6
Additional Info
No response