goldmann / docker-squash

Docker image squashing tool
MIT License

docker-squash loses history information #155

Closed. Vadiml1024 closed this issue 6 years ago.

Vadiml1024 commented 7 years ago

I'm running docker-squash relative to some base layer, then making a couple of commits on the squashed image, and then trying to squash again relative to the same base layer. docker-squash does not find the base layer in the latest image. The reason seems to be that history information is lost in the squashed image. The following bash script demonstrates the effect by showing that the Parent field in the squashed image's metadata (docker inspect output) is empty:

#!/bin/bash

cat >Dockerfile <<EOF
FROM alpine:3.5
RUN mkdir data && echo Begin >/data/file.txt
EOF

docker build -t squashtest:base .

image=squashtest:base
for i in 1 2 ; do
    docker run --name=squashtest.$i.1 $image sh -c "echo Line $i.1 >>data/file.txt"
    docker commit squashtest.$i.1 squashtest:$i.1
    docker run --name=squashtest.$i.2 squashtest:$i.1 sh -c "echo Line $i.2 >>data/file.txt"
    docker commit squashtest.$i.2 squashtest:$i.2
    image=squashtest:$i.2
done

docker-squash -t squashtest:squash-1.2 -f squashtest:base squashtest:1.2
docker inspect squashtest:squash-1.2 | grep -i Parent

goldmann commented 7 years ago

What was the squash command you used to squash the image?

Vadiml1024 commented 7 years ago

docker-squash -t squashtest:squash-1.2 -f squashtest:base squashtest:1.2

Vadiml1024 commented 7 years ago

If you run docker history squashtest:squash-1.2, you'll see that the image IDs in the history are missing.

Vadiml1024 commented 7 years ago

BTW, I'm running 'make test' and seeing that test_should_not_fail_with_hard_links_to_files_gh_99 fails.

DvdGiessen commented 6 years ago

I'm having the same problem. A complete minimal testcase:

PS C:\workspace\xxx-docker-squash-issue-poc> cat Dockerfile
FROM alpine:latest
RUN echo "Hello" > foo
RUN echo "World" > bar

PS C:\workspace\xxx-docker-squash-issue-poc> docker build -t example .
Sending build context to Docker daemon  282.6kB
Step 1/3 : FROM alpine:latest
 ---> ee4603260daa
Step 2/3 : RUN echo "Hello" > foo
 ---> Running in 7f816ba3fe42
Removing intermediate container 7f816ba3fe42
 ---> cfcca150b660
Step 3/3 : RUN echo "World" > bar
 ---> Running in ab753ce14f4b
Removing intermediate container ab753ce14f4b
 ---> e9cfe15aa2b2
Successfully built e9cfe15aa2b2
Successfully tagged example:latest
SECURITY WARNING: You are building a Docker image from Windows against a non-Windows Docker host. All files and directories added to build context will have '-rwxr-xr-x' permissions. It is recommended to double check and reset permissions for sensitive files and directories.

PS C:\workspace\xxx-docker-squash-issue-poc> docker history example
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
e9cfe15aa2b2        7 seconds ago       /bin/sh -c echo "World" > bar                   6B
cfcca150b660        9 seconds ago       /bin/sh -c echo "Hello" > foo                   6B
ee4603260daa        14 months ago       /bin/sh -c #(nop) ADD file:d6ee3ba7a4d59b161…   4.8MB

PS C:\workspace\xxx-docker-squash-issue-poc> docker-squash -f alpine:latest -t example:squashed example
2017-12-04 15:49:30,772 root         INFO     docker-squash version 1.0.6, Docker 1caf76c, API 1.34...
2017-12-04 15:49:30,774 root         INFO     Using v2 image format
2017-12-04 15:49:30,790 root         INFO     Old image has 3 layers
2017-12-04 15:49:30,796 root         INFO     Checking if squashing is necessary...
2017-12-04 15:49:30,796 root         INFO     Attempting to squash last 2 layers...
2017-12-04 15:49:30,796 root         INFO     Saving image sha256:e9cfe15aa2b215d0c5aa68ee3e330c6975e702e3528c141fa0453754f2813609 to C:\Users\DVANDE~1\AppData\Local\Temp\docker-squash-dx8i1vkv\old directory...
2017-12-04 15:49:30,971 root         INFO     Image saved!
2017-12-04 15:49:30,971 root         INFO     Squashing image 'example'...
2017-12-04 15:49:30,972 root         INFO     Starting squashing...
2017-12-04 15:49:30,994 root         INFO     Squashing file 'C:\Users\DVANDE~1\AppData\Local\Temp\docker-squash-dx8i1vkv\old\37287265a0d0f7f7d711094ff5cd526afc57e482be9de7b6aee3112abeb68364\layer.tar'...
2017-12-04 15:49:30,996 root         INFO     Squashing file 'C:\Users\DVANDE~1\AppData\Local\Temp\docker-squash-dx8i1vkv\old\82ae1dc2fc5d06377c7f82fd7a2e35a04df4fa76c695703e9820aebf99b588a1\layer.tar'...
2017-12-04 15:49:31,002 root         INFO     Squashing finished!
2017-12-04 15:49:31,058 root         INFO     New squashed image ID is 1f75f708ed0295c87935c01201b9c5006170fcad3672e5fd9d74bc109655e13b
2017-12-04 15:49:31,365 root         INFO     Image registered in Docker daemon as example:squashed
2017-12-04 15:49:31,371 root         INFO     Done

PS C:\workspace\xxx-docker-squash-issue-poc> docker history example:squashed
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
94739305dd9e        10 seconds ago                                                      12B
<missing>           14 months ago       /bin/sh -c #(nop) ADD file:d6ee3ba7a4d59b161…   4.8MB

I ran this from Windows, but the exact same reproduction shows the same issue on Linux.

goldmann commented 6 years ago

This is expected.

When you squash an image, its history is altered (in fact, it's rewritten). You cannot expect the docker history output to reflect the state before squashing. Since we merge layers, we cannot even populate the history with meaningful comments; that's why the "CREATED BY" field is empty. Additionally, the number of entries in the docker history output equals the number of layers after squashing. I hope this makes it clear.

Since this is not a bug, I'm closing this issue. Feel free to add additional comments if you have any.

DvdGiessen commented 6 years ago

@goldmann True, the entire point of docker-squash is rewriting history. :) However, I guess the issue is that I was expecting docker-squash to only alter history for the layers it is squashing.

When using the -f flag (as in the example above), docker-squash squashes only a subset of the layers forming my image, yet in the history, the layers that were not squashed are still "altered". In my minimal test case, I'm using alpine:latest as my base and adding two layers (two RUN commands). The image history contains three layers: the original alpine image (which happens to have a single layer) and the two layers I've added, each containing a single file.

Then I ask docker-squash to squash only those last two layers together. So the new image has only two layers: the original alpine layer, which I expect to be completely unmodified, and a single layer containing the two files I've created. However, that original alpine layer is modified: in docker history, its image ID is suddenly gone, even though it is still the exact same alpine layer, as its creation date shows.

Of course, with images that consist of more layers than the single layer in this minimal example, the change becomes more apparent.

So it's not that I find it unexpected that the docker history is modified, but that it loses the image IDs for layers I expected to remain unaltered. :)

EDIT: If this is still expected, I'm totally OK with that, but then perhaps it would be useful to document more clearly how the -f flag interacts with layers that are not explicitly squashed? That might help others who encounter this behaviour.

goldmann commented 6 years ago

No, it's not modified, trust me :) This is just how the history is presented. I haven't looked into the details recently, but missing IDs are (kind of) expected. To prove it, you can run docker save on both of these images, untar the archives and compare the tar files for the main alpine layer. You can also run the squash command with the verbose flag to see that it actually copies the data over.

Side note: the output of the docker history command is just what is written in the metadata, as plain text. It is NOT computed in any way from the image layers. I could make it return 1000 layers with some random comments :)
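A lighter way to run the comparison suggested above, sketched under assumptions: instead of docker save and untarring, it reads the layer digests (diff IDs) that docker image inspect reports under .RootFS.Layers. For v2 images each digest is the sha256 of an uncompressed layer tarball, so matching digests mean matching layer contents, regardless of which IDs docker history does or does not show. The helper name base_layers_unchanged is hypothetical and not part of docker-squash.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-squash): succeeds if the bottom
# layers of IMAGE are exactly the layers of BASE, i.e. squashing left them
# untouched. It compares .RootFS.Layers (uncompressed layer digests), which
# does not depend on the IDs shown, or not shown, by docker history.
# Usage: base_layers_unchanged BASE IMAGE
base_layers_unchanged() {
    base=$1   # e.g. alpine:latest
    image=$2  # e.g. example:squashed
    fmt='{{range .RootFS.Layers}}{{println .}}{{end}}'

    # One digest per line, oldest (bottom) layer first; exit 2 on inspect errors.
    base_layers=$(docker image inspect --format "$fmt" "$base") || return 2
    image_layers=$(docker image inspect --format "$fmt" "$image") || return 2

    # Keep only as many of IMAGE's bottom layers as BASE has, then compare.
    n=$(printf '%s\n' "$base_layers" | grep -c .)
    prefix=$(printf '%s\n' "$image_layers" | head -n "$n")
    [ "$prefix" = "$base_layers" ]
}
```

For the images in this thread, base_layers_unchanged alpine:latest example:squashed would be expected to succeed if the alpine layer was copied over unchanged.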

DvdGiessen commented 6 years ago

@goldmann Sorry, I wasn't clear. The image content is squashed correctly; all the data in there is correct, both for the squashed layers and for the non-squashed ones. For most use cases where I've used docker-squash, it works perfectly! However, this specific issue is about the lack of correct metadata.

To demonstrate, here is another minimal test case, showing the resulting issue I suspect @Vadiml1024 was having as well:

+ echo We first build a base image:
We first build a base image:
+ cat base.Dockerfile
FROM alpine:latest
RUN echo "Hello" > foo
RUN echo "World" > bar
CMD ["cat", "foo", "bar"]
+ docker build -t example:base -f base.Dockerfile .
Sending build context to Docker daemon  4.608kB
Step 1/4 : FROM alpine:latest
 ---> e21c333399e0
Step 2/4 : RUN echo "Hello" > foo
 ---> Using cache
 ---> 97ec8dfb2dc1
Step 3/4 : RUN echo "World" > bar
 ---> Using cache
 ---> d7544bbc1cd8
Step 4/4 : CMD ["cat", "foo", "bar"]
 ---> Using cache
 ---> 08a415080778
Successfully built 08a415080778
Successfully tagged example:base
+ echo We squash its layers together:
We squash its layers together:
+ docker-squash -f alpine:latest -t example:squashed example:base
2017-12-04 17:01:37,908 root         INFO     docker-squash version 1.0.6, Docker 1caf76c, API 1.34...
2017-12-04 17:01:37,909 root         INFO     Using v2 image format
2017-12-04 17:01:37,927 root         INFO     Old image has 5 layers
2017-12-04 17:01:37,938 root         INFO     Checking if squashing is necessary...
2017-12-04 17:01:37,939 root         INFO     Attempting to squash last 3 layers...
2017-12-04 17:01:37,939 root         INFO     Saving image sha256:08a41508077887052c5be08791af651fcc2726444e6ee4ab90643f7f05784523 to /tmp/docker-squash-kK9aG_/old directory...
2017-12-04 17:01:38,063 root         INFO     Image saved!
2017-12-04 17:01:38,063 root         INFO     Squashing image 'example:base'...
2017-12-04 17:01:38,065 root         INFO     Starting squashing...
2017-12-04 17:01:38,085 root         INFO     Squashing file '/tmp/docker-squash-kK9aG_/old/7cc75d56b525913728a64b2e41607b82ac739edee34faa909c5e3ba93182ab50/layer.tar'...
2017-12-04 17:01:38,087 root         INFO     Squashing file '/tmp/docker-squash-kK9aG_/old/941ee79d4a8355aaab39430f8d5f97b58a54cb791f2c398ef73bf5110ca87b43/layer.tar'...
2017-12-04 17:01:38,093 root         INFO     Squashing finished!
2017-12-04 17:01:38,162 root         INFO     New squashed image ID is e1987b0b82ee7816363ed945ead5683846db7caeb90b034c21edd3e91ddac151
2017-12-04 17:01:38,359 root         INFO     Image registered in Docker daemon as example:squashed
2017-12-04 17:01:38,370 root         INFO     Done
+ docker history example:squashed
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
e1987b0b82ee        1 second ago                                                        12B
<missing>           2 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           2 days ago          /bin/sh -c #(nop) ADD file:2b00f26f6004576...   4.14MB
+ echo We build a child, based on the squashed base image:
We build a child, based on the squashed base image:
+ cat child.Dockerfile
FROM example:squashed
RUN echo "DvdGiessen" > bar
+ docker build -t example:child -f child.Dockerfile .
Sending build context to Docker daemon  4.608kB
Step 1/2 : FROM example:squashed
 ---> e1987b0b82ee
Step 2/2 : RUN echo "DvdGiessen" > bar
 ---> Running in 0286df3485b6
Removing intermediate container 0286df3485b6
 ---> b6b12a617fa9
Successfully built b6b12a617fa9
Successfully tagged example:child
+ echo Now, this works just fine, since using a number of layers doesn't need info from the metadata:
Now, this works just fine, since using a number of layers doesn't need info from the metadata:
+ docker-squash -f 2 -t example:squashed-child example:child
2017-12-04 17:01:40,394 root         INFO     docker-squash version 1.0.6, Docker 1caf76c, API 1.34...
2017-12-04 17:01:40,395 root         INFO     Using v2 image format
2017-12-04 17:01:40,410 root         INFO     Old image has 4 layers
2017-12-04 17:01:40,412 root         INFO     Checking if squashing is necessary...
2017-12-04 17:01:40,412 root         INFO     Attempting to squash last 2 layers...
2017-12-04 17:01:40,413 root         INFO     Saving image sha256:b6b12a617fa9845d784a348ca06261387b7d09e798f832786d8148d476d78d11 to /tmp/docker-squash-_Tu2S7/old directory...
2017-12-04 17:01:40,557 root         INFO     Image saved!
2017-12-04 17:01:40,557 root         INFO     Squashing image 'example:child'...
2017-12-04 17:01:40,559 root         INFO     Starting squashing...
2017-12-04 17:01:40,582 root         INFO     Squashing file '/tmp/docker-squash-_Tu2S7/old/2e3482b21333e2a47082e7fceabc42e2f5c9522c7586ad9f583ab1ee8be3fedf/layer.tar'...
2017-12-04 17:01:40,586 root         INFO     Squashing file '/tmp/docker-squash-_Tu2S7/old/80d84bcd460ab1a06671b5377d59d0cdfb27473f59398a1f74c5e0bb8d936c98/layer.tar'...
2017-12-04 17:01:40,593 root         INFO     Squashing finished!
2017-12-04 17:01:40,660 root         INFO     New squashed image ID is 0f896990ac7292387bda685303fefac2f9fc2753d92a27f70d4e5deac05669c5
2017-12-04 17:01:41,018 root         INFO     Image registered in Docker daemon as example:squashed-child
2017-12-04 17:01:41,029 root         INFO     Done
+ echo But, due to missing metadata, this doesn't:
But, due to missing metadata, this doesn't:
+ docker-squash -f alpine:latest -t example:squashed-child example:child
2017-12-04 17:01:41,327 root         INFO     docker-squash version 1.0.6, Docker 1caf76c, API 1.34...
2017-12-04 17:01:41,328 root         INFO     Using v2 image format
2017-12-04 17:01:41,361 root         INFO     Old image has 4 layers
2017-12-04 17:01:41,381 root         ERROR    Couldn't find the provided layer (alpine:latest) in the example:child image
2017-12-04 17:01:41,382 root         ERROR    Execution failed, consult logs above. If you think this is our fault, please file an issue: https://github.com/goldmann/docker-squash/issues, thanks!

(I ran this from a shell script with set -x to show exactly which commands are executed.)

The last action fails specifically because the metadata is missing. Without that metadata, we can no longer reference specific layers of the image, since their IDs are missing. That isn't a problem if you're just using the final image, but it is a problem if, after squashing, you still want to work with the metadata, which the history command no longer shows.
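The failure can be predicted before running docker-squash. This is a minimal sketch that assumes, as the "Couldn't find the provided layer" error suggests, that docker-squash resolves the -f reference to an image ID and looks for it among the IDs docker history reports for the image. The helper name base_listed_in_history is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of docker-squash): succeeds if the
# image ID of BASE appears among the IDs docker history reports for IMAGE.
# Assumption: this mirrors how docker-squash looks up a "-f <image>" argument.
# Usage: base_listed_in_history IMAGE BASE
base_listed_in_history() {
    image=$1  # e.g. example:child
    base=$2   # e.g. alpine:latest

    # Full image ID of the base, e.g. sha256:e21c3333...; exit 2 if unknown.
    base_id=$(docker image inspect --format '{{.Id}}' "$base") || return 2

    # Full IDs of the history entries; entries without an ID print <missing>.
    docker history -q --no-trunc "$image" | grep -qxF "$base_id"
}
```

In the transcript above, this check would pass for example:base (its history still lists the alpine image ID) and fail for example:child. When it fails, specifying the number of layers to squash, as with -f 2 above, still works.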

(Note: in the README, after squashing a number of layers, the non-squashed layers still have their image IDs in the docker history output, so I'm guessing this used to work just fine. Sorry, I haven't had time to test older versions.)

goldmann commented 6 years ago

Now we're talking again :)

What you just described isn't a behavior of the squash tool but of Docker itself, and I'll prove it. But first, some history.

With the move to the v2 container image format (Docker 1.10 or newer), Docker made things much more complicated compared to v1, where the IDs shown in docker history (what you are expecting to see) were simply the IDs of the layers. That is no longer true, for reasons I don't fully understand. You may be interested in https://github.com/goldmann/docker-squash/issues/48 for some history about these findings and 1.10+ support.

BTW, specifying the number of layers to squash exists precisely to work around this issue: if some layer IDs are missing, you can give the number of layers to squash instead.
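Counting layers does not need the missing IDs. A minimal sketch for deriving N for -f N, assuming docker-squash counts the same entries docker history lists, including empty (0 B) ones and ones shown as <missing>; the logs above are consistent with that (example:base has 5 history entries and docker-squash reports "Old image has 5 layers"). The helper name layers_above_base is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: how many history entries IMAGE has on top of BASE,
# i.e. a candidate value for docker-squash's "-f N".
# Assumption: docker-squash counts the same entries docker history lists.
# It does NOT verify that IMAGE was really built on top of BASE.
# Usage: layers_above_base IMAGE BASE
layers_above_base() {
    # docker history -q prints one line per entry, including <missing> ones.
    image_count=$(docker history -q "$1" | wc -l)
    base_count=$(docker history -q "$2" | wc -l)
    # Expand the counts textually so leading spaces from wc are harmless.
    echo $(( $image_count - $base_count ))
}
```

For example:child (4 entries) and alpine:latest (2 entries) from the transcripts above, this gives 2, the value that worked with -f 2. Since it only counts, it is worth pairing with a check that the image really derives from the base.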

Now, to prove that this is not a squash issue, consider the following (familiar) Dockerfile:

FROM alpine:latest
RUN echo "Hello" > foo
RUN echo "World" > bar
CMD ["cat", "foo", "bar"]

Let's build it:

➜  alpine-test docker build -t alpine-test:latest .
Sending build context to Docker daemon 2.048 kB
Step 1/4 : FROM alpine:latest
sha256:ccba511b1d6b5f1d83825a94f9d5b05528db456d9cf14a1ea1db892c939cda64: Pulling from docker.io/library/alpine
2fdfe1cd78c2: Pull complete 
Digest: sha256:ccba511b1d6b5f1d83825a94f9d5b05528db456d9cf14a1ea1db892c939cda64
Status: Downloaded newer image for docker.io/alpine:latest
 ---> e21c333399e0
Step 2/4 : RUN echo "Hello" > foo
 ---> Running in 592846d4f038
 ---> 4cf8f55a8259
Removing intermediate container 592846d4f038
Step 3/4 : RUN echo "World" > bar
 ---> Running in 9630bea95031
 ---> 02512e90e2a3
Removing intermediate container 9630bea95031
Step 4/4 : CMD cat foo bar
 ---> Running in d56711e6d43c
 ---> 6d392f7a6172
Removing intermediate container d56711e6d43c
Successfully built 6d392f7a6172

And check history:

➜  alpine-test docker history alpine-test:latest
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
6d392f7a6172        52 seconds ago      /bin/sh -c #(nop)  CMD ["cat" "foo" "bar"]      0 B                 
02512e90e2a3        53 seconds ago      /bin/sh -c echo "World" > bar                   6 B                 
4cf8f55a8259        54 seconds ago      /bin/sh -c echo "Hello" > foo                   6 B                 
e21c333399e0        3 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0 B                 
<missing>           3 days ago          /bin/sh -c #(nop) ADD file:2b00f26f6004576...   4.14 MB

Looks good. Now let's save this image into a file, remove the image from daemon and load it again.

➜  alpine-test docker save -o image.tar alpine-test:latest
➜  alpine-test docker rmi alpine-test:latest
Untagged: alpine-test:latest
Deleted: sha256:6d392f7a617230a87bffb28b60d782b28a940c1ba7a4e9ee8aadc7ca1c0cf889
Deleted: sha256:02512e90e2a36d54e4f1dff701837cf4ed78b825740c177d72b336e3f69ea523
Deleted: sha256:9526f1cf9aebf374afcf91371c0d293636118b5dbf8a2d80b96ed211fdf1e7bd
Deleted: sha256:4cf8f55a8259137aa9dd14a91119244917eb0db3fb1af289b519e7280dea56e3
Deleted: sha256:f5ac1b8bc318c279f8b515abc0dfb102383f1f03b75951527c3c250424e235e8
➜  alpine-test docker load -i image.tar         
69ef720f6074: Loading layer [==================================================>] 3.072 kB/3.072 kB
1bd752b8feaa: Loading layer [==================================================>] 2.048 kB/2.048 kB
Loaded image: alpine-test:latest

OK, the image is loaded; let's check its history.

➜  alpine-test docker history alpine-test:latest
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
6d392f7a6172        4 minutes ago       /bin/sh -c #(nop)  CMD ["cat" "foo" "bar"]      0 B                 
<missing>           4 minutes ago       /bin/sh -c echo "World" > bar                   6 B                 
<missing>           4 minutes ago       /bin/sh -c echo "Hello" > foo                   6 B                 
<missing>           3 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0 B                 
<missing>           3 days ago          /bin/sh -c #(nop) ADD file:2b00f26f6004576...   4.14 MB

Uh, oh... It should be the same, right? We haven't made any changes to the image or done any post-processing, but the loaded image is missing some metadata.

To summarize: yes, we're missing some metadata, but this is a problem with Docker itself, not with the squash tool.

I hope this small example shows that there is nothing we can do about it. What I can suggest is to squash immediately after the image is built. If you do it on a different host or at a different time, you are in trouble if you do not know how many layers to squash.
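The advice to squash right after building can be wrapped into a single step. This is a minimal sketch: build_and_squash and the "-unsquashed" temporary tag are hypothetical, it assumes the Dockerfile builds FROM the given base, and it uses only the docker-squash -f and -t options seen in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: build and squash on the same host in one go, while
# the freshly built image's history still lists the base image ID, so
# "-f BASE" can be resolved by docker-squash.
# Usage: build_and_squash BASE TAG [CONTEXT]
build_and_squash() {
    base=$1                        # image the Dockerfile builds FROM, e.g. alpine:latest
    tag=$2                         # final tag for the squashed image, e.g. example:latest
    context=${3:-.}                # build context directory, defaults to "."
    unsquashed="$tag-unsquashed"   # temporary tag (naming is an assumption)

    docker build -t "$unsquashed" "$context" || return 1
    docker-squash -f "$base" -t "$tag" "$unsquashed"
}
```

Because the squash runs immediately after the build, there is no save/load round trip in between that could drop the history IDs.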

goldmann commented 6 years ago

One more comment: the README shows the output of squashing with Docker older than 1.10. This is why you see the IDs there.

DvdGiessen commented 6 years ago

@goldmann Aha! So that's what's causing the loss of metadata. I guess I was too quick to assume that since you mentioned the metadata is written by docker-squash (you could let it return anything, even 1000 layers with random comments) that the loss of data thus was also happening there.

In practice, I'm able to work around it just fine; I just wanted to make sure that, if there was something that could be fixed in docker-squash to benefit everyone, I was reporting it correctly.

Thank you so much for taking the time to respond to this issue and explain the core issue, even when all along it wasn't a problem with docker-squash. :)

goldmann commented 6 years ago

> since you mentioned the metadata is written by docker-squash (you could let it return anything, even 1000 layers with random comments) that the loss of data thus was also happening there.

Oh, yes, but with the exception of the layer ID :) I can modify the command, date and size.

> Thank you so much for taking the time to respond to this issue and explain the core issue, even when all along it wasn't a problem with docker-squash. :)

You're welcome!