goloroden opened this issue 7 years ago

I have been running Docker for AWS 1.13 for a few months now, and after some initial problems it has worked perfectly for quite some time. That is, until… ;-)

A few days ago I ran into some strange behavior: I stopped a container and tried to start a new version of it. The image for the container is in a private repository on Docker Hub. Anyway, I was not able to start it as a service, because Docker now tells me that it can't find the image, because the image doesn't exist.

But: The image actually exists, and I can pull it without any problems, e.g. from my local machine.

I have now figured out that this applies to all running containers; I cannot start a new version of any of them.

Any idea what's causing this behavior, and how to fix it?
@goloroden can you give us the names of the images you are having issues with, and we will look to see what is causing the problem.
Yes, of course: thenativeweb/enterjs2017, starting from version 0.2.43. It doesn't matter whether I try to pull a specific version or just latest; neither can be found.
@goloroden did you happen to change your password on Docker Hub recently?
Unfortunately, no :-(
@goloroden weird, one thing to try: do a docker pull of the image directly from one of the hosts in the swarm. Did it work? If not, do a docker login with creds that are able to pull the images, then try the pull again. Did it work that time?
I'm trying to see whether the issue is with the Docker engine or with swarm. Hopefully this will narrow it down a little.
I think you may have to docker login and then update the service with --with-registry-auth. See this issue: https://github.com/moby/moby/issues/24940
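Roughly something like this (the service name at the end is just a placeholder for whatever your service is called):

$ docker login
# then re-send the freshly stored registry credentials to the swarm agents
$ docker service update --with-registry-auth myservice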
I have tried docker login again and redeployed using the --with-registry-auth flag. Same effect as before: the image could not be found. As said, if I then try to run

$ docker pull thenativeweb/enterjs2017:0.2.44

locally, everything works fine.
If I log in to one of the managers and run the same command, it fails:
$ docker pull thenativeweb/enterjs2017:0.2.44
Error response from daemon: repository thenativeweb/enterjs2017 not found: does not exist or no pull access
If I now run docker login, log in using the same credentials as on my local machine, and re-run the pull, it pulls the image as expected without any errors.
Ok, now that you can pull from that manager, run the service update that @friism talked about above from that same manager, and see if that helps.
I have tried that, and it does not help. I can pull the image without problems, but I cannot create a service using the image. The image cannot be found.
@goloroden ok, thanks. I wonder if there is a bug in swarm where it lost the credentials for some reason.
One last thing to try: if you create a new service that needs to pull private images, does that new service work?
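For example, something quick and throwaway along these lines (the service name is a placeholder, and --with-registry-auth makes the manager pass your credentials along to the nodes):

$ docker service create --name authtest --with-registry-auth thenativeweb/enterjs2017:0.2.44
$ docker service ps authtest
# clean up the test service afterwards
$ docker service rm authtest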
It doesn't work with any other private images either.
But: I then tried to create a service using a public image (in this case, nginx), and this didn't work either. The error message then is:

No such image: nginx@sha256:4…

This looks quite strange because of the @sha256: part… Is this normal?
@goloroden that is weird. It doesn't work for any images, not even public ones, so that means we need to look at a different cause.
Since you can pull directly from the same host using docker pull, that rules out a network issue.
If you see it on all of the hosts in your swarm (manager and workers) then it isn't anything related to the one host.
Do you see anything from the docker logs in /var/log/docker/?
Can you run docker-diagnose so we can see what is going on in the cluster? More details on how to run docker-diagnose are here: https://docs.docker.com/docker-for-aws/faqs/#where-do-i-report-problems-or-bugs
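If I remember correctly, it's roughly this (key path and manager address are placeholders):

$ ssh -i <path-to-your-key.pem> docker@<manager-public-ip>
$ docker-diagnose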
I have checked the log, and there was nothing that caught my eye, but I'm not an expert in these logs.
I've run docker-diagnose, and the ID is 1496731860-WCt2TViUzAEn68tqYA2x5AzIow4i8rEI
@goloroden thanks. @nathanleclaire, can you please take a look at the diagnose output and see if you spot anything that might be causing the issue?
@kencochrane @nathanleclaire As always, thanks for your great support 😊
I don't see anything too interesting / obvious in the debug logs.
This issue reminds me a ton of https://github.com/moby/moby/issues/8376, right down to the fact that public images which should work perfectly fine stop working.
Yes, this is true - but as said, I haven't changed my password recently.
Is there anything else I can do so that we can get closer to what's causing these issues?
Yes, I'd be surprised to find it's exactly the same, since that's quite an old bug, but I wouldn't be surprised to find out it's another bug related to distribution of registry credentials in the swarm somehow.
@goloroden Could I get you to attempt the service create (and maybe the pull) which is failing, then run docker-diagnose immediately after? There are some repeated messages in the logs which may have pushed out useful info.
Yes, of course! I'm thankful for any help 😊
I've done what you asked for. The new diagnose ID is 1496773937-kPJBEHkM42bamNbgOGP3u7z4DlSPXAEj.
thanks, i'll take a look
ah, i think i see a likely suspect
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout:Jun 6 18:32:16 moby root: time="2017-06-06T18:32:16.378805119Z" level=error msg="Not continuing with pull after error: failed to register layer: Error processing tar file(exit status 1): open /app/node_modules/babel-helper-define-map/.npmignore: no space left on device"
might be out of disk space in most of the nodes?
hm, disk usage doesn't look too egregious though, so I'm a bit confused
/dev/xvdb1 19.7G 1.3G 17.4G 7% /var/lib/docker/overlay2
/dev/xvdb1 19.7G 958.8M 17.7G 5% /var/lib/docker/overlay2
/dev/xvdb1 19.7G 811.4M 17.9G 4% /var/lib/docker/overlay2
/dev/xvdb1 19.7G 11.1G 7.5G 60% /var/lib/docker/overlay2
/dev/xvdb1 19.7G 11.2G 7.5G 60% /var/lib/docker/overlay2
Okay … this would explain a lot, but that would also mean that the setting Automatically clean up services (or similar … the one that regularly runs docker system prune) in the CloudFormation template does not work (or at least does not do what I expect it to do).
Unless there is another reason why the disk got filled up entirely … but I will check this out.
Is it safe to just kill the machine using the AWS console? Will the Swarm cluster survive this and start up a new one? In other words: what is the simplest way to make Swarm kill one machine and replace it with a new instance?
I also see the message you reference above:
time="2017-06-06T18:32:06.921229190Z" level=error msg="fatal task error" error="No such image: thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23" module="node/agent/taskmanager" task.id=15i31rs8tjge1pllv0o6z7q50
seems it's version 17.03.0-ce, correct?
Yes, 17.03.0-ce-aws1 to be exact.
@stevvooe @aaronlehmann Do you have any idea why swarmkit might attempt to pull by SHA, and be told that such a SHA doesn't exist, if the user is only specifying a tag?
@nathanleclaire: It pulls by digest so that the same version will be pulled on each node.
The daemon log will show what it tried to pull and why it didn't succeed. I'd recommend taking a look at that.
Somewhat related: https://github.com/moby/moby/issues/33521
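For reference, on a machine where the image was pulled from the registry, something like this should show which digest a tag resolved to (the tag is just an example):

$ docker inspect --format '{{index .RepoDigests 0}}' thenativeweb/enterjs2017:0.2.44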
Where do I find those daemon logs?
Thanks @aaronlehmann, I'll see if I can dig up anything interesting in the surrounding logs.
BTW, I also see messages like this; any idea what they might be about?
./ip-172-31-16-224-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:12 moby root: time="2017-06-06T18:32:12.133991105Z" level=warning msg="sending message to an unrecognized member ID 27e47f54051f679c" raft_id=4392a9b88600aa2b
./ip-172-31-16-224-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout:Jun 6 18:32:12 moby root: time="2017-06-06T18:32:12.134150664Z" level=error msg="could not resolve address of member ID 27e47f54051f679c" error="rpc error: code = 9 desc = grpc: the client connection is closing" raft_id=4392a9b88600aa2b
./ip-172-31-16-224-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:12 moby root: time="2017-06-06T18:32:12.135999311Z" level=debug msg="4392a9b88600aa2b [logterm: 0, index: 35400] rejected msgApp [logterm: 16, index: 35400] from 27e47f54051f679c"
./ip-172-31-16-224-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:12 moby root: time="2017-06-06T18:32:12.136150882Z" level=warning msg="sending message to an unrecognized member ID 27e47f54051f679c" raft_id=4392a9b88600aa2b
@goloroden the daemon logs are in /var/log/docker.log
Is it possible there's a very big node_modules folder or something like that in the image?
How large is it uncompressed?
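On your local machine something like this should show the uncompressed size (the tag is just an example):

$ docker images thenativeweb/enterjs2017:0.2.44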
@aaronlehmann here are some surrounding logs. It seems to be related to the disk space issue, since the No such image error seems to be a misnomer. The log clearly shows pulling thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23.
OK, it's running out of disk space; that's the issue:
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:05 moby root: time="2017-06-06T18:32:05.024090413Z" level=debug msg="pull in progress" image="thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23" status="Verifying Checksum"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:05 moby root: time="2017-06-06T18:32:05.024170203Z" level=debug msg="pull in progress" image="thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23" status="Download complete"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:05 moby root: time="2017-06-06T18:32:05.076379231Z" level=debug msg="pull in progress" current=1146880 image="thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23" status=Extracting total=21933931
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:05 moby root: time="2017-06-06T18:32:05.613672468Z" level=debug msg="memberlist: TCP connection from=172.31.27.204:46814"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:05 moby root: time="2017-06-06T18:32:05.614595425Z" level=debug msg="ip-172-31-7-39.eu-central-1.compute.internal-847524981b35: Initiating bulk sync for networks [sbjjvwt09dtvn1ro22yp6tngb ikhuvgep746h5l9c65wsyxv1l] with node ip-172-31-27-204.eu-central-1.compute.internal-1a89ffdbe106"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:05 moby root: time="2017-06-06T18:32:05.834535086Z" level=debug msg="pull in progress" current=5505024 image="thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23" status=Extracting total=21933931
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.779238398Z" level=debug msg="Cleaning up layer 0e9f290148929587140c7493cb578fbdfe14dc94f1a7fd542d61a354b1d8dfc0: Error processing tar file(exit status 1): open /app/node_modules/babel-helper-define-map/.npmignore: no space left on device"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.799365678Z" level=error msg="Not continuing with pull after error: failed to register layer: Error processing tar file(exit status 1): open /app/node_modules/babel-helper-define-map/.npmignore: no space left on device"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.799416467Z" level=error msg="pulling image failed" error="failed to register layer: Error processing tar file(exit status 1): open /app/node_modules/babel-helper-define-map/.npmignore: no space left on device" module="node/agent/taskmanager" task.id=w3epa72doy54o99l1d3thlwkv
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.800011363Z" level=info msg="Layer sha256:fe4767e90872336f35c7321df93ef55a71dcc52f3d0facde05bb2756192e8a94 cleaned up"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout:Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.800034046Z" level=error msg="fatal task error" error="No such image: thenativeweb/enterjs2017@sha256:61790b4698f0b96c0df2135d3ab7e8b184f2926b1b20710782144d4a786adb23" module="node/agent/taskmanager" task.id=w3epa72doy54o99l1d3thlwkv
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.800079334Z" level=debug msg="state changed" module="node/agent/taskmanager" state.desired=RUNNING state.transition="PREPARING->REJECTED" task.id=w3epa72doy54o99l1d3thlwkv
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.800484344Z" level=debug msg="(*Agent).UpdateTaskStatus" module="node/agent" task.id=w3epa72doy54o99l1d3thlwkv
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.801670140Z" level=debug msg="task status reported" module="node/agent"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.809494074Z" level=debug msg="(*Agent).UpdateTaskStatus" module="node/agent" task.id=w3epa72doy54o99l1d3thlwkv
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:06 moby root: time="2017-06-06T18:32:06.810509354Z" level=debug msg="task status reported" module="node/agent"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:07 moby root: time="2017-06-06T18:32:07.059294250Z" level=debug msg="(*worker).Update" len(assignments)=2 module="node/agent"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:07 moby root: time="2017-06-06T18:32:07.059370900Z" level=debug msg="(*worker).reconcileSecrets" len(removedSecrets)=0 len(updatedSecrets)=0 module="node/agent"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:07 moby root: time="2017-06-06T18:32:07.059401988Z" level=debug msg="(*worker).reconcileTaskState" len(removedTasks)=0 len(updatedTasks)=2 module="node/agent"
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:07 moby root: time="2017-06-06T18:32:07.059431968Z" level=debug msg=assigned module="node/agent" task.desiredstate=SHUTDOWN task.id=w3epa72doy54o99l1d3thlwkv
./ip-172-31-7-39-eu-central-1-compute-internal/tail -20000 /var/log/messages.stdout-Jun 6 18:32:07 moby root: time="2017-06-06T18:32:07.059521213Z" level=debug msg=assigned module="node/agent" task.desiredstate=READY task.id=rzi6oeaeaiurcvgshcscdlbjg
Something to consider: it looks like, according to the log, docker creates a tmp file for image layer pulls? So it's possible that there is some duplication of a layer that pushes the disk limit over the edge, but the attempted layer download gets cleaned up later? Just making some guesses as to why we're seeing this behavior even if the layer with node_modules isn't that big. I'd be curious to see docker history for the image you are trying to pull, @goloroden:
time="2017-06-06T18:32:04.216188966Z" level=debug msg="Downloaded 74505baa8510 to tempfile /var/lib/docker/tmp/GetImageBlob423471451"
Could totally be wrong and there's another auth-related issue too though.
The temporary file is only kept during the pull process; however, there's a PR that would change that: https://github.com/moby/moby/pull/28348
It's 209 MByte uncompressed.
Regarding docker history of this image, here we go:
IMAGE CREATED CREATED BY SIZE COMMENT
a9356c0b1010 2 days ago /bin/sh -c #(nop) CMD ["node" "/app/app.js"] 0 B
<missing> 2 days ago /bin/sh -c #(nop) ADD dir:f6a30e9743073d06... 26.4 MB
<missing> 2 days ago /bin/sh -c cd /app && npm install --pr... 68.5 MB
<missing> 2 days ago /bin/sh -c #(nop) ADD file:eacc297a6875503... 1.4 kB
<missing> 2 days ago /bin/sh -c #(nop) MAINTAINER the native w... 0 B
<missing> 8 months ago /bin/sh -c apk add --no-cache curl make gc... 44.6 MB
<missing> 8 months ago /bin/sh -c #(nop) ENV VERSION=v6.6.0 NPM_... 0 B
<missing> 11 months ago /bin/sh -c #(nop) ADD file:852e9d0cb9d9065... 4.8 MB
(Does it matter on which machine I run this command?)
FYI, I have SSHed into the node 172.31.7.39 and run df -h. Here's the result:
Filesystem Size Used Available Use% Mounted on
overlay 19.7G 11.1G 7.5G 60% /
tmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
shm 64.0M 0 64.0M 0% /dev/shm
/dev/xvdb1 19.7G 11.1G 7.5G 60% /var/log
/dev/xvdb1 19.7G 11.1G 7.5G 60% /etc/ssh
tmpfs 1.9G 153.0M 1.8G 8% /etc/passwd
tmpfs 1.9G 153.0M 1.8G 8% /etc/group
tmpfs 1.9G 153.0M 1.8G 8% /home/docker
/dev/xvdb1 19.7G 11.1G 7.5G 60% /etc/hosts
tmpfs 1.9G 153.0M 1.8G 8% /etc/shadow
/dev/xvdb1 19.7G 11.1G 7.5G 60% /etc/hostname
/dev/xvdb1 19.7G 11.1G 7.5G 60% /etc/resolv.conf
tmpfs 394.6M 708.0K 393.9M 0% /var/run/docker.sock
tmpfs 1.9G 153.0M 1.8G 8% /usr/bin/docker
/dev/xvdb1 19.7G 11.1G 7.5G 60% /var/lib/docker/swarm/lb_name
/dev/xvdb1 19.7G 11.1G 7.5G 60% /var/lib/docker/swarm/elb.config
tmpfs 1.9G 0 1.9G 0% /proc/kcore
tmpfs 1.9G 0 1.9G 0% /proc/timer_list
tmpfs 1.9G 0 1.9G 0% /proc/sched_debug
tmpfs 1.9G 0 1.9G 0% /sys/firmware
Am I missing something, or is none of the disks actually out of space? If so, why do the logs claim there is no space left?
Can you try df -i? The filesystem could be running out of inodes. overlay in particular is very inode-intensive.
That's it!!!
Here is the result of df -i:
Filesystem Inodes Used Available Use% Mounted on
overlay 1305600 1304994 606 100% /
tmpfs 505092 16 505076 0% /dev
tmpfs 505092 15 505077 0% /sys/fs/cgroup
shm 505092 1 505091 0% /dev/shm
/dev/xvdb1 1305600 1304994 606 100% /var/log
/dev/xvdb1 1305600 1304994 606 100% /etc/ssh
tmpfs 505092 1873 503219 0% /etc/passwd
tmpfs 505092 1873 503219 0% /etc/group
tmpfs 505092 1873 503219 0% /home/docker
/dev/xvdb1 1305600 1304994 606 100% /etc/hosts
tmpfs 505092 1873 503219 0% /etc/shadow
/dev/xvdb1 1305600 1304994 606 100% /etc/hostname
/dev/xvdb1 1305600 1304994 606 100% /etc/resolv.conf
tmpfs 505092 252 504840 0% /var/run/docker.sock
tmpfs 505092 1873 503219 0% /usr/bin/docker
/dev/xvdb1 1305600 1304994 606 100% /var/lib/docker/swarm/lb_name
/dev/xvdb1 1305600 1304994 606 100% /var/lib/docker/swarm/elb.config
tmpfs 505092 16 505076 0% /proc/kcore
tmpfs 505092 16 505076 0% /proc/timer_list
tmpfs 505092 16 505076 0% /proc/sched_debug
tmpfs 505092 1 505091 0% /sys/firmware
As we can easily see, there are several lines where it says 100% used. So I guess that this is what causes the issues, right?
Glad you were finally able to figure it out. Now you have two options: clean up some of the inodes, or expand the disk to give you more inodes. The easier option is to find what is using so many inodes and clean it up, if you can. It is sometimes a bunch of files in /tmp.
Try this command to see if you can find where your inodes are located.
$ sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
Also, try a docker system prune to see if you can remove some of the docker items you no longer need.
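For example (the -a variant is more aggressive and also removes all unused images, not just dangling ones, so only use it if you're sure):

$ docker system prune
$ docker system prune -a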
docker system prune didn't do anything (I would have been surprised if it had, as I set this up to run automatically each day in the Docker for AWS setup).
I actually just tried to run the very same command that you suggested, but the result does not look suspicious:
/ $ sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
1 .dockerenv
1 entry.sh
1 run
3 bin
3 sbin
3 var
8 lib
61 etc
2887 usr
Again, am I missing something?
@goloroden ok, I'm guessing you are running this command from SSH, which is actually inside of a docker container. You will need to run it from the host, so you can see the host file system.
Try running this command first, and then the inode one from above.
docker run -it --privileged --pid=host debian nsenter -t 1 -m -n sh
Sorry, if that doesn't work, try this one.
docker run --rm -it --privileged --pid=host justincormack/nsenter1 /bin/ash
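Once you're inside, you should be in the host's namespaces, so you can re-run the same checks from there, e.g.:

/ # df -i
/ # find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n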
Ah, of course! Sorry for the dumb question…
I think that I ran this from the host, not from a container: first I SSH from my machine to the bastion host (this is a Docker container running sshd), then I SSH from there to the worker. Since these machines were set up by Docker for AWS, I don't know whether this takes me to a container or to the actual host.
Anyway, if I run the first command, it doesn't work (no space left on device … 😉). But the second command does. If I then run the find command, I get the following output:
/ # find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
1 .ash_history
1 init
2 home
4 bin
13 containers
25 sbin
105 lib
194 etc
582 usr
Again, this does not look too bad, does it?
If I run df -i from there, it outputs:
Filesystem Inodes Used Available Use% Mounted on
tmpfs 505092 1874 503218 0% /
tmpfs 505092 269 504823 0% /run
cgroup_root 505092 15 505077 0% /sys/fs/cgroup
dev 497695 173 497522 0% /dev
shm 505092 1 505091 0% /dev/shm
/dev/xvdb1 1305600 1305532 68 100% /var
tmpfs 505092 16 505076 0% /tmp
tmpfs 505092 6 505086 0% /Database
/dev/xvdb1 1305600 1305532 68 100% /var/lib/docker/overlay2
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/99d0fef07c67b82342b1ca79b168b4373f028c2cae907405dbbe5df7e1afcceb/merged
shm 505092 1 505091 0% /var/lib/docker/containers/2aa3b6489f58c9114c9ef24b9b85ac113bbba515099cb2195ab70334e541c701/shm
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/2c26aa87fd987e5f9bcae375ce126eb7012e7b9f86f24ac6639715a84731f549/merged
shm 505092 1 505091 0% /var/lib/docker/containers/05d815022f3cec7219fcd10b4071a83d982f7078ecb329644fc79d38f6177a02/shm
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/b83ae2b3126d639ce0c2e3ae65995179eaa6721df957972cc286f9b57f6dbe70/merged
shm 505092 1 505091 0% /var/lib/docker/containers/44e0a79e6c4d0a1527c0a9e6a521571ccd56a4604ebe752d126ed7882385dc22/shm
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/4b351b79da692230e9031fb782546d94631cf2649c846bb9659580adb91152b7/merged
shm 505092 1 505091 0% /var/lib/docker/containers/e5ea389b0f1e4b2e6da23acee74ab3d08fc6db2f5c7e3dbd46ab41e7e6733a33/shm
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/a1818ea07aed3ae0cd898db749e58d827ac91e071cca926940bfd755b0d73f46/merged
shm 505092 1 505091 0% /var/lib/docker/containers/500dcf7de3903f4ed2696575d8ba552ccd9a661dd24075080b39aa321a8df958/shm
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/94cf2403454a33b93f84afbd4e10ecef9fb53d6ed24ef097c31b2abf94c9ce30/merged
shm 505092 1 505091 0% /var/lib/docker/containers/42b971cd616ee25d050ee5298bbf6d7b0130c648a6baa8178d9b7d983fdacf30/shm
overlay 1305600 1305532 68 100% /var/lib/docker/overlay2/890a9778bbc346cbff1288d515a972660d70d1c326cac446750a5965d3bbd2ab/merged
shm 505092 1 505091 0% /var/lib/docker/containers/51cac1dc0e9ed5024caf873b6220a32d11e711aa1969e040294f97cfb8e2ca2c/shm
Does this help?
It is definitely in the /var directory. Do your images have a ton of files in them? How about /var/log, are there a lot of logs in there? We need to find out where all of the files are coming from.
Regarding your questions:

Yes, the images do have a ton of files in them (because of the node_modules folder it's not just 10 files or so … the largest image is built from a directory which has 23455 files).

/var/log contains 108 files, is 88.5 MByte in size, and the largest log is 24 MByte.

I then had a look at /var/lib:
/var/lib # du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
1365437 docker
2 misc
2 dhcpcd
1 12169968 .
So, docker seems to be worth the next look:
/var/lib/docker # du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
1362748 overlay2
2573 image
75 containers
15 volumes
11 swarm
7 network
5 plugins
1 trust
1 tmp
1 12169948 .
The overlay2 folder contains 322 folders. The largest of them has 23305 files.
This caught my eye: 23305 is extremely close to 23455. If I take into consideration that some files are not put into the container (e.g. the .git folder), this pretty much looks as if these belong together. Maybe it's also just a coincidence (I'm just guessing here), but it's interesting.
The list then continues like this:
/var/lib/docker/overlay2 # du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
23305 94cf2403454a33b93f84afbd4e10ecef9fb53d6ed24ef097c31b2abf94c9ce30
20108 a1818ea07aed3ae0cd898db749e58d827ac91e071cca926940bfd755b0d73f46
19176 5b92e30239e19e013e9615ae5fce2da966dfc502ced037dad64996f579dc872c
19091 75a8c4702d2d5fbc5144d0ba5d1171782960b0e6317ca80c2898b1a8baf0baf8
19070 c281d8cc0879e738b051e49981578ae402d6c942148a2e64944fc8473f61b09b
19070 57b08e89ee8b776e90736111a4d80f09822634846918d70bde9f8cc821fd1ea9
19070 23a43cc4592bccd9e126a6a81df23ac6743940437e132ce0b4d00c4b42904997
19062 0161ac14d4e689d0a2335b91b9bef96277f0f0ac065d1775a18fb901bab832ba
19049 a7182adbb1cc8a0487e5a8193daea9d90097da5ef370b471b9f0a9858bc42505
19040 62a61c99f522b021d295ba04776ff2106f2582d00471f4cad785e148ce55b89f
19025 e05dabe52feb945fdf2edf6e114ea2ce9d063d55c34402a68b605e7bff714e7a
19025 8e30133bc4e832690493d88e3b7c29ff82697c9702c12d576514dea4a8af8de6
19007 1b9117fa629a3b8ef4cef5f0070210247617f72c1b4fcf1eee77a8c64efc2a69
16440 8ba87ebd88b99bbe93331b68c669e0e80ea82f0446b1f88dd64384dbe5a57fed
16438 faf56d45f82e2edfbf4ca63c4d201944819815b8fc8f3d2cc2585fe12ef9fe96
16438 ea371501ccfff82d468b575803fd03f651a47681d92b046ba18f882a8d5094bc
16438 e4a95f5d062f50cca38cb3d30ac37082e35fed07b37fd145056850dd08dfcf39
16438 d8d61a7989c7d918b9471415852d44ae22bb5b5b56d5e8f91ac607514484f174
16438 cb9ce5adef52e107704b3b42351ff9e06db366a8d80a33025e398ab02c0d95df
[…]
Again, it's interesting to see lots of entries with the exact same number of files. This made me think: if I update an image by editing a file, the number of files doesn't change. Can it be that these directories contain the contents of every single version of every single image, and, for whatever reason, they never get cleaned up?
What do you get when you run docker images and docker images --all?
If you have one docker image that is based off of another, it will add a new layer for any changes, and that is just a diff (only what changed between the two images). So each layer shouldn't have duplicate files, unless you did something to every file (changed permissions, etc.).
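For example, something along these lines would show whether a lot of dangling layers are hanging around, and would let you clean them up (docker image prune only removes dangling images by default):

$ docker images
$ docker images --all
$ docker images --filter dangling=true
$ docker image prune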