I wonder, can this be because I don't have CloudWatch enabled? Since I am using ELK, I thought CloudWatch would be redundant. Any ideas?
@cc250080 thanks for the report. I don't think it would be related to CloudWatch, but you never know; it depends on how many logs your containers produce.
Can you run the following commands, and post the results:
This will show the disk usage for the host vs shell container.
docker run -v /:/hostroot alpine:3.6 /bin/sh -c "du -shc /hostroot/*"
This will tell us what docker items are taking up space.
docker system df
If you want more info you can use the verbose flag, but it might give info you don't want to share, so feel free to not post those results.
docker system df --verbose
@cc250080 Can you share the stack that you deploy when these get full?
Dear @kencochrane and @FrenchBen ,
Thank you very much for giving me a hand. Unfortunately, I still have the same problem.
The results of the commands that @kencochrane suggested:
From the Swarm Leader:
~ # docker run -v /:/hostroot alpine:3.6 /bin/sh -c "du -shc /hostroot/*"
Unable to find image 'alpine:3.6' locally
3.6: Pulling from library/alpine
88286f41530e: Downloading [==================================================>] 1.99MB/1.99MB
docker: write /var/lib/docker/tmp/GetImageBlob316476840: no space left on device.
See 'docker run --help'.
From a Swarm Manager:
~/docker # docker run -v /:/hostroot alpine:3.6 /bin/sh -c "du -shc /hostroot/*"
Unable to find image 'alpine:3.6' locally
3.6: Pulling from library/alpine
88286f41530e: Already exists
Digest: sha256:1072e499f3f655a032e88542330cf75b02e7bdf673278f701d7ba61629ee3ebe
Status: Downloaded newer image for alpine:3.6
16.0K /hostroot/Database
872.0K /hostroot/bin
11.0M /hostroot/containers
0 /hostroot/dev
0 /hostroot/dockerimages
1.7M /hostroot/etc
910.7M /hostroot/home
4.0K /hostroot/init
5.3M /hostroot/lib
0 /hostroot/media
0 /hostroot/proc
0 /hostroot/root
1.1M /hostroot/run
11.4M /hostroot/sbin
0 /hostroot/srv
0 /hostroot/sys
8.0K /hostroot/tmp
130.4M /hostroot/usr
2.9G /hostroot/var
3.9G total
Not sure how to interpret these results; where are the 80 GB that are making the drive full?
~ # docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 12 11 2.021GB 756.3MB (37%)
Containers 11 11 104.4kB 0B (0%)
Local Volumes 2 2 562.4kB 0B (0%)
FROM THE LEADER:
~ # docker system df --verbose
Images space usage:
REPOSITORY TAG IMAGE ID CREATED ago SIZE SHARED SIZE UNIQUE SiZE CONTAINERS
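On the question above of where the 80 GB are hiding: the docker system df totals only cover images, containers, and volumes; json-file container logs are not counted in those sizes and live under the host's /var/lib/docker/containers. A drill-down sketch, assuming the host root is bind-mounted at /hostroot as in the earlier command and that alpine:3.6 is already present locally (a full disk can block new pulls, as it did on the leader above); note the 3.9G total above came from a manager whose disk is not full, so the same drill-down would need to run on the leader:

# Break down the Docker data directory one level deep:
docker run --rm -v /:/hostroot alpine:3.6 /bin/sh -c "du -d 1 -h /hostroot/var/lib/docker"

# The per-container directories include each container's json-file log:
docker run --rm -v /:/hostroot alpine:3.6 /bin/sh -c "du -shc /hostroot/var/lib/docker/containers/*"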
A couple more outputs; can you run the following from your home directory:
du -d 1 -h /
Here is the output:
~ # du -d 1 -h /
0         /sys
31.5M     /usr
1.8M      /etc
0         /proc
12.0K     /home
216.0K    /sbin
4.0K      /tmp
8.0K      /run
8.0K      /root
1.4M      /bin
56.9M     /var
4.0K      /mnt
16.0K     /media
2.8M      /lib
0         /dev
4.0K      /srv
94.7M     /
Thanks !
Also, 'df -h':
~ # df -h
Filesystem    Size      Used      Available  Use%  Mounted on
overlay       78.7G     78.7G     0          100%  /
tmpfs         7.8G      0         7.8G       0%    /dev
tmpfs         7.8G      0         7.8G       0%    /sys/fs/cgroup
tmpfs         7.8G      161.2M    7.7G       2%    /home/docker
/dev/xvdb1    78.7G     78.7G     0          100%  /etc/ssh
/dev/xvdb1    78.7G     78.7G     0          100%  /var/log
tmpfs         7.8G      161.2M    7.7G       2%    /etc/group
tmpfs         7.8G      161.2M    7.7G       2%    /etc/passwd
tmpfs         7.8G      161.2M    7.7G       2%    /etc/shadow
/dev/xvdb1    78.7G     78.7G     0          100%  /etc/resolv.conf
/dev/xvdb1    78.7G     78.7G     0          100%  /etc/hostname
/dev/xvdb1    78.7G     78.7G     0          100%  /etc/hosts
shm           64.0M     0         64.0M      0%    /dev/shm
tmpfs         1.6G      1.0M      1.6G       0%    /var/run/docker.sock
tmpfs         7.8G      161.2M    7.7G       2%    /usr/bin/docker
tmpfs         7.8G      0         7.8G       0%    /proc/kcore
tmpfs         7.8G      0         7.8G       0%    /proc/timer_list
tmpfs         7.8G      0         7.8G       0%    /proc/sched_debug
tmpfs         7.8G      0         7.8G       0%    /sys/firmware
Thank you very much @FrenchBen
Carles
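A note on the mismatch between the last two outputs: du -d 1 -h / ran inside the SSH shell container, whose / is its own overlay filesystem, so it only counts roughly 94.7M of container-local files, while df reports the underlying /dev/xvdb1 device, which really is 100% full. To measure the host side from that shell, go through a bind mount of the host root; a minimal sketch, again assuming alpine:3.6 is available:

# List the json-file logs on the host; du inside the shell container
# cannot see the host's /var/lib/docker without this bind mount:
docker run --rm -v /:/hostroot alpine:3.6 /bin/sh -c "ls -lh /hostroot/var/lib/docker/containers/*/*-json.log"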
@cc250080 Thanks for the output - You wouldn't happen to be in our community? https://dockr.ly/community
It may be quicker to look at a few things with you.
@FrenchBen I just did the sign-up, thanks also for letting me know.
In which channel should I find you to discuss topics related to Swarm and Swarm for AWS issues?
I am in #general as carles6
Went through a 1:1 with @cc250080; we determined that his Logstash setup was at fault: the container wasn't huge in size, but the log file for it was:
-rw-r----- 1 root root 77.1G Aug 16 07:37 e52b3f03d2a4b875b1d9d75e5f654c45ae929daa82a54c33e267cd7fb775fc71-json.log
Essentially, logging seems to be set up to write to disk (CloudWatch disabled), causing a large log file to be created for the container.
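For anyone landing here with the same symptom: the json-file log driver does not rotate logs unless told to, and pruning images and containers never touches the log of a running container. A sketch of capping log size; the 50m/3 values and the service name logstash are illustrative assumptions, not values from this thread:

# Immediate relief on the affected host: empty the runaway log
# (shell redirection, so no extra tools are needed; <container-id>
# is the ID of the offending container):
: > /var/lib/docker/containers/<container-id>/<container-id>-json.log

# Per-service rotation going forward:
docker service update --log-driver json-file --log-opt max-size=50m --log-opt max-file=3 logstash

# Or a daemon-wide default in /etc/docker/daemon.json (needs a daemon
# restart; on Docker for AWS the daemon config is templated, so this
# path is an assumption):
#   { "log-driver": "json-file", "log-opts": { "max-size": "50m", "max-file": "3" } }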
Expected behavior
The daily cleanup job should prevent this from happening (it is activated).
Actual behavior
For the second time, my Swarm leader's hard disk is full (80 GB).
/ # df -h
Filesystem    Size      Used      Available  Use%  Mounted on
overlay       78.7G     78.0G     0          100%  /
tmpfs         7.8G      0         7.8G       0%    /dev
tmpfs         7.8G      0         7.8G       0%    /sys/fs/cgroup
tmpfs         7.8G      161.2M    7.7G       2%    /etc/shadow
/dev/xvdb1    78.7G     78.0G     0          100%  /etc/ssh
tmpfs         7.8G      161.2M    7.7G       2%    /home/docker
/dev/xvdb1    78.7G     78.0G     0          100%  /var/log
tmpfs         7.8G      161.2M    7.7G       2%    /etc/group
tmpfs         7.8G      161.2M    7.7G       2%    /etc/passwd
/dev/xvdb1    78.7G     78.0G     0          100%  /etc/resolv.conf
/dev/xvdb1    78.7G     78.0G     0          100%  /etc/hostname
/dev/xvdb1    78.7G     78.0G     0          100%  /etc/hosts
shm           64.0M     0         64.0M      0%    /dev/shm
tmpfs         7.8G      161.2M    7.7G       2%    /usr/bin/docker
tmpfs         1.6G      964.0K    1.6G       0%    /var/run/docker.sock
tmpfs         7.8G      0         7.8G       0%    /proc/kcore
tmpfs         7.8G      0         7.8G       0%    /proc/timer_list
tmpfs         7.8G      0         7.8G       0%    /proc/sched_debug
tmpfs         7.8G      0         7.8G       0%    /sys/firmware
Information
I am using Docker CE for AWS 17.06.0-ce (17.06.0-ce-aws2)
Steps to reproduce the behavior
Just wait a couple of weeks. Actually, is there any way to reach /var/lib/docker from inside the SSH access container?
I don't really know what is filling the leader's hard disk every time; pruning containers and images doesn't help, and /var/lib/logs is pretty empty.
Since this is the second time it has happened, on different versions, I am starting to get pretty worried about this issue.
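On the question in the steps above about reaching /var/lib/docker from the SSH access container: the host filesystem is not visible there directly, but bind-mounting the host root into a throwaway container works; a minimal sketch:

# Read-only mount of the host root; the Docker data directory then
# appears at /hostroot/var/lib/docker:
docker run --rm -it -v /:/hostroot:ro alpine:3.6 /bin/sh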