burmilla / os

Tiny Linux distro that runs the entire OS as Docker containers
https://burmillaos.org
Apache License 2.0

overlay volume running near out-of-space limit #182

Closed alesnav closed 1 month ago

alesnav commented 1 month ago

Issue description: I have a 3-node Docker Swarm cluster running BurmillaOS on each node.

$ docker node ls
ID                            HOSTNAME    STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
u8xw9leztz9hobtnntnb38xeo *   rancher01   Ready     Active         Leader           26.0.2
durtppr5fy22e6vphaxnh542o     rancher02   Ready     Active         Reachable        26.0.2
lf9lj6n4rv8c8o9kl1my8hi19     rancher03   Ready     Active         Reachable        26.0.2

I realized that the system storage usage is significantly different on each of them, even though all three nodes are running the same ros os and ros engine versions.

rancher01:

rancher@rancher01:~$ sudo ros os version
v2.0.1
rancher@rancher01:~$ sudo ros engine list | grep current
current  docker-26.0.2
rancher@rancher01:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
overlay                38G   18G   19G  49% /
devtmpfs              5.8G     0  5.8G   0% /dev
tmpfs                 5.9G     0  5.9G   0% /sys/fs/cgroup
/dev/sda1              38G   18G   19G  49% /opt
none                  5.9G  3.2M  5.9G   1% /run
shm                    64M     0   64M   0% /dev/shm
192.168.2.200:/datos  3.0T  2.6T  475G  85% /nas

rancher02:

rancher@rancher02:~$ sudo ros os version
v2.0.1
rancher@rancher02:~$ sudo ros engine list | grep current
current  docker-26.0.2
rancher@rancher02:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
overlay                38G   34G  2.2G  94% /
devtmpfs              5.8G     0  5.8G   0% /dev
tmpfs                 5.9G     0  5.9G   0% /sys/fs/cgroup
/dev/sda1              38G   34G  2.2G  94% /mnt
none                  5.9G  3.5M  5.9G   1% /run
shm                    64M     0   64M   0% /dev/shm
192.168.2.200:/datos  3.0T  2.6T  475G  85% /nas

rancher03:

rancher@rancher03:~$ sudo ros os version
v2.0.1
rancher@rancher03:~$ sudo ros engine list | grep current
current  docker-26.0.2
rancher@rancher03:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
overlay                38G   29G  7.0G  81% /
devtmpfs              5.8G     0  5.8G   0% /dev
tmpfs                 5.9G     0  5.9G   0% /sys/fs/cgroup
/dev/sda1              38G   29G  7.0G  81% /mnt
none                  5.9G  2.6M  5.9G   1% /run
shm                    64M     0   64M   0% /dev/shm
192.168.2.200:/datos  3.0T  2.6T  475G  85% /nas

As you can see, the first node is using 49% of the overlay filesystem, the second node 94%, and the last one 81%. All of them are in the same cluster, running the same versions, so usage should be similar across them. Also, yesterday the overlay on rancher03 was 100% full, but after a reboot it is now showing 81% (no idea why).

How can I check whether this is expected, and how can I (force) free up space?
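One quick way to compare nodes is to pull the overlay root filesystem's Use% out of `df`. Below is a minimal sketch using rancher02's `df -h` output above as sample input; on a live node you would pipe `df -h` directly into the awk filter instead of the sample variable:

```shell
#!/bin/sh
# Sample df output (from rancher02 above); on a real node replace the
# printf pipeline with:  df -h | awk '...'
df_output='Filesystem            Size  Used Avail Use% Mounted on
overlay                38G   34G  2.2G  94% /
/dev/sda1              38G   34G  2.2G  94% /mnt'

# Match the overlay filesystem mounted on /, strip the % sign from the
# Use% column (field 5), and print the bare number.
printf '%s\n' "$df_output" |
    awk '$1 == "overlay" && $6 == "/" { sub("%", "", $5); print $5 }'
```

Running this prints `94` for the sample, which makes it easy to collect and compare the numbers from all three nodes in a loop or over SSH.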

BurmillaOS Version: (ros os version) v2.0.1

Where are you running BurmillaOS? (docker-machine, AWS, GCE, baremetal, etc.) ESXi 6.5.0

Which processor architecture you are using? x64

Do you use some extra hardware? (GPU, etc)? No

Which console you use (default, ubuntu, centos, etc..) Default

Do you use some service(s) which are not enabled by default? Yes, open-vm-tools and volume-nfs

Have you installed some extra tools to console? No

Do you use some other customizations? No

Please share a copy of your cloud-init (remember to remove all sensitive data first)

hostname: rancher01
mounts:
- - 192.168.2.200:/datos
  - /nas
  - nfs
  - ""
rancher:
  console: default
  docker:
    engine: docker-26.0.2
  environment:
    EXTRA_CMDLINE: /init
  modules:
  - ip_vs
  network:
    dns:
      nameservers:
      - 192.168.2.1
      search:
      - <LOCAL_DOMAIN_NAME>
    interfaces:
      eth0:
        address: 10.100.10.25/24
        dhcp: false
        gateway: 10.100.10.254
        mtu: 1450
  services_include:
    open-vm-tools: true
    volume-nfs: true
  state:
    dev: LABEL=RANCHER_STATE
    wait: true
  sysctl:
    net.ipv4.ip_nonlocal_bind: 1
    net.ipv6.ip_nonlocal_bind: 1
    vm.max_map_count: 262144
  upgrade:
    url: https://raw.githubusercontent.com/burmilla/releases/v2.0.x/releases.yml
ssh_authorized_keys:
- ssh-rsa <PRIVATE_KEY>
- ecdsa-sha2-nistp521 <PRIVATE_KEY>
olljanat commented 1 month ago

Normally it is the containers, images, and build cache in Docker that use most of the disk space. You can check that with `du -sh /var/lib/docker` and release disk space with the standard Docker cleanup command `docker system prune`.
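For example, the cleanup could be gated on a usage threshold so it only runs on nodes that are actually getting full. The sketch below uses a hypothetical `should_prune` helper and a 90% threshold (both illustrative names, not part of BurmillaOS or Docker); the real `docker system prune` call is shown only as a comment so the example is side-effect free:

```shell
#!/bin/sh
# Decide whether a node needs a Docker cleanup based on overlay Use%.
# THRESHOLD and should_prune are illustrative, not BurmillaOS features.
THRESHOLD=90

should_prune() {
    # $1 is the overlay Use% as a bare number, without the '%' sign
    [ "$1" -ge "$THRESHOLD" ]
}

# On a live node, take the number from df, e.g.:
#   usage=$(df / | awk 'NR==2 { sub("%", "", $5); print $5 }')
# and clean up with:  docker system prune -f
for usage in 49 94 81; do            # the three nodes from the report
    if should_prune "$usage"; then
        echo "node at ${usage}%: prune recommended"
    else
        echo "node at ${usage}%: ok"
    fi
done
```

With the reported numbers this flags only the 94% node, which matches the situation described in the issue: rancher02 is the one close to running out of space.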

alesnav commented 1 month ago

`docker system prune` is a good option to fix this. Thanks!