no space left on device

elhigu / gitlab-ci-dind-with-image-cache

Example setups for creating docker in docker setup with cached images, without need to use docker save / load or local image registry


no space left on device #1

Open ghost opened 5 years ago

ghost commented 5 years ago

Hello, I tried your script to pull the hyperledger/fabric-ccenv:1.4.1 image, which is around 1.5GB, but I encountered the error shown below:

+ FINAL_IMAGE_NAME=fabric-dind:1.4.1
+ shift
+ TEMP_IMAGE_NAME=custom-dind
+ TEMP_CONTAINER_NAME=temp
+ docker build -t custom-dind .
Sending build context to Docker daemon  7.168kB
Step 1/5 : FROM docker:dind
 ---> e4157102c815
Step 2/5 : COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
 ---> 6585d6dc5142
Step 3/5 : RUN chmod gou+x /usr/local/bin/dockerd-entrypoint.sh
 ---> Running in 6cb2fa80eb08
Removing intermediate container 6cb2fa80eb08
 ---> e2114181a9ca
Step 4/5 : RUN mkdir -p /var-lib-docker
 ---> Running in edc75b97599b
Removing intermediate container edc75b97599b
 ---> 15b8bc0ffd7b
Step 5/5 : CMD ["--storage-driver=overlay2"]
 ---> Running in fa8d3f996923
Removing intermediate container fa8d3f996923
 ---> 7e2b7257f21f
Successfully built 7e2b7257f21f
Successfully tagged custom-dind:latest
+ docker run --detach --privileged --name temp custom-dind
0cfad863d78a8cdd7b78d478d90513c95aff1a62bc045dd0c206f1174c418f96
+ docker exec temp docker pull hyperledger/fabric-ccenv:1.4.1
1.4.1: Pulling from hyperledger/fabric-ccenv
..
Digest: sha256:bb929eef560b50e0fbd730c6b195e49fece28dd4612ec30db0ce2cc096483463
Status: Downloaded newer image for hyperledger/fabric-ccenv:1.4.1
+ docker exec temp docker tag hyperledger/fabric-ccenv:1.4.1 hyperledger/fabric-ccenv:latest
++ docker exec temp df -m /var-lib-docker
++ grep /var-lib-docker
++ awk '{print $3}'
+ USED_MB=1524
++ expr 1524 + 2048
+ TRIM_TO_MB=3572
++ expr 3572 / 1024
+ TRIM_TO_GB=3
+ echo 'Resizing ext4 to 3GB'
Resizing ext4 to 3GB
+ docker exec temp sh -c 'echo 3 > /trim-ext4-on-next-start.txt'
+ docker stop temp
temp
+ docker start temp
temp
+ docker exec temp rm -fr /var-lib-docker/runtime
+ docker commit temp fabric-dind:1.4.1
Error response from daemon: Error processing tar file(exit status 1): write /var-lib-docker.loopback.ext4: no space left on device
+ docker stop temp
temp
+ docker rm temp
temp
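As a side note on the size calculation in the trace above: `expr 3572 / 1024` is integer division, so the result rounds down to 3 and the effective headroom is smaller than the 2048 MB that was added. The sketch below reproduces that arithmetic with the values from the trace and compares it with a round-up variant. The round-up variant and the variable names `HEADROOM_MB`, `TRIM_TO_GB_FLOOR` and `TRIM_TO_GB_CEIL` are illustrative assumptions, not part of the original script, and nothing here is claimed to be the cause of the error.

```shell
#!/bin/sh
# Values taken from the trace above.
USED_MB=1524
HEADROOM_MB=2048   # hypothetical name for the constant added in the script

TRIM_TO_MB=$((USED_MB + HEADROOM_MB))

# Integer division rounds down, exactly as `expr 3572 / 1024` does in the trace.
TRIM_TO_GB_FLOOR=$((TRIM_TO_MB / 1024))

# Hypothetical alternative: round up so the full headroom is preserved.
TRIM_TO_GB_CEIL=$(( (TRIM_TO_MB + 1023) / 1024 ))

# Headroom that is actually left after rounding down.
EFFECTIVE_HEADROOM_MB=$((TRIM_TO_GB_FLOOR * 1024 - USED_MB))

echo "floor: ${TRIM_TO_GB_FLOOR}GB, ceil: ${TRIM_TO_GB_CEIL}GB"
echo "effective headroom with floor: ${EFFECTIVE_HEADROOM_MB}MB"
```

With the floor result the filesystem is trimmed to 3 GB, which leaves roughly 1.5 GB of free space instead of 2 GB.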

elhigu commented 5 years ago

Sounds like your docker host machine ran out of space, if the docker commit command fails because there is not enough space.

Could you check whether the machine where you are building that docker-in-docker image with prepulled images really has enough disk space? It will need a lot more than 1.5GB there.

ghost commented 5 years ago

@elhigu yes, I still have hundreds of GB of space. I tried on a colleague's machine (which has lots of capacity) and the same error occurs. I'm using Docker for Mac.

I tried pulling alpine:3.10.1 instead, just to try it out. It succeeds:

Status: Downloaded newer image for alpine:3.10.1
docker.io/library/alpine:3.10.1
++ docker exec temp df -m /var-lib-docker
++ grep /var-lib-docker
++ awk '{print $3}'
+ USED_MB=58
++ expr 58 + 2048
+ TRIM_TO_MB=2106
++ expr 2106 / 1024
+ TRIM_TO_GB=2
+ echo 'Resizing ext4 to 2GB'
Resizing ext4 to 2GB
+ docker exec temp sh -c 'echo 2 > /trim-ext4-on-next-start.txt'
+ docker stop temp
temp
+ docker start temp
temp
+ docker exec temp rm -fr /var-lib-docker/runtime
+ docker commit temp aldredb/fabric-dind:1.4.1
sha256:667c9c1060b07f061bcd982e19aa632ae180f96ffb7acaf1ea9fb4c56cbeeb65
+ docker stop temp
temp
+ docker rm temp
temp

But it is a bit weird that the resulting image size is 53.9GB

➜  gitlab-custom-dind git:(master) ✗ docker images
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
aldredb/fabric-dind   1.4.1               2400bea9f8b9        8 minutes ago       53.9GB
custom-dind           latest              c8c6e315ce42        16 minutes ago      230MB
docker                dind                6ce0d31cf4d6        3 hours ago         230MB

elhigu commented 5 years ago

Looks like trimming down the disk image after startup didn't run correctly. I actually have more up-to-date trimming code with more debug info somewhere. I could check whether I have done some fixing there.

The system works by generating a 60GB sparse file in the docker container and installing everything there. Then it tries to trim the file down to a smaller size if a magic file is found during container startup. Trimming is necessary because docker's layer filesystem doesn't support gzipping sparse files, which causes the container size to explode when it is committed to a new image.
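To illustrate what "sparse" means here, the following minimal sketch creates a small sparse file with the same `dd ... count=0` idiom that the entrypoint script uses, then compares its apparent size with the space actually allocated on disk. It assumes a filesystem that supports sparse files and a `dd` that extends the output file to the seek offset (GNU dd does); the 64 MiB size and the variable names are arbitrary choices for the demo.

```shell
#!/bin/sh
set -e

IMG=$(mktemp)

# Seek to 64 MiB without writing any data: the file gets that length,
# but no data blocks are allocated for it.
dd if=/dev/null of="$IMG" bs=1 seek=67108864 count=0 2>/dev/null

# Apparent size: what a byte-for-byte copy (such as a tar stream) would see.
APPARENT_BYTES=$(wc -c < "$IMG" | tr -d ' ')

# Allocated size: what the file really occupies on disk.
ALLOCATED_KB=$(du -k "$IMG" | awk '{print $1}')

echo "apparent: ${APPARENT_BYTES} bytes, allocated: ${ALLOCATED_KB} KB"

rm -f "$IMG"
```

Any tool that copies the file by its apparent size pays for the full length, which is consistent with the 53.9GB image reported earlier in this thread.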

They actually added a feature to busybox (after my feature request) that would now allow creating a small image file during startup, which could then be resized without a restart when the system notices that it is about to fill up. I'll check whether the latest docker:dind images already use it, and if so I'll change the resizing method to grow the ext4 file instead of trimming it down.

elhigu commented 5 years ago

You could also get more debug info about why trimming didn't work by checking the docker logs after restarting the prefilled container:

+ docker start temp
temp
// ---- check docker logs here!
+ docker exec temp rm -fr /var-lib-docker/runtime

ghost commented 5 years ago

These are the logs of temp

...
time="2019-07-24T02:56:38.815294200Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
time="2019-07-24T02:56:38.815818200Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0007596d0, TRANSIENT_FAILURE" module=grpc
time="2019-07-24T02:56:38.815861800Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0007596d0, CONNECTING" module=grpc
e2fsck 1.45.2 (27-May-2019)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/var-lib-docker.loopback.ext4: 561/3276800 files (0.2% non-contiguous), 286162/13107200 blocks
resize2fs 1.45.2 (27-May-2019)
Resizing the filesystem on /var-lib-docker.loopback.ext4 to 524288 (4k) blocks.
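For reference, the block counts in the log above can be converted to sizes with simple arithmetic. The sketch below assumes the 4 KiB block size that resize2fs prints; the variable names are illustrative only.

```shell
#!/bin/sh
# Block counts copied from the e2fsck / resize2fs output above.
BLOCK_KIB=4            # "(4k) blocks" in the resize2fs line
TOTAL_BLOCKS=13107200  # filesystem size before resizing
USED_BLOCKS=286162     # blocks in use according to e2fsck
TARGET_BLOCKS=524288   # resize target

TOTAL_GIB=$((TOTAL_BLOCKS * BLOCK_KIB / 1024 / 1024))
TARGET_GIB=$((TARGET_BLOCKS * BLOCK_KIB / 1024 / 1024))
USED_MIB=$((USED_BLOCKS * BLOCK_KIB / 1024))

echo "before: ${TOTAL_GIB} GiB, target: ${TARGET_GIB} GiB, in use: ${USED_MIB} MiB"
```

So the log shows a 50 GiB filesystem with about 1.1 GiB in use being shrunk to 2 GiB, and the output ends before any message that the resize finished.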

elhigu commented 5 years ago

There should be more logs... after resizing the FS it should truncate the file and check that it is OK.

This is my most recent version of dockerd-entrypoint.sh:

#!/bin/sh
set -e

# this is pretty much instantaneous because, while the container
# is not committed to an image, sparse files work just fine
if [ ! -e /var-lib-docker.loopback.ext4 ]; then
  dd of=/var-lib-docker.loopback.ext4 bs=1 seek=50G count=0
  /sbin/mkfs.ext4 -q /var-lib-docker.loopback.ext4
fi

# TODO: create scripts to autoresize partition when docker:dind
#       is released, which has this bugfix included 
#       https://bugs.busybox.net/show_bug.cgi?id=11886

# trim ext4 image file to smaller length if special file is found
if [ -e /trim-ext4-on-next-start.txt ]; then
  export TRIM_GIGABYTES=$(cat /trim-ext4-on-next-start.txt)
  set -x
  fsck.ext4 -y -f /var-lib-docker.loopback.ext4
  resize2fs /var-lib-docker.loopback.ext4 ${TRIM_GIGABYTES}G
  truncate -s ${TRIM_GIGABYTES}G /var-lib-docker.loopback.ext4
  rm -f /trim-ext4-on-next-start.txt
  fsck.ext4 -y /var-lib-docker.loopback.ext4
  set +x
fi  

# if the host docker is not running on a btrfs file system, this will have to copy the whole
# read-only var-lib-docker.loopback.ext4 file to the running container... which will use lots
# of space and take minutes for, say, 10GB of data... so to make this fast use
# btrfs (you can even run it in a virtual machine and it will be fast...)
mount -t ext4 -o loop /var-lib-docker.loopback.ext4 /var-lib-docker

# no arguments passed
# or first arg is `-f` or `--some-option`
if [ "$#" -eq 0 ] || [ "${1#-}" != "$1" ]; then
  # add our default arguments
  set -- dockerd \
    --data-root=/var-lib-docker \
    --host=unix:///var/run/docker.sock \
    --host=tcp://0.0.0.0:2375 \
    "$@"
fi

if [ "$1" = 'dockerd' ]; then
  if [ -x '/usr/local/bin/dind' ]; then
    # if we have the (mostly defunct now) Docker-in-Docker wrapper script, use it
    set -- '/usr/local/bin/dind' "$@"
  fi

  # explicitly remove Docker's default PID file to ensure that it can start properly if it was stopped uncleanly (and thus didn't clean up the PID file)
  find /run /var/run -iname 'docker*.pid' -delete
fi

exec "$@"

ghost commented 5 years ago

@elhigu that's all the logs. Tried your latest dockerd-entrypoint.sh, same result

elhigu commented 5 years ago

Right... sounds like the container is stopped before resizing is complete... In my current project I have a slightly different script, since I actually load images from a gzipped dump instead of a registry, but you can see the waiting parts there.

I'll also update this repo when I get some time for it.

$ cat ../../../.gitlab-caching-dind/create-dind-with-images.sh
#!/bin/sh

set -x

FINAL_IMAGE_NAME=$1
IMAGE_LIST_FILE=$2
UNIQUE_POSTFIX=$3

TEMP_IMAGE_NAME=custom-dind-$UNIQUE_POSTFIX
TEMP_CONTAINER_NAME=fill-images-$UNIQUE_POSTFIX

# create container where to pull images
docker build -t $TEMP_IMAGE_NAME .
docker run --detach --privileged --name $TEMP_CONTAINER_NAME $TEMP_IMAGE_NAME

# wait for container to be fully started up
sleep 5

# load images
for image in $(cat $IMAGE_LIST_FILE); do
  gunzip -c $image | docker exec -i $TEMP_CONTAINER_NAME docker load
  echo "Done: $image"
done

# find out used disk size and add 2-3GB extra (resize may fail with just +1GB)
USED_MB=$(docker exec $TEMP_CONTAINER_NAME df -m /var-lib-docker | grep /var-lib-docker | awk '{print $3}')
TRIM_TO_MB=$(expr $USED_MB + 3072)
TRIM_TO_GB=$(expr $TRIM_TO_MB / 1024)
echo "Resizing ext4 to ${TRIM_TO_GB}GB"

docker exec $TEMP_CONTAINER_NAME sh -c "echo $TRIM_TO_GB > /trim-ext4-on-next-start.txt"
docker exec $TEMP_CONTAINER_NAME df -h
docker exec $TEMP_CONTAINER_NAME ls -la  /

docker stop $TEMP_CONTAINER_NAME
docker start $TEMP_CONTAINER_NAME

# shouldn't be needed... but just in case
until(docker exec $TEMP_CONTAINER_NAME echo 'wait start'); do
  sleep 3;
done;

# wait for resize to be ready
while(docker exec $TEMP_CONTAINER_NAME ls -la /trim-ext4-on-next-start.txt); do
  sleep 3;
done;

docker logs $TEMP_CONTAINER_NAME
docker exec $TEMP_CONTAINER_NAME df -h
docker exec $TEMP_CONTAINER_NAME ls -la  /

docker exec $TEMP_CONTAINER_NAME rm -fr /var-lib-docker/runtimes
docker exec $TEMP_CONTAINER_NAME sh -c 'rm -fr /run/*'

docker commit $TEMP_CONTAINER_NAME $FINAL_IMAGE_NAME
docker stop $TEMP_CONTAINER_NAME
docker rm --force $TEMP_CONTAINER_NAME
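The two wait loops in the script above poll without an upper bound, so a resize that never finishes would hang the build. A minimal sketch of a bounded variant is shown below. The helper name `wait_until` and its argument order are hypothetical and not part of the script above; the docker invocations appear only as comments because they depend on the running container.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 0 on success, or 1 once
# roughly TIMEOUT_SECONDS have been spent sleeping between attempts.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Possible use with the script above (left as comments, needs docker):
#   wait_until 60 3 docker exec "$TEMP_CONTAINER_NAME" echo 'wait start'
#   wait_until 600 3 sh -c \
#     "! docker exec $TEMP_CONTAINER_NAME test -e /trim-ext4-on-next-start.txt"

# Self-contained demo: the condition is already true, so this returns at once.
if wait_until 5 1 true; then
  echo "condition met"
fi
```

The timeout counts only the sleep intervals, not the time each command takes, so it is an approximate bound. The timeout values in the commented usage are guesses and would need tuning for large images.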

elhigu commented 5 years ago

I really need to move this repo to gitlab and add a CI runner for it to make sure it keeps working... or maybe I can just emulate it here with travis too...