DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Newly created GitLab instance fails to mount data volume #4890

Open hannes-ucsc opened 1 year ago

hannes-ucsc commented 1 year ago

After a regularly scheduled GitLab version upgrade, the instance came back without mounting the data volume at /mnt/gitlab. The GitLab application then started with vanilla configuration and data, rendering it essentially dysfunctional.

It seems that when cloud-init attempted to create the /etc/fstab entry for the volume, the corresponding device node in /dev hadn't yet been created by the kernel. When I later logged in, the device node did exist. So this sounds like a race in cloud-init, the kernel, or EC2. Rebooting the instance didn't help, I think because that part of cloud-init runs once per instance, not once per boot.
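To illustrate the suspected race, here is a minimal sketch of polling for a device node before acting on it. It is not what cloud-init does and not a fix used in this repository; the function name `wait_for_block_device` and its `timeout` and `interval` parameters are hypothetical.

```python
import os
import stat
import time


def wait_for_block_device(path, timeout=60.0, interval=0.5):
    """Poll until `path` exists and is a block device.

    Returns True once the device node is present, False if the timeout
    expires first. Hypothetical helper, for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISBLK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            # The kernel has not created the device node yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run something like `wait_for_block_device('/dev/nvme1n1')` before writing the /etc/fstab entry, and treat a False result as an error rather than silently skipping the mount.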

sudo grep -rF nvme1n1 /var/log/*
cloud-init.log:Jan 11 01:39:55 cloud-init[2588]: cc_mounts.py[DEBUG]: mounts configuration is [['/dev/nvme1n1', '/mnt/gitlab', 'ext4', '']]
cloud-init.log:Jan 11 01:39:55 cloud-init[2588]: cc_mounts.py[DEBUG]: Attempting to determine the real name of /dev/nvme1n1
cloud-init.log:Jan 11 01:39:55 cloud-init[2588]: cc_mounts.py[DEBUG]: changed /dev/nvme1n1 => None
cloud-init.log:Jan 11 01:39:55 cloud-init[2588]: cc_mounts.py[DEBUG]: Ignoring nonexistent named mount /dev/nvme1n1

Compare that to a normal log:

cloud-init.log:Jan 11 01:00:02 cloud-init[2575]: cc_mounts.py[DEBUG]: mounts configuration is [['/dev/nvme1n1', '/mnt/gitlab', 'ext4', '']]
cloud-init.log:Jan 11 01:00:02 cloud-init[2575]: cc_mounts.py[DEBUG]: Attempting to determine the real name of /dev/nvme1n1
cloud-init.log:Jan 11 01:00:02 cloud-init[2575]: cc_mounts.py[DEBUG]: Changes to fstab: ['+ /dev/nvme1n1 /mnt/gitlab ext4 ,comment=cloudconfig 0 2']
Binary file journal/ec29563d43d0eed9d02a4518576ec5d2/system@a55daef5424946e79396632b0c772bdc-0000000000000001-0005f1f283f5dba0.journal matches
messages:Jan 11 01:00:02 ip-172-71-0-215 kernel: EXT4-fs (nvme1n1): mounted filesystem with ordered data mode. Opts: (null)
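The difference between the two logs could be checked for automatically. The following is a rough sketch that scans cloud-init log lines for the "Ignoring nonexistent named mount" message shown in the failing log. The function name `skipped_mounts` is hypothetical, and the pattern is based only on the lines quoted here; other cloud-init versions may format the message differently.

```python
import re

# Pattern taken from the failing log excerpt in this issue (an assumption
# about the message format, not a documented cloud-init interface).
_IGNORED_MOUNT = re.compile(
    r"cc_mounts\.py\[DEBUG\]: Ignoring nonexistent named mount (\S+)"
)


def skipped_mounts(log_lines):
    """Return the device paths that cc_mounts reported as nonexistent."""
    devices = []
    for line in log_lines:
        match = _IGNORED_MOUNT.search(line)
        if match:
            devices.append(match.group(1))
    return devices
```

For example, passing an open handle on /var/log/cloud-init.log to `skipped_mounts` would yield `['/dev/nvme1n1']` for the failing boot above and an empty list for the normal one.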

achave11-ucsc commented 1 year ago

The workaround is to recreate the instance.

hannes-ucsc commented 1 year ago

Nothing we can do here. There isn't even an upstream issue to block this on.

hannes-ucsc commented 7 months ago

https://github.com/canonical/cloud-init/issues/3386 is the upstream blocker

hannes-ucsc commented 7 months ago

Assignee to monitor upstream blocker.