bsdpot / pot

pot: another container framework for FreeBSD, based on jails, ZFS and pf
BSD 3-Clause "New" or "Revised" License

[BUG] Mounting ZFS is broken #261

Closed: zilti closed this issue 1 year ago

zilti commented 1 year ago

On first attempt to mount, the following happens:

root@server:/usr/home/freebsd # pot mount-in -p nextcloud -m /usr/local/www/nextcloud/config -z nextcloud_store/config
mount_nullfs: /opt/pot/jails/nextcloud/config: No such file or directory
###>  Error mounting /opt/pot/jails/nextcloud/config on nextcloud

At that point, the mount point directory gets created if it does not exist yet, and an entry is added to the jail's fscomp.conf. Whether it actually mounts anything, I don't know. Subsequent attempts result in the following:

root@server:/usr/home/freebsd # pot mount-in -p nextcloud -m /usr/local/www/nextcloud/config -z nextcloud_store/config
###>  The mountpoint is not valid!

This happens even if the mountpoint is empty or doesn't exist, and nothing gets mounted. pot info -p nextcloud -v claims the directories are mounted, but that is clearly not the case: after destroying and recreating the jail, the mounts are empty again.
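For reference, pot's recorded state can be compared against what is actually mounted with standard tools; the fscomp.conf path below assumes pot keeps it under the jail's conf/ directory (with POT_FS_ROOT=/opt/pot):

# what pot believes should be mounted in the jail
cat /opt/pot/jails/nextcloud/conf/fscomp.conf

# what the kernel actually has mounted under the jail tree
mount | grep /opt/pot/jails/nextcloud

# whether the source dataset itself is mounted, and where
zfs get mounted,mountpoint nextcloud_store/config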

zilti commented 1 year ago

It seems a lot more is currently broken in the release available via pkg. For each pot, the first attempt to create it under a given name ends in a broken mess that I have to clean up manually after unmounting the specified mounts, because pot destroy fails and pot insists that whatever I want to delete is not a pot; I then have to pot create it again.
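For the record, a minimal sketch of that manual cleanup, assuming POT_ZFS_ROOT=tank/pot and POT_FS_ROOT=/opt/pot as in the pot.conf below (the exact dataset layout under jails/ is an assumption, not confirmed in this thread):

# unmount whatever mount-in left behind (repeat per mountpoint)
umount -f /opt/pot/jails/nextcloud/m/usr/local/www/nextcloud/config

# since pot destroy refuses, remove the datasets and directories by hand
zfs destroy -r tank/pot/jails/nextcloud
rm -rf /opt/pot/jails/nextcloud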

grembo commented 1 year ago

Could you please share all relevant configuration and how to reproduce? (That is, at least pot.conf and /opt/pot/jails/$name/*.conf. The output of zfs list zroot/pot (or similar) would also be helpful.)

zilti commented 1 year ago

Sure. pot.conf:

# Ansible managed
# pot configuration file

# All datasets related to pot use the same zfs dataset as parent
# With this variable, you can choose which dataset to use
POT_ZFS_ROOT=tank/pot

# It is also important to know where the root dataset is mounted
POT_FS_ROOT=/opt/pot

# This is the cache used to import/export pots
POT_CACHE=/var/cache/pot

# This is where pot is going to store temporary files
POT_TMP=/tmp

# This is the suffix added to temporary files created using mktemp,
# X is a placeholder for a random character, see mktemp(1)
POT_MKTEMP_SUFFIX=.XXXXXXXX

# Define the max length of the hostname inside the pot
POT_HOSTNAME_MAX_LENGTH=64

# Internal Virtual Network configuration

# IPv4 Internal Virtual network
POT_NETWORK=10.192.0.0/10

# Internal Virtual Network netmask
POT_NETMASK=255.192.0.0

# The default gateway of the Internal Virtual Network
POT_GATEWAY=10.192.0.1

# The name of the network physical interface, to be used as default gateway
POT_EXTIF=vtnet0

# POT_EXTRA_EXTIF=expl0
# POT_NETWORK_expl0=

# DNS on the Internal Virtual Network

# name of the pot running the DNS
POT_DNS_NAME=

# IP of the DNS
POT_DNS_IP=

# VPN support

# name of the tunnel network interface
POT_VPN_EXTIF=

# POT_VPN_NETWORKS=

# EOF

zfs list tank/pot:

NAME       USED  AVAIL     REFER  MOUNTPOINT
tank/pot   496K  7.38G       96K  /opt/pot

There are no conf files for the pot itself. The only directory inside /opt/pot/jails/postgresql is m, containing the jail's root file system, and the ZFS dataset there is mounted correctly. The other pot I tried to set up is /opt/pot/jails/nextcloud; that one doesn't even contain an m directory, but instead the three directories meant to be mounted inside the jail (apps, config, and data).
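A recursive listing would also show whether pot's usual child datasets exist; the names below are assumed from pot's default layout after pot init, not taken from this thread:

zfs list -r tank/pot

# on a working install, something along these lines would be expected:
# tank/pot/bases/13.1
# tank/pot/fscomp
# tank/pot/jails/postgresql
# tank/pot/jails/nextcloud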

grembo commented 1 year ago

How did you create the nextcloud pot?

zilti commented 1 year ago

That is by far the weirdest part. I create it via Ansible, but the first attempt - the one that ends up creating the completely broken pot with the ZFS mounts - goes completely unlogged. I activated logging of everything, but neither my script's nor pot's output appears in all.log.
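For reference, all.log is a stock FreeBSD syslogd feature that has to be enabled by hand; this is from the default /etc/syslog.conf, nothing pot-specific:

# in /etc/syslog.conf, uncomment the catch-all rule:
#   *.*                                   /var/log/all.log
touch /var/log/all.log
chmod 600 /var/log/all.log
service syslogd restart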

After unmounting all the mounts and deleting /opt/pot/jails/nextcloud however, running the same script again gives log output, and pot itself logs the following messages:

pot-create -p nextcloud -i 10.192.0.4 -b 13.1 -t single -f ansible-managed -N public-bridge -S dual
pot-start nextcloud
pot-set-status -p nextcloud -s starting
pot-vnet-start
pot-set-status -p nextcloud -s started -i p463f69408571:p663f69408571
pot-stop nextcloud
pot-set-status -p nextcloud -s stopping
pot-set-status -p nextcloud -s stopped
pot-start nextcloud
pot-set-status -p nextcloud -s starting
pot-vnet-start
pot-set-status -p nextcloud -s started -i p463f69428be7:p663f69428be7
pot-mount-in -p nextcloud -m /usr/local/www/nextcloud/config -z nextcloud_store/config
mount-in: -p nextcloud -m /usr/local/www/nextcloud/config -z nextcloud_store/config
pot-mount-in -p nextcloud -m /usr/local/www/nextcloud/data -z nextcloud_store/data
mount-in: -p nextcloud -m /usr/local/www/nextcloud/data -z nextcloud_store/data
pot-mount-in -p nextcloud -m /usr/local/www/nextcloud/apps -z nextcloud_store/apps
mount-in: -p nextcloud -m /usr/local/www/nextcloud/apps -z nextcloud_store/apps
pot-set-attribute -p nextcloud -A start-at-boot -V True

According to zfs mount, however, the nextcloud_store/* datasets are still not mounted.
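zfs mount only lists datasets that are currently mounted, so the per-dataset mounted property makes the check explicit:

# empty output here means nothing from the pool is mounted
zfs mount | grep nextcloud_store

# 'mounted' should read yes once mount-in has done its job
zfs get -r mounted,mountpoint nextcloud_store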

zilti commented 1 year ago

It is really, extremely bizarre... Nothing logs anything during that brief period where everything goes wrong. Ansible doesn't log the commands it sends, my script doesn't log to syslog, and pot doesn't log anything either. Not even the ZFS mount messages appear in syslog, despite pot clearly setting mountpoints. After a reboot, the mounts are (obviously) there again.

grembo commented 1 year ago

@zilti That's pretty strange - is syslog local or over the network? I remember there was (is) a problem in FreeBSD 13 (at least) where networking stops working for a few moments in case you're bridging to your uplink (that is, if you don't NAT between epair interfaces/bridge IPs and instead have your public interface on the bridge and you're adding the first epair to that bridge). Maybe that's the issue here? See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221122
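For comparison, the NAT-style setup that sidesteps this keeps the uplink off the bridge and translates the pot network in pf; a minimal pf.conf sketch using the values from the pot.conf above (the rule is illustrative, not pot's generated ruleset):

# /etc/pf.conf fragment
ext_if = "vtnet0"
pot_net = "10.192.0.0/10"

# translate outgoing pot traffic to the host's external address
nat on $ext_if from $pot_net to any -> ($ext_if)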

zilti commented 1 year ago

I think I finally figured it out. When importing the zpool from which I want to mount the datasets, those datasets, alongside the main pool, get mounted at their stored mountpoints. Since I had previously used those volumes via pot's re-mount functionality and have now switched back to the default nullfs mounting, the old mountpoints were still stored and active. So upon import, the /opt/pot/jails/nextcloud directory got created, confusing pot in the process.
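In other words (a sketch of the failure mode and two possible fixes, assuming nextcloud_store is the imported pool, using standard ZFS commands):

# importing the pool mounts every dataset at its stored mountpoint,
# recreating /opt/pot/jails/nextcloud behind pot's back
zpool import nextcloud_store
zfs get -r mountpoint nextcloud_store    # shows the stale paths

# either clear the stale mountpoints on the affected datasets...
zfs set mountpoint=none nextcloud_store/config

# ...or re-import without mounting anything
zpool export nextcloud_store
zpool import -N nextcloud_store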

Dang... Sometimes it ends up being such a minor thing one doesn't think of. I spent pretty much an entire day debugging this, also suspecting my ansible-pot module. Turns out nothing except myself was wrong in the first place :) Well, maybe someone runs into the same issue in the future and finds this when searching for a solution. I hope I didn't waste too much of your time!

grembo commented 1 year ago

@zilti No worries, thanks for reporting and sharing the root cause.