bsdpot / nomad-pot-driver

Nomad task driver for launching FreeBSD jails.
Apache License 2.0

prune fails because of unmounted datasets #35

Closed einsiedlerkrebs closed 1 year ago

einsiedlerkrebs commented 1 year ago

I am experiencing an issue where pots are not cleaned up after a hard reboot of a Nomad node, and therefore the jobs are failing.

When the system is up again, the ZFS datasets are not mounted in place, so the configuration file of a pot cannot be found. This makes the prune command fail, which in turn makes it impossible to run "prepare" in Nomad. For this reason the node is not reboot safe.

To reproduce:

grembo commented 1 year ago

Thanks for opening this issue.

What does your system config look like? At least:

cat /etc/rc.conf
zpool status

(after reboot - redact as necessary)

einsiedlerkrebs commented 1 year ago
nomad_enable="True"
nomad_user="root"
nomad_debug="YES"
nomad_env="PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin"
nomad_dir="/var/tmp/nomad"
nomad_args="-config=/opt/hashicorp/nomad-agent.hcl"

zpool status shows both pools online.

My fix for the issue is:

#!/bin/sh
# After a hard crash of a nomad node, remaining pots can't be pruned
# because their datasets are not mounted. Mount the datasets, then prune.

zfs list -rH -o name zdata/pot/jails | xargs -L 1 zfs mount \
  && logger -t pot_cleanup "mounted all pot datasets" \
  || logger -t pot_cleanup "failed to mount all pot datasets"
pot prune \
  && logger -t pot_cleanup "pruned pots" \
  || logger -t pot_cleanup "could not prune all pots"

This is supported by an rc script which runs it once before nomad.
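For reference, such an rc script could look roughly like this. This is only a sketch: the service name `pot_cleanup` and the helper path `/usr/local/sbin/pot-cleanup.sh` are assumptions, not the actual script from this setup.

```shell
#!/bin/sh
# Hypothetical rc.d wrapper (names are assumptions): run the
# mount-and-prune cleanup once before nomad starts.

# PROVIDE: pot_cleanup
# REQUIRE: zfs
# BEFORE: nomad

. /etc/rc.subr

name="pot_cleanup"
rcvar="pot_cleanup_enable"
start_cmd="pot_cleanup_start"
stop_cmd=":"

pot_cleanup_start()
{
    # the mount-and-prune script shown earlier in the thread
    /usr/local/sbin/pot-cleanup.sh
}

load_rc_config $name
: ${pot_cleanup_enable:="YES"}
run_rc_command "$1"
```

The `REQUIRE`/`BEFORE` lines let rcorder(8) place the script after zfs and before nomad in the boot sequence.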

grembo commented 1 year ago

@einsiedlerkrebs any reason you didn’t enable zfs in rc.conf? This can, e.g., be done using the service command:

# service zfs enable

It would take care of mounting zfs file systems on boot.
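For completeness, a sketch of enabling this persistently (the `service zfs enable` form is equivalent to adding `zfs_enable="YES"` to /etc/rc.conf by hand):

```shell
# Enable the zfs rc service so file systems are mounted on every boot
service zfs enable
# or, equivalently:
sysrc zfs_enable=YES
# mount everything now, without rebooting:
service zfs start
```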

einsiedlerkrebs commented 1 year ago

Yes indeed this solved the issue. Thanks.