ehough / docker-nfs-server

A lightweight, robust, flexible, and containerized NFS server.
https://hub.docker.com/r/erichough/nfs-server/
GNU General Public License v3.0
669 stars 221 forks source link

Add healthcheck #47

Open phlax opened 4 years ago

phlax commented 4 years ago

I think exportfs provides a fairly reliable way to check whether the container is healthy, so it seems to me make sense to add the healthcheck in the image - ie the healthcheck doesnt require any runtime vars

ehough commented 3 years ago

LGTM. Thank you for your contribution! I'll merge momentarily.

phlax commented 3 years ago

hi @ehough - the only thing i wasnt 100% sure about is the optimal settings for retry etc - these seemed reasonable defaults - but not sure

phlax commented 3 years ago

i guess the issue with the grace period is significant here

ehough commented 3 years ago

OK I think I spoke too soon. The problem is that exportfs's exit code seems like it's always 0. Run this:

docker run --rm alpine sh -c "apk add nfs-utils && exportfs; echo $?"

So exportfs's exit code would indicate to Docker that the container is healthy, even though there could be other problems with the server.

This image does a ton of error checking during boot up, so I think if entrypoint.sh is alive after, say 30 seconds, then we can somewhat safely assume that the container is healthy. Perhaps in the future we could add more sophisticated checks (i.e. ID mapping is requested, ensure that idmapd is running).

Would you be interested in updating this PR to instead check for entrypoint.sh? Something like pidof /usr/local/bin/entrypoint.sh might be enough. I'm happy to make the change but I don't want to clobber your contribution!

i guess the issue with the grace period is significant here

I haven't forgotten about #44. I was testing it this weekend and I kept crashing my laptop (it's running El Capitan, which has major issues with NFSv4). Stand by for updates there.

phlax commented 3 years ago

The problem is that exportfs's exit code seems like it's always 0

wierd, i tested this when i made the PR and have been using it since. Not sure what i must have tested 8/ .

Would you be interested in updating this PR to instead check for entrypoint.sh?

sure

phlax commented 3 years ago

after a little playing with this, i have come to the conclusion that this healthcheck should probs not be added to the base image, for a few reasons

i had a quick look at the checks in the entrypoint and my impression is that these are more reliable, but not sure how to indicate from script to env the health of exported dirs

also worth mentioning - https://www.ibm.com/support/pages/understanding-nfs-health-check-datapower - although it seems only tangentially helpful

phlax commented 3 years ago

also - i might be wrong - but i think this does work after all (it seems to in my deployment)

try this:

docker run --rm alpine sh -c "apk add nfs-utils && echo $(exportfs); echo \"Received: $?\""

compared to:

docker run --rm alpine sh -c "apk add nfs-utils && echo $(exit 0); echo \"Received: $?\""

this was wrong - further testing shows it was expanding the $() in the host

phlax commented 3 years ago

although not sure 8/

$ docker run -ti --rm alpine sh
/ # apk add nfs-utils && exportfs
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
(1/15) Installing krb5-conf (1.0-r1)
(2/15) Installing libcom_err (1.45.5-r0)
(3/15) Installing keyutils-libs (1.6.1-r0)
(4/15) Installing libverto (0.3.1-r1)
(5/15) Installing krb5-libs (1.17.1-r0)
(6/15) Installing libtirpc (1.1.4-r0)
(7/15) Installing rpcbind (1.2.5-r0)
Executing rpcbind-1.2.5-r0.pre-install
(8/15) Installing libblkid (2.34-r1)
(9/15) Installing libcap (2.27-r0)
(10/15) Installing device-mapper-libs (2.02.186-r0)
(11/15) Installing libevent (2.1.11-r0)
(12/15) Installing libmount (2.34-r1)
(13/15) Installing libnfsidmap (2.4.1-r2)
(14/15) Installing sqlite-libs (3.30.1-r2)
(15/15) Installing nfs-utils (2.4.1-r2)
Executing busybox-1.31.1-r9.trigger
OK: 12 MiB in 29 packages
/ # echo $?
0
phlax commented 3 years ago

this seems to work, but not sure about performance or how reliable

/ # nfsstat -s
Error: No Server Stats (/proc/net/rpc/nfsd: No such file or directory). 
/ # echo "$?"
2

edit: i dont seem to be able to get this to work even with successfully exported dirs - i think because docker and /proc - either way it doesnt seem an option

phlax commented 3 years ago

@ehough shall i close this?

i think its probs best to leave to individual setups etc