ehough / docker-nfs-server

A lightweight, robust, flexible, and containerized NFS server.
https://hub.docker.com/r/erichough/nfs-server/
GNU General Public License v3.0

rpc.nfsd hangs on container startup #43

Open DeathRabbit679 opened 4 years ago

DeathRabbit679 commented 4 years ago

I'll preface this by saying that this is probably a bug in my understanding of NFS rather than in the image. I'm using the following compose file to spin up the container.

version: "3.7"
services:
  nfs-server:
    container_name: nfs-server
    image: erichough/nfs-server:latest
    ports:
      - "2049:2049"
    volumes:
      - type: bind
        source: /mnt/nfstest
        target: /shares/nfstest

      - type: bind
        source: /home/cubeadmin/nfs/nfstest-exports-docker
        target: /etc/exports
        read_only: true
    environment:
      - NFS_DISABLE_VERSION_3=YES
      - NFS_LOG_LEVEL=DEBUG
    cap_add:
      - SYS_ADMIN
    security_opt:
      - apparmor=erichough-nfs

The exports file looks like this:

/shares/nfstest 192.168.1.0/24(fsid=0,rw,no_root_squash,no_subtree_check)

I can docker-compose restart (or down/up) just fine until I mount the share from another system. After I do that, regardless of whether I unmount from the other system, the next time I try to restart the container it hangs at "starting rpc.nfsd on port 2049 with 4 server thread(s)". I'm unable to stop the container or even kill -9 the rpc.nfsd process at that point. It must be some sort of low-level deadlock, but my brain has been too small to find the root cause thus far. 💯

ehough commented 4 years ago

Thanks for the report. Very interesting bug you're experiencing - let's see if we can fix it!

I see that you put the container into debug mode. Do the server logs show anything interesting? When you restart the container, I'm mostly interested in the logs from the server shutdown, but also in the logs from the server that tries to start up. Feel free to post the logs; hopefully we'll see a clue.

When you issue the docker-compose restart command, does the shutdown happen normally (quickly), and it's just the new server launch that hangs?

> I'm unable to stop the container or even kill -9 the rpc.nfsd process at that point.

As far as I can tell, rpc.nfsd ignores all signals and will only shut down if you run /sbin/rpc.nfsd 0. But this image should handle that for you, so really what we need to do is find out why the server is "hanging on" to your client connections.
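To check from inside the container whether the kernel still has nfsd threads running, and to shut them down the way described above, something like the following shell sketch could be used. It assumes the nfsd filesystem is mounted at /proc/fs/nfsd (this image mounts it there during startup) and that /sbin/rpc.nfsd is available; nfsd_thread_count and stop_nfsd are hypothetical helper names, not part of the image.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the erichough/nfs-server image.

# Print the number of nfsd threads the kernel reports as running.
# The threads file path can be overridden with the first argument;
# prints 0 if the file is unreadable (e.g. nfsd filesystem not mounted).
nfsd_thread_count() {
  threads_file="${1:-/proc/fs/nfsd/threads}"
  if [ -r "$threads_file" ]; then
    cat "$threads_file"
  else
    echo 0
  fi
}

# Ask rpc.nfsd to stop all server threads by requesting a thread count of 0,
# which is the shutdown path described above (signals are ignored).
stop_nfsd() {
  /sbin/rpc.nfsd 0
}
```

On a healthy server I'd expect nfsd_thread_count to print the configured thread count (4 in this setup) while running, and 0 once stop_nfsd has completed; if it never drops, the kernel is still holding on to the server.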

On your clients, what is the full mount command that you're using?

DeathRabbit679 commented 4 years ago

It seems that if I just do a compose down/stop, everything terminates gracefully. Here are the docker-compose logs I see after doing a docker-compose up, mounting from the client system, and then doing a docker-compose restart:

nfs-server    | ==================================================================
nfs-server    |       SETTING UP ...
nfs-server    | ==================================================================
nfs-server    | ----> log level set to DEBUG
nfs-server    | ----> will use 4 rpc.nfsd server thread(s) (1 thread per CPU)
nfs-server    | ----> /etc/exports is bind-mounted
nfs-server    | ----> kernel module nfs is loaded
nfs-server    | ----> kernel module nfsd is loaded
nfs-server    | ----> setup complete
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       STARTING SERVICES ...
nfs-server    | ==================================================================
nfs-server    | ----> mounting rpc_pipefs filesystem onto /var/lib/nfs/rpc_pipefs
nfs-server    | mount: mount('rpc_pipefs','/var/lib/nfs/rpc_pipefs','rpc_pipefs',0x00008000,'(null)'):0
nfs-server    | ----> mounting nfsd filesystem onto /proc/fs/nfsd
nfs-server    | mount: mount('nfsd','/proc/fs/nfsd','nfsd',0x00008000,'(null)'):0
nfs-server    | ----> starting rpcbind
nfs-server    | ----> starting exportfs
nfs-server    | exporting 192.168.1.0/24:/shares/nfstest
nfs-server    | ----> starting rpc.mountd on port 32767
nfs-server    | ----> starting rpc.nfsd on port 2049 with 4 server thread(s)
nfs-server    | rpc.nfsd: knfsd is currently down
nfs-server    | rpc.nfsd: Writing version string to kernel: -2 -3 +4 +4.1 +4.2
nfs-server    | rpc.nfsd: Created AF_INET TCP socket.
nfs-server    | rpc.nfsd: Created AF_INET UDP socket.
nfs-server    | rpc.nfsd: Created AF_INET6 TCP socket.
nfs-server    | rpc.nfsd: Created AF_INET6 UDP socket.
nfs-server    | ----> terminating rpcbind
nfs-server    | ----> all services started normally
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       SERVER STARTUP COMPLETE
nfs-server    | ==================================================================
nfs-server    | ----> list of enabled NFS protocol versions: 4.2, 4.1, 4
nfs-server    | ----> list of container exports:
nfs-server    | ---->   /shares/nfstest 192.168.1.0/24(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,no_pnfs,fsid=0,anonuid=65534,anongid=65534,sec=sys,rw,secure,no_root_squash,no_all_squash)
nfs-server    | ----> list of container ports that should be exposed: 2049 (TCP)
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       READY AND WAITING FOR NFS CLIENT CONNECTIONS
nfs-server    | ==================================================================
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       TERMINATING ...
nfs-server    | ==================================================================
nfs-server    | ----> terminating nfsd
nfs-server    | ----> terminating rpc.mountd
nfs-server    | ----> un-exporting filesystem(s)
nfs-server    | ----> rpcbind was not running
nfs-server    | ----> un-mounting nfsd filesystem from /proc/fs/nfsd
nfs-server    | ----> un-mounting rpc_pipefs filesystem from /var/lib/nfs/rpc_pipefs
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       TERMINATED
nfs-server    | ==================================================================
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       SETTING UP ...
nfs-server    | ==================================================================
nfs-server    | ----> log level set to DEBUG
nfs-server    | ----> will use 4 rpc.nfsd server thread(s) (1 thread per CPU)
nfs-server    | ----> /etc/exports is bind-mounted
nfs-server    | ----> kernel module nfs is loaded
nfs-server    | ----> kernel module nfsd is loaded
nfs-server    | ----> setup complete
nfs-server    | 
nfs-server    | ==================================================================
nfs-server    |       STARTING SERVICES ...
nfs-server    | ==================================================================
nfs-server    | ----> mounting rpc_pipefs filesystem onto /var/lib/nfs/rpc_pipefs
nfs-server    | mount: mount('rpc_pipefs','/var/lib/nfs/rpc_pipefs','rpc_pipefs',0x00008000,'(null)'):0
nfs-server    | ----> mounting nfsd filesystem onto /proc/fs/nfsd
nfs-server    | mount: mount('nfsd','/proc/fs/nfsd','nfsd',0x00008000,'(null)'):0
nfs-server    | ----> starting rpcbind
nfs-server    | ----> starting exportfs
nfs-server    | exporting 192.168.1.0/24:/shares/nfstest
nfs-server    | ----> starting rpc.mountd on port 32767
nfs-server    | ----> starting rpc.nfsd on port 2049 with 4 server thread(s)

The container's entrypoint.sh doesn't seem to progress beyond that point. The mount command I'm using on the client side looks like this:

mandy@mandy-X401A1 /mnt $ sudo mount -vvv 192.168.1.112:/ /mnt/blah
mount.nfs: timeout set for Sun Apr 12 20:10:04 2020
mount.nfs: trying text-based options 'vers=4,addr=192.168.1.112,clientaddr=192.168.1.113'

I've also tried running umount on the client side before restarting compose, just to see if the behavior would change, but that doesn't prevent the hang on server startup. Once the server gets into this state, the only way I've found to start the container again is to fully reboot the server first. I'm running the latest and greatest Ubuntu Bionic on the server, and the client is an old laptop running Mint Sylvia, if that helps narrow anything down.

gohmc commented 4 years ago

Sometimes Alpine Linux can cause unexpected behavior. Try building with the latest Alpine release, or replace it with Ubuntu/Debian.

franciscorode commented 3 years ago

I had the same problem with non-LTS Ubuntu releases. I discovered it by chance because the problem didn't occur on another machine, and I confirmed it by trying co-workers' machines. I don't know the exact cause, but upgrading to an LTS release solved the problem.