csabahenk / cirros

Cirros is a minimal vm for OpenStack here forked from lp:cirros
19 stars 6 forks source link

nfs server does not respond if access is denied #9

Open csabahenk opened 10 years ago

csabahenk commented 10 years ago

@vponomaryov reported:

if access is not allowed to client, next thing appear:

[ 4182.497622] nfs: server 10.0.0.2 not responding, timed out

And it takes very long time until adding rule for access.
dpkshetty commented 10 years ago

Dumping the debug info I have collected using rpcdebug for this issue

Setup: Standard devstack+manila. Generic driver in manila.conf Using csaba's cirros image as the service VM image in manila.conf Using ubuntu_nfs_cifs.qcow2 as the devstack nova instance

From a NFS perspective: cirros service VM image - 64bit - NFS server ubuntu nova instance - 32bit - NFS client

Problem in brief: 1) During manila access-allow, we use exportfs to add a export entry for NFS server 2) Then a mount from NFS client (nova instance) works 3) During manila access-deny, we use exportfs to remove the export entry in NFS server 4) Mounting from NFS client (nova instance) hangs in this case. Expectation is that it should either timeout or error out gracefully, but doesn't doesn't happen

Debug data:

* Client side data *

Client IP: 10.0.0.2

ubuntu@ubuntu:~$ dpkg-query -W nfs knfs libnfsidmap1 libnfsidmap2 0.25-1ubuntu2 nfs-client nfs-common 1:1.2.5-3ubuntu3.1 nfs-kernel-server 1:1.2.5-3ubuntu3.1 nfs-server

ubuntu@ubuntu:~$ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:df:46:7c brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 brd 10.0.0.255 scope global eth0 inet6 fe80::f816:3eff:fedf:467c/64 scope link valid_lft forever preferred_lft forever ubuntu@ubuntu:~$

ubuntu@ubuntu:~$ uname -a Linux ubuntu 3.2.0-59-virtual #90-Ubuntu SMP Tue Jan 7 23:27:45 UTC 2014 i686 athlon i386 GNU/Linux

ubuntu@ubuntu:~$ nfsstat --version nfsstat: 1.2.5

ubuntu@ubuntu:~$ lsmod| grep nfs nfsd 229991 13 nfs 372273 0 lockd 78865 2 nfsd,nfs fscache 50642 1 nfs auth_rpcgss 39597 2 nfsd,nfs nfs_acl 12771 2 nfsd,nfs sunrpc 215112 19 nfsd,nfs,lockd,auth_rpcgss,nfs_acl

ubuntu@ubuntu:~$ dmesg | grep -i nfs [ 18.891846] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 20.545747] FS-Cache: Netfs 'nfs' registered for caching [ 21.379591] Installing knfsd (copyright (C) 1996 okir@monad.swb.de). [ 66.483909] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory [ 66.497380] NFSD: starting 90-second grace period ubuntu@ubuntu:~$

ubuntu@ubuntu:/$ sudo rpcdebug -m nfs all sudo: unable to resolve host ubuntu nfs vfs dircache lookupcache pagecache proc xdr file root callback client mount

ubuntu@ubuntu:/$ sudo dmesg --clear

ubuntu@ubuntu:/$ time sudo mount -t nfs -vv -o vers=4 10.254.0.3:/shares/share-8b2d529a-0688-4a51-9641-9ee8d781380c /mnt sudo: unable to resolve host ubuntu mount.nfs: timeout set for Tue Jul 1 12:46:54 2014 mount.nfs: trying text-based options 'vers=4,addr=10.254.0.3,clientaddr=10.0.0.2' ^C ← HUNG, hence did Ctrl-C !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! real 3m43.594s user 0m0.196s sys 0m0.188s

ubuntu@ubuntu:~$ dmesg -T [Tue Jul 1 12:44:53 2014] NFS: nfs mount opts='vers=4,addr=10.254.0.3,clientaddr=10.0.0.2' [Tue Jul 1 12:44:53 2014] NFS: parsing nfs mount option 'vers=4' [Tue Jul 1 12:44:53 2014] NFS: parsing nfs mount option 'addr=10.254.0.3' [Tue Jul 1 12:44:53 2014] NFS: parsing nfs mount option 'clientaddr=10.0.0.2' [Tue Jul 1 12:44:53 2014] NFS: MNTPATH: '/shares/share-8b2d529a-0688-4a51-9641-9ee8d781380c' [Tue Jul 1 12:44:53 2014] --> nfs4_try_mount() [Tue Jul 1 12:44:53 2014] --> nfs4_create_server() [Tue Jul 1 12:44:53 2014] --> nfs4_init_server() [Tue Jul 1 12:44:53 2014] --> nfs4_set_client() [Tue Jul 1 12:44:53 2014] --> nfs_get_client(10.254.0.3,v4) [Tue Jul 1 12:44:53 2014] NFS: get client cookie (0xc71bae00/0xc688d030) [Tue Jul 1 12:44:53 2014] NFS: Callback listener port = 58338 (af 2) [Tue Jul 1 12:44:53 2014] NFS: Callback listener port = 58222 (af 10) [Tue Jul 1 12:44:53 2014] --> nfs_get_client() = c71bae00 [new] [Tue Jul 1 12:44:53 2014] <-- nfs4_set_client() = 0 [new c71bae00] [Tue Jul 1 12:44:53 2014] <-- nfs4_init_server() = 0 [Tue Jul 1 12:44:53 2014] --> nfs4_get_rootfh() [Tue Jul 1 12:44:53 2014] encode_compound: tag= [Tue Jul 1 12:45:53 2014] encode_compound: tag= [Tue Jul 1 12:45:53 2014] encode_compound: tag= [Tue Jul 1 12:47:53 2014] nfs: server 10.254.0.3 not responding, still trying [Tue Jul 1 12:47:53 2014] encode_compound: tag= [Tue Jul 1 12:47:53 2014] encode_compound: tag=

ubuntu@ubuntu:~$ dmesg [13532.875229] NFS: nfs mount opts='vers=4,addr=10.254.0.3,clientaddr=10.0.0.2' [13532.876382] NFS: parsing nfs mount option 'vers=4' [13532.882223] NFS: parsing nfs mount option 'addr=10.254.0.3' [13532.883462] NFS: parsing nfs mount option 'clientaddr=10.0.0.2' [13532.883858] NFS: MNTPATH: '/shares/share-8b2d529a-0688-4a51-9641-9ee8d781380c' [13532.883934] --> nfs4_try_mount() [13532.888564] --> nfs4_create_server() [13532.889029] --> nfs4_init_server() [13532.889160] --> nfs4_set_client() [13532.889212] --> nfs_get_client(10.254.0.3,v4) [13532.889851] NFS: get client cookie (0xc63ad600/0xc688d030) [13532.937062] NFS: Callback listener port = 40560 (af 2) [13532.938085] NFS: Callback listener port = 44198 (af 10) [13532.939346] --> nfs_get_client() = c63ad600 [new] [13532.939468] <-- nfs4_set_client() = 0 [new c63ad600] [13532.941944] <-- nfs4_init_server() = 0 [13532.942539] --> nfs4_get_rootfh() [13532.943172] encode_compound: tag=

* server side data *

Server IP: 10.254.0.3

Since its a cirros image, it doens't have a pkg manager, so can't really query for pkg versions ! Closest I can do is below, if that helps ...

$ nfsstat --version nfsstat: 1.2.6

$ uname -a Linux cirros 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC 2014 x86_64 GNU/Linux $ $ lsmod| grep nfs nfsd 280346 7 auth_rpcgss 54830 1 nfsd nfsv4 253863 0 nfsv3 39326 0 nfs_acl 12837 2 nfsd,nfsv3 nfsv2 24316 0 nfs 180906 3 nfsv4,nfsv3,nfsv2 lockd 94016 4 nfsd,nfsv3,nfsv2,nfs sunrpc 271857 22 nfsd,auth_rpcgss,nfsv4,nfsv3,nfs_acl,nfsv2,nfs,lockd fscache 58860 3 cifs,nfsv4,nfs

$ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether fa:16:3e:de:1c:79 brd ff:ff:ff:ff:ff:ff inet 10.254.0.3/29 brd 10.254.0.7 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::f816:3eff:fede:1c79/64 scope link valid_lft forever preferred_lft forever

$ sudo exportfs (when share access-allow is done in manila) /shares/share-8b2d529a-0688-4a51-9641-9ee8d781380c 10.0.0.2

$ sudo exportfs (when share access-deny is done in manila) $

$ dmesg | grep -i nfs [ 5.306556] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 5.359405] FS-Cache: Netfs 'nfs' registered for caching [ 5.494300] NFS: Registering the id_resolver key type [ 5.570337] Installing knfsd (copyright (C) 1996 okir@monad.swb.de). [ 15.823181] NFSD: the nfsdcld client tracking upcall will be removed in 3.10. Please transition to using nfsdcltrack. [ 15.828906] NFSD: starting 90-second grace period (net ffffffff81cd2d00) [ 136.033358] NFSD: Unable to end grace period: -110 [10856.545789] NFSD: Unable to create client record on stable storage: -110 [10913.633651] NFSD: Unable to create client record on stable storage: -110 [10964.832655] NFSD: Unable to create client record on stable storage: -110

dpkshetty commented 10 years ago

I also have the tcpdump data captured at the time of the mount hanging. If needed I can share the tcpdump raw binary data too. It gives the same conclusion that dmesg above gives... the client sends PUTROOTFH and hangs waiting for server to respond to it!

dpkshetty commented 10 years ago

With all the setup remaining same as above, I tried with 64bit NFS client now ...

From a NFS perspective: cirros service VM image - 64bit - NFS server Fedora20 cloud image as nova instance - 64bit - NFS client

Debug data for the case where manila access-deny was successfully executed:

[root@host-10-0-0-5 ~]# uname -a Linux test-vm-fedora 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux c[root@host-10-0-0-5 ~]# cat /etc/issue Fedora release 20 (Heisenbug) Kernel \r on an \m (\l)

[root@host-10-0-0-5 ~]#

[root@host-10-0-0-5 ~]# lsmod | grep nfs nfsv4 251490 0 dns_resolver 13096 1 nfsv4 nfs 169149 1 nfsv4 fscache 61532 2 nfs,nfsv4 nfsd 279847 1 auth_rpcgss 54254 1 nfsd nfs_acl 12741 1 nfsd lockd 93436 2 nfs,nfsd sunrpc 253362 15 nfs,nfsd,auth_rpcgss,lockd,nfsv4,nfs_acl [root@host-10-0-0-5 ~]#

[root@host-10-0-0-5 ~]# rpcdebug -m nfs all nfs vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state

[root@host-10-0-0-5 ~]# dmesg --clear

[root@host-10-0-0-5 ~]# mount -t nfs -vv -o vers=4 10.254.0.3:/shares/share-2df4b2a5-1bfa-486f-9a92-a7c35422193b /mnt mount.nfs: timeout set for Fri Aug 8 12:04:39 2014 mount.nfs: trying text-based options 'vers=4,addr=10.254.0.3,clientaddr=10.0.0.5' ^C [root@host-10-0-0-5 ~]# dmesg -T [Fri Aug 8 12:02:38 2014] NFS: nfs mount opts='vers=4,addr=10.254.0.3,clientaddr=10.0.0.5' [Fri Aug 8 12:02:38 2014] NFS: parsing nfs mount option 'vers=4' [Fri Aug 8 12:02:38 2014] NFS: parsing nfs mount option 'addr=10.254.0.3' [Fri Aug 8 12:02:38 2014] NFS: parsing nfs mount option 'clientaddr=10.0.0.5' [Fri Aug 8 12:02:38 2014] NFS: MNTPATH: '/shares/share-2df4b2a5-1bfa-486f-9a92-a7c35422193b' [Fri Aug 8 12:02:38 2014] --> nfs4_try_mount() [Fri Aug 8 12:02:38 2014] --> nfs4_create_server() [Fri Aug 8 12:02:38 2014] --> nfs4_init_server() [Fri Aug 8 12:02:38 2014] --> nfs4_set_client() [Fri Aug 8 12:02:38 2014] --> nfs_get_client(10.254.0.3,v4) [Fri Aug 8 12:02:38 2014] NFS: get client cookie (0xffff8800036a9c00/0xffff880004d20050) [Fri Aug 8 12:02:38 2014] nfs_callback_create_svc: service created [Fri Aug 8 12:02:38 2014] NFS: create per-net callback data; net=ffffffff81cbdec0 [Fri Aug 8 12:02:38 2014] NFS: Callback listener port = 34727 (af 2, net ffffffff81cbdec0) [Fri Aug 8 12:02:38 2014] NFS: Callback listener port = 33849 (af 10, net ffffffff81cbdec0) [Fri Aug 8 12:02:39 2014] nfs_callback_up: service started [Fri Aug 8 12:02:39 2014] NFS: nfs4_discover_server_trunking: testing '10.254.0.3' [Fri Aug 8 12:02:39 2014] NFS call setclientid auth=UNIX, 'Linux NFSv4.0 10.0.0.5/10.254.0.3 tcp' [Fri Aug 8 12:03:32 2014] NFS reply setclientid: -512 [Fri Aug 8 12:03:32 2014] NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO [Fri Aug 8 12:03:32 2014] NFS: nfs4_discover_server_trunking: status = -5 [Fri Aug 8 12:03:32 2014] --> nfs_put_client({1}) [Fri Aug 8 12:03:32 2014] NFS: destroy per-net callback data; net=ffffffff81cbdec0 [Fri Aug 8 12:03:32 2014] nfs_callback_down: service stopped [Fri Aug 8 12:03:32 2014] nfs_callback_down: service destroyed [Fri Aug 8 12:03:32 2014] --> nfs_free_client(4) [Fri Aug 8 12:03:32 2014] NFS: releasing client cookie (0xffff8800036a9c00/0xffff880004d20050) [Fri Aug 8 12:03:32 2014] <-- nfs_free_client() [Fri Aug 8 12:03:32 2014] <-- nfs4_init_client() = xerror -5 [Fri Aug 8 12:03:32 2014] <-- nfs4_set_client() = xerror -5 [Fri Aug 8 12:03:32 2014] <-- nfs4_init_server() = -5 [Fri Aug 8 12:03:32 2014] --> nfs_free_server() [Fri Aug 8 12:03:32 2014] <-- nfs_free_server() [Fri Aug 8 12:03:32 2014] <-- nfs4_create_server() = error -5 [Fri Aug 8 12:03:32 2014] <-- nfs4_try_mount() = -5 [error]

bswartz commented 10 years ago

I'm wondering if this issue is something specific to do with NFSv4 and/or not having any shares at all. If I had 2 shares and I denied access to one of them, would I experience a timeout or an access denied when trying to access the it? It seems to me that having no shares at all might put the NFS server into a special state where it stops listening.

bswartz commented 10 years ago

If the above is true then the workaround would be to always share at least one directory -- presumably some empty, read-only directory.

dpkshetty commented 10 years ago

Valeiry had suggested similar and IIRC he himself later posted on IRC after testing with cirros image that have 1 share always present doesn't solve the problem. Per Valeriy, I tested in manila devstack context. As part of create-share I added a export for the share-server serving the same share to its own IP and access-allow adds another entry for that share to the instance IP, then when we do deny-access the instance IP entry is removed from exportfs (share server IP entry remains) and the problem (NFS client hang) did reproduce in this case too. So having 1 entry always doesn't seem to solve the issue.