Open phlax opened 4 years ago
I'm thinking this issue is not related to the grace period, and may well not be a problem. I'm not sure how to test the client tracking.
same problem
[ 241.983122] NFSD: Unable to initialize client recovery tracking! (-22)
[ 241.983125] NFSD: starting 90-second grace period (net f0001296)
[ 274.339363] NFS: Registering the id_resolver key type
[ 274.339370] Key type id_resolver registered
[ 274.339371] Key type id_legacy registered
[ 274.401357] RPC: server localhost requires stronger authentication.
[ 274.401381] RPC: server localhost requires stronger authentication.
[ 274.401406] RPC: server localhost requires stronger authentication.
[ 274.401433] RPC: server localhost requires stronger authentication.
[ 274.401435] svc: failed to register lockdv1 RPC service (errno 13).
[ 274.401436] lockd_up: makesock failed, error=-13
[ 274.401460] RPC: server localhost requires stronger authentication.
[ 274.401484] RPC: server localhost requires stronger authentication.
[ 274.401507] RPC: server localhost requires stronger authentication.
[ 484.420888] INFO: task mount.nfs:32840 blocked for more than 120 seconds.
[ 484.420899] Tainted: G OE 5.4.0-42-generic #46-Ubuntu
[ 484.420903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.420908] mount.nfs D 0 32840 1 0x00004004
[ 484.420913] Call Trace:
[ 484.420925] __schedule+0x2e3/0x740
[ 484.420931] ? __call_rcu+0x1d0/0x1d0
[ 484.420934] schedule+0x42/0xb0
[ 484.420948] lockd_unregister_notifiers+0x66/0xa0 [lockd]
[ 484.420952] ? wait_woken+0x80/0x80
[ 484.420961] lockd_up+0x12a/0x2b0 [lockd]
[ 484.420968] nlmclnt_init+0x2c/0xc0 [lockd]
[ 484.420983] nfs_start_lockd+0xee/0x130 [nfs]
[ 484.420998] nfs_init_server+0x1df/0x350 [nfs]
[ 484.421012] nfs_create_server+0x77/0x1c0 [nfs]
[ 484.421017] nfs3_create_server+0x10/0x30 [nfsv3]
[ 484.421037] nfs_try_mount+0x1c5/0x2d0 [nfs]
[ 484.421043] ? __kmalloc_track_caller+0x180/0x270
[ 484.421061] nfs_fs_mount+0x285/0x740 [nfs]
[ 484.421066] ? __lookup_constant+0x4d/0x70
[ 484.421083] ? nfs_clone_super+0xe0/0xe0 [nfs]
[ 484.421098] ? nfs_parse_mount_options+0xd90/0xd90 [nfs]
[ 484.421103] legacy_get_tree+0x2b/0x50
[ 484.421107] vfs_get_tree+0x2a/0xc0
[ 484.421112] ? capable+0x19/0x20
[ 484.421116] do_mount+0x7b6/0x9c0
[ 484.421121] ksys_mount+0x82/0xd0
[ 484.421125] __x64_sys_mount+0x25/0x30
[ 484.421130] do_syscall_64+0x57/0x190
[ 484.421134] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 484.421139] RIP: 0033:0x7f61b755ac7e
[ 484.421148] Code: Bad RIP value.
[ 484.421151] RSP: 002b:00007fff6681cb28 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 484.421154] RAX: ffffffffffffffda RBX: 00007fff6681cd30 RCX: 00007f61b755ac7e
[ 484.421155] RDX: 00005629cda8a880 RSI: 00005629cda88b40 RDI: 00005629cda88b60
[ 484.421157] RBP: 00007f61b71627b8 R08: 00005629cda8cc60 R09: 00005629cda8cc60
[ 484.421159] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 484.421160] R13: 00007fff6681cc10 R14: 00005629cda8cc20 R15: 00005629cda8ab70
[ 605.256665] INFO: task mount.nfs:32840 blocked for more than 241 seconds.
[ 605.256677] Tainted: G OE 5.4.0-42-generic #46-Ubuntu
[ 605.256680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.256686] mount.nfs D 0 32840 1 0x00004004
[ 605.256691] Call Trace:
[ 605.256703] __schedule+0x2e3/0x740
[ 605.256709] ? __call_rcu+0x1d0/0x1d0
[ 605.256712] schedule+0x42/0xb0
[ 605.256727] lockd_unregister_notifiers+0x66/0xa0 [lockd]
[ 605.256732] ? wait_woken+0x80/0x80
[ 605.256740] lockd_up+0x12a/0x2b0 [lockd]
[ 605.256747] nlmclnt_init+0x2c/0xc0 [lockd]
[ 605.256762] nfs_start_lockd+0xee/0x130 [nfs]
[ 605.256777] nfs_init_server+0x1df/0x350 [nfs]
[ 605.256791] nfs_create_server+0x77/0x1c0 [nfs]
[ 605.256796] nfs3_create_server+0x10/0x30 [nfsv3]
[ 605.256815] nfs_try_mount+0x1c5/0x2d0 [nfs]
[ 605.256821] ? __kmalloc_track_caller+0x180/0x270
[ 605.256840] nfs_fs_mount+0x285/0x740 [nfs]
[ 605.256846] ? __lookup_constant+0x4d/0x70
[ 605.256862] ? nfs_clone_super+0xe0/0xe0 [nfs]
[ 605.256876] ? nfs_parse_mount_options+0xd90/0xd90 [nfs]
[ 605.256881] legacy_get_tree+0x2b/0x50
[ 605.256886] vfs_get_tree+0x2a/0xc0
[ 605.256891] ? capable+0x19/0x20
[ 605.256895] do_mount+0x7b6/0x9c0
[ 605.256900] ksys_mount+0x82/0xd0
[ 605.256904] __x64_sys_mount+0x25/0x30
[ 605.256909] do_syscall_64+0x57/0x190
[ 605.256914] entry_SYSCALL_64_after_hwframe+0x44/0xa9
I see logs like
You can see these logs on Travis here
I'm thinking it's the failure to initialize NFS client recovery tracking that is causing the grace period to come into effect.
kick #44
The logs are in the host syslog, not in the container.
For reference, there is some discussion of similar logs here - https://bugzilla.redhat.com/show_bug.cgi?id=1700098
Once the grace period expires, the container comes up and NFS exports work as expected.
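For anyone wanting to check their own host syslog for the same symptoms, here is a minimal Python sketch that pulls out the three kinds of messages quoted above: the client recovery tracking failure, the grace period start, and the hung-task reports. The function name `summarize_nfs_log` and the regular expressions are my own assumptions, based only on the log lines pasted in this issue; other kernel versions may word these messages differently.

```python
import re

# Patterns are assumptions derived from the kernel messages quoted in this issue.
TRACKING_RE = re.compile(
    r"NFSD: Unable to initialize client recovery tracking! \((-?\d+)\)"
)
GRACE_RE = re.compile(r"NFSD: starting (\d+)-second grace period")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize_nfs_log(text):
    """Collect tracking errors, grace periods and hung tasks from log text."""
    summary = {"tracking_errors": [], "grace_periods": [], "hung_tasks": []}
    for line in text.splitlines():
        match = TRACKING_RE.search(line)
        if match:
            # Error code reported by the kernel, e.g. -22
            summary["tracking_errors"].append(int(match.group(1)))
            continue
        match = GRACE_RE.search(line)
        if match:
            # Grace period length in seconds
            summary["grace_periods"].append(int(match.group(1)))
            continue
        match = HUNG_RE.search(line)
        if match:
            # (task name, pid, seconds blocked)
            summary["hung_tasks"].append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return summary


if __name__ == "__main__":
    sample = "\n".join([
        "[ 241.983122] NFSD: Unable to initialize client recovery tracking! (-22)",
        "[ 241.983125] NFSD: starting 90-second grace period (net f0001296)",
        "[ 484.420888] INFO: task mount.nfs:32840 blocked for more than 120 seconds.",
    ])
    print(summarize_nfs_log(sample))
```

To run it against a real host, read the syslog file (or saved `dmesg` output) into a string and pass it to the function. The location of that file depends on the distribution, so I have not hard-coded a path.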