gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Gluster 3.8.8 + pNFS Ganesha Instability #133

Closed chjohnst closed 7 years ago

chjohnst commented 7 years ago

I posted a ticket on the nfs-ganesha GitHub tracker about a 6-node distributed-replicate (sharded) volume in a 3x2 configuration. I have tried Ganesha 2.3.3, 2.4.1 and 2.4.3 with little luck. When my clients do heavily threaded reads, they hang, and eventually the kernel prints stack traces for the processes stuck in D state. Has anyone seen traces like this before? My client OS is CentOS 7.3 with the latest kernel.

https://github.com/nfs-ganesha/nfs-ganesha/issues/148

Mar 1 12:55:55 dev-gc01-vm507 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 1 12:55:55 dev-gc01-vm507 kernel: qbuckets2 D ffffffffa0476de8 0 2950 2920 0x00000080
Mar 1 12:55:55 dev-gc01-vm507 kernel: ffff8800b9697600 0000000000000086 ffff880fdd9e3ec0 ffff8800b9697fd8
Mar 1 12:55:55 dev-gc01-vm507 kernel: ffff8800b9697fd8 ffff8800b9697fd8 ffff880fdd9e3ec0 ffffffffa0476de0
Mar 1 12:55:55 dev-gc01-vm507 kernel: ffffffffa0476de4 ffff880fdd9e3ec0 00000000ffffffff ffffffffa0476de8
Mar 1 12:55:55 dev-gc01-vm507 kernel: Call Trace:
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] schedule_preempt_disabled+0x29/0x70
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] mutex_lock_slowpath+0xc5/0x1c0
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? _raw_spin_unlock_bh+0x1b/0x40
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] mutex_lock+0x1f/0x2f
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs4_discover_server_trunking+0x48/0x2e0 [nfsv4]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs4_init_client+0x124/0x2f0 [nfsv4]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? kmem_cache_alloc+0x193/0x1e0
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? fscache_acquire_cookie+0x66/0x180 [fscache]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? fscache_acquire_cookie+0x66/0x180 [fscache]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? nfs4_alloc_client+0x199/0x1f0 [nfsv4]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs_get_client+0x22a/0x390 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs4_set_ds_client+0xfa/0x130 [nfsv4]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? nfs_readhdr_alloc+0x1a/0x20 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs4_pnfs_ds_connect+0x1d8/0x410 [nfsv4]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs4_fl_prepare_ds+0xa4/0xc8 [nfs_layout_nfsv41_files]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] filelayout_read_pagelist+0x56/0x1a0 [nfs_layout_nfsv41_files]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] pnfs_generic_pg_readpages+0xa4/0x1d0 [nfsv4]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs_pageio_doio+0x27/0x60 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs_pageio_add_request+0xb7/0x450 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? nfs_get_lock_context+0x4f/0x120 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs_pageio_add_request+0xc2/0x2a0 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] readpage_async_filler+0xeb/0x1b0 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] read_cache_pages+0x9d/0xe0
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? nfs_return_empty_page+0x70/0x70 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] nfs_readpages+0x155/0x1f0 [nfs]
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] __do_page_cache_readahead+0x1cc/0x250
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ra_submit+0x21/0x30
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] filemap_fault+0x105/0x410
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] do_fault+0x4c/0xc0
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] ? alloc_pages_current+0xaa/0x170
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] do_read_fault.isra.42+0x43/0x130
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] handle_mm_fault+0x6b1/0xfe0
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] __do_page_fault+0x154/0x450
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] do_page_fault+0x35/0x90
Mar 1 12:55:55 dev-gc01-vm507 kernel: [] page_fault+0x28/0x30

thotz commented 7 years ago

IMO we can close this issue, since it was already discussed in the nfs-ganesha project: https://github.com/nfs-ganesha/nfs-ganesha/issues/148. He has not replied since my last comment on March 23rd, so I am closing this issue as "works for me".