shexuel opened this issue 1 year ago
If you do:
gluster volume set archive performance.client-io-threads off
everything works normally after that. But this command, or any other volume set, just resets the volume for a while, and the issue is present again after a day.
Updated to version 10.4; everything is the same. I noticed that when a large number of files are read simultaneously, bricks disconnect from the volume and then reconnect.
I have done a test with these settings and everything is the same:
config.client-threads 48
config.brick-threads 48
transport.listen-backlog 4096
server.outstanding-rpc-limit 512
network.inode-lru-limit 30000
client.event-threads 32
server.event-threads 32
performance.io-thread-count 64
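For reference, a minimal sketch of applying the same options in one pass to the archive volume (option names and values are copied from the list above; nothing else is from the thread):

```bash
#!/usr/bin/env bash
# Sketch: apply the tuning options listed above to the "archive" volume
# and then show what the volume actually reports back.
set -euo pipefail

VOLUME=archive

declare -A OPTS=(
  [config.client-threads]=48
  [config.brick-threads]=48
  [transport.listen-backlog]=4096
  [server.outstanding-rpc-limit]=512
  [network.inode-lru-limit]=30000
  [client.event-threads]=32
  [server.event-threads]=32
  [performance.io-thread-count]=64
)

for opt in "${!OPTS[@]}"; do
  gluster volume set "$VOLUME" "$opt" "${OPTS[$opt]}"
done

# Verify which values were actually applied.
gluster volume get "$VOLUME" all | grep -E 'threads|listen-backlog|rpc-limit|inode-lru'
```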
I created a monitoring bash script that checks the directory listing; if it is slow (above one second), it resets it with just:
gluster volume set archive parallel-readdir on
sleep 2
gluster volume set archive parallel-readdir off
Everything is working normally with these commands, but not without them. The problem is not solved, just masked.
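The script itself isn't posted in the thread; below is a minimal sketch of what such a watchdog could look like, assuming the volume is mounted at /archive and using the one-second threshold from the comment:

```bash
#!/usr/bin/env bash
# Sketch of the watchdog described above: time a directory listing on the
# mount and, if it takes longer than one second, toggle parallel-readdir
# to "reset" readdir behaviour. Mount point and threshold are assumptions.
set -euo pipefail

VOLUME=archive
MOUNTPOINT=/archive          # assumed mount point of the volume
THRESHOLD_MS=1000            # one second, as in the comment

start=$(date +%s%3N)
ls "$MOUNTPOINT" > /dev/null
end=$(date +%s%3N)
elapsed=$(( end - start ))

if (( elapsed > THRESHOLD_MS )); then
  echo "listing took ${elapsed}ms, toggling parallel-readdir on $VOLUME"
  gluster volume set "$VOLUME" parallel-readdir on
  sleep 2
  gluster volume set "$VOLUME" parallel-readdir off
fi
```

Run from cron every minute or so, this mirrors the workaround described in the comment; it does not fix the underlying issue.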
Did you manage to resolve this? I am having the same issue.
As I said, just do:
gluster volume set archive parallel-readdir on
sleep 2
gluster volume set archive parallel-readdir off
and it will work.
This issue has not been resolved. I can reproduce it in nearly every version of GlusterFS going as far back as 7.x, regardless of volume configuration or tuning options. Even throwing high-end hardware at the problem does not resolve it (i.e. I built an all-NVMe flash cluster with a processor with high single-threaded IPC, and listing directories with lots of files is still incredibly slow). Additionally, I have tried several other FUSE-based distributed file systems such as MooseFS, BeeGFS, and SeaweedFS on the same hardware without any of the major issues that persist with GlusterFS. I guess by the time this issue gets resolved, Red Hat will have EOL'ed it.

This file system is a terrible choice if you require any form of end-user interaction with it (FUSE, VFS, or re-exports as SMB/NFS), but it is more than capable of handling backend workloads where user interaction is far less of a concern. Lately I have been migrating all of my customers away from it due to this persistent issue.

The source of the problem isn't necessarily a GlusterFS issue but its implementation of the readdir/readdirp (getdents/getdents64) system calls. This has historically been problematic because the implementation uses a very small buffer size (1024 entries) in order to conserve memory, which does not scale well with directories containing lots of files spanning multiple systems (it trashes performance and adds an insane amount of latency). The other file systems do not experience this issue because they use dedicated metadata storage instead of the peer-to-peer distributed design of GlusterFS, though that approach has its own host of problems, such as a SPOF and performance bottlenecks.
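One way to observe this from the client side (a suggested check, not something from the thread) is to trace the getdents64 calls that a listing issues against the FUSE mount, assuming the volume is mounted at /archive:

```bash
# Trace the getdents64 syscalls behind a directory listing on the FUSE mount.
# -T prints the time spent in each syscall; /archive is an assumed mount point.
strace -T -e trace=getdents64 ls /archive > /dev/null
```

Each trace line shows the buffer size the application passes and how long the call spent blocked in the FUSE/Gluster stack; comparing that against the same listing on a local filesystem makes the readdir latency visible.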
Thanks, this is terrible. What FS did you use at the end? I have tens or hundreds of millions of files.
Yes, but determining which file system will work for you should be requirements-based. However, I have used several OSS and proprietary ones and found that both Ceph (open source) and Dell's OneFS (proprietary, and expensive as hell) offer the best performance, features, protocol support, and security for the enterprise. Ceph has a considerably higher learning curve to set up and configure, because you need to understand the dedicated roles and take them into consideration during the initial setup.

If you have a few dollars to spend, I believe MooseFS Pro is hands down just as easy to set up as GlusterFS and has a much better management interface called CGI Server. My reasoning for going with Pro is if you need to use erasure codes, but if a distributed, replicated, or distributed-replicated volume type is what you currently use, then the non-Pro version will work perfectly fine using storage goals and classes.

One last thing: the file systems I list above do not experience the same directory listing issues, or, better yet, the really obnoxious volume performance degradation when adding files/folders. Yes, we discovered that volume performance straight up tanks after adding new files or folders over a given period of time, regardless of the lru or other volume settings.
Would MooseFS free be able to be set up alongside Gluster? With my Gluster setup I'm currently just using distributed on top of ZFS. So would I be able to set up MooseFS on the same brick and test them side by side (well, at least test MooseFS for reads), so as not to mess with the GlusterFS write side?
Yes, you can run MooseFS in parallel with Gluster; it's just a matter of creating another directory that's external to your GlusterFS bricks on the same storage device.
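A minimal sketch of that layout, assuming a Gluster brick at /data/brick1 and a MooseFS chunkserver already installed (the paths here are illustrative, not from the thread):

```bash
# On each node: give the MooseFS chunkserver its own directory on the same
# device, outside the Gluster brick path, and register it in mfshdd.cfg.
# /data/brick1 (Gluster) and /data/mfschunks (MooseFS) are assumed paths.
mkdir -p /data/mfschunks
chown mfs:mfs /data/mfschunks
echo "/data/mfschunks" >> /etc/mfs/mfshdd.cfg
systemctl restart moosefs-chunkserver   # service name may differ by distro/package
```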
Reference this issue as the possible source of slow directory operations:
#4335
You can confirm this, along with a kernel FUSE implementation issue for which I've got a private patch that still needs a v3. I don't have the correct link at hand, but basically the FUSE read buffer for readdir needs to be made much, much larger, as in orders of magnitude larger (256 KB vs 4 KB).
Description of problem: Gluster volume is slow on directory listing; error messages appear during the slow listing:
[2023-07-05 11:15:09.570688 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:09.570725 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:09.570732 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:09.570741 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:09.570748 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:09.570772 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
The message "I [MSGID: 108031] [afr-common.c:3203:afr_local_discovery_cbk] 42-archive-replicate-0: selecting local read_child archive-client-0" repeated 148 times between [2023-07-05 11:13:19.417348 +0000] and [2023-07-05 11:15:18.030077 +0000]
[2023-07-05 11:15:19.044807 +0000] I [MSGID: 108031] [afr-common.c:3203:afr_local_discovery_cbk] 42-archive-replicate-0: selecting local read_child archive-client-0
[2023-07-05 11:15:20.462206 +0000] W [fuse-bridge.c:310:check_and_dump_fuse_W] 0-glusterfs-fuse: writing to fuse device yielded ENOENT 256 times
[2023-07-05 11:15:42.781829 +0000] W [fuse-bridge.c:310:check_and_dump_fuse_W] 0-glusterfs-fuse: writing to fuse device yielded ENOENT 256 times
[2023-07-05 11:15:50.475283 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:50.475313 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:50.475320 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:50.475325 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:50.629862 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:58.359047 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:15:58.359099 +0000] I [fuse-bridge.c:4992:notify_kernel_loop] 0-glusterfs-fuse: len: 71, rv: -1, errno: 20
[2023-07-05 11:16:38.511419 +0000] W [fuse-bridge.c:310:check_and_dump_fuse_W] 0-glusterfs-fuse: writing to fuse device yielded ENOENT 256 times
The exact command to reproduce the issue: ls -la /archive
The full output of the command that failed:
- The operating system / glusterfs version: CentOS Stream 8, Gluster 10.3
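For a repeatable measurement of the slow listing (not part of the original report; the mount point /archive is taken from the reproduce command above):

```bash
# Time the directory listing on the Gluster FUSE mount and count entries,
# so the slowdown can be compared across versions and volume settings.
time ls -la /archive > /dev/null
ls /archive | wc -l
```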