thalin opened this issue 1 year ago
Hey folks, I was able to reproduce this behavior with a fresh set of Ubuntu 22.10 VMs and Gluster 3.10. I've made a reproduction case using Ansible and created a repo for it. I also included some of the stuff I had excluded for brevity in the original bug report (logs, strace, etc). You can find this repo here. Hopefully this helps to debug the issue!
I tested this on Gluster 11.0 (via an upgrade from 10.1 - I did not create a cluster from scratch using 11 and test that yet) from the Gluster team PPA and it doesn't seem to make a big difference, despite some Gluster release notes claiming improvements to maybe-related code. It's still as slow as ever, as far as I can tell.
Hello folks, I just wanted to try to follow up here. We're still having this problem. I'd be happy to provide any further information requested. Thanks!
Confirming the problem. I just started testing different distributed storage systems, and listing takes a lot of time. Also, `gluster volume status` echoes `Error : Request timed out`.
I thought GlusterFS was a promising one.
[2023-04-28 08:21:44.094407 +0000] I [MSGID: 106499] [glusterd-handler.c:4458:__glusterd_handle_status_volume] 0-management: Received status volume req for volume www1
[2023-04-28 08:22:02.704835 +0000] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0x9e952) [0x7f5b32fce952] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0xd48f2) [0x7f5b330048f2] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0xd4352) [0x7f5b33004352] ) 0-management: Lock for www1 held by 7885d813-0968-4e8d-8f1d-12a768288f62
[2023-04-28 08:22:02.704922 +0000] E [MSGID: 106118] [glusterd-syncop.c:1923:gd_sync_task_begin] 0-management: Unable to acquire lock for www1
[2023-04-28 08:22:18.694418 +0000] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0x9e952) [0x7f5b32fce952] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0xd48f2) [0x7f5b330048f2] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0xd4352) [0x7f5b33004352] ) 0-management: Lock for www1 held by 7885d813-0968-4e8d-8f1d-12a768288f62
[2023-04-28 08:22:58.732966 +0000] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0x9e952) [0x7f5b32fce952] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0xd48f2) [0x7f5b330048f2] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/mgmt/glusterd.so(+0xd4352) [0x7f5b33004352] ) 0-management: Lock for www1 held by 7885d813-0968-4e8d-8f1d-12a768288f62
The message "E [MSGID: 106118] [glusterd-syncop.c:1923:gd_sync_task_begin] 0-management: Unable to acquire lock for www1" repeated 2 times between [2023-04-28 08:22:02.704922 +0000] and [2023-04-28 08:22:58.733063 +0000]
performance.client-io-threads: on
Can you please share the below data?
1) Enable profile: `gluster v profile start`
2) Clean the profile: `gluster v profile info clear`
3) Run the test case
4) Run the profile again: `gluster v profile info`
Please share the current gluster release on which you are executing the test case.
My problem was resolved. It was because MTU 9000 was set on the server interfaces while the interconnecting switches did not have jumbo frames enabled.
I can confirm this problem has not been resolved. `strace -Tc ls -hal` reveals an excessive number of getdents calls, among other directory- and stat-related syscalls, on any FUSE mount associated with GlusterFS.
For additional context, I also have MooseFS running in parallel with Gluster on the exact same hardware configuration, and the slow directory issue does not exist there. Both file systems use FUSE, so this is most likely not a FUSE issue but something internal to GlusterFS. The problem isn't a small-file problem but a many-file issue.
Sadly, my servers reside in an air-gapped environment and I cannot provide logs, configuration data, or console output. However, this issue continues to exist in every version of GlusterFS (7.x - 11.x) regardless of the volume type or configuration options set. Based on the longevity of this issue, I find it highly unlikely it will ever get resolved before Gluster's EOL.
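(For anyone reproducing this, a rough sketch of that comparison; the mount point and brick path below are placeholders, not taken from this report.)

```sh
# Same listing, once through the Gluster FUSE mount and once directly on a brick.
strace -Tc ls -hal /mnt/gvol/big-dir    > /dev/null   # FUSE mount (slow case)
strace -Tc ls -hal /data/brick1/big-dir > /dev/null   # brick backend (fast case)
# In the -c summary table, compare call counts and cumulative time for getdents64
# and the stat-family syscalls (e.g. newfstatat) between the two runs.
```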
Can you please share the below data?
- Enable profile: `gluster v profile start`
- Clean the profile: `gluster v profile info clear`
- Run the test case
- Run the profile again: `gluster v profile info`, and share the `gluster v info` output as well
Please share the current gluster release on which you are executing the test case.
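For anyone following along, a sketch of that sequence end to end, assuming a volume named `testvol` mounted at `/mnt/testvol` and a slow directory `slow-dir` (all placeholders, not from this thread):

```sh
gluster volume profile testvol start          # enable profiling on the volume
gluster volume profile testvol info clear     # reset the counters before the test
time ls -l /mnt/testvol/slow-dir > /dev/null  # the test case: the slow listing
gluster volume profile testvol info           # per-brick FOP counts and latencies since the clear
gluster volume info testvol                   # volume layout and options, as requested above
```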
Sure thing, I will start profiling one of our test volumes today. Unfortunately, I work in an air-gapped environment, so providing any dumps is going to be tricky. I will attempt to transpose the data listed in your template, so please bear with me.
My environment is running RHEL 8.9 and GlusterFS 10.3 with a combination of dispersed and distributed-dispersed volumes in an 8 x (2 + 1) configuration.
I think this issue tracks back to the limited buffer size associated with readdir/readdirp syscalls creating a bottleneck. Actual volume performance is relatively good given the hardware, but FUSE seems to pull stat/metadata in groups of only ~1K files at a time.
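One rough way to sanity-check that hypothesis from the client side (paths are placeholders): log the `getdents64` calls that `ls` issues against the mount and look at how many calls were needed and how much each one returned.

```sh
strace -e trace=getdents64 -o /tmp/getdents.log ls -l /mnt/testvol/slow-dir > /dev/null
grep -c getdents64 /tmp/getdents.log                       # how many getdents64 calls ls made
awk -F'= ' '/getdents64/ {print $NF}' /tmp/getdents.log \
  | sort -n | uniq -c                                      # bytes returned per call
```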
After enabling readdir-ahead there should not be an issue with buffer size for the readdir(p) syscall, because readdir-ahead increases the buffer size to 128k and saves the dentries in its cache.
Just FYI, readdir-ahead works only for readdirp, not for readdir.
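For reference, a sketch of how the relevant options could be checked (volume name is a placeholder; the option names are the standard ones listed by `gluster volume set help`):

```sh
gluster volume get testvol performance.readdir-ahead     # must be "on" for the 128k buffering to apply
gluster volume get testvol performance.rda-request-size  # readdir-ahead request size (128KB by default)
gluster volume get testvol performance.rda-cache-limit   # upper bound on the readdir-ahead dentry cache
gluster volume get testvol performance.parallel-readdir  # loads readdir-ahead per DHT subvolume
```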
Gotcha, thanks for the quick reply. I will verify read-ahead is enabled and working correctly.
It's readdir-ahead not read-ahead, just FYI.
Description of problem: We have a directory with ~4300 files which takes about 15 seconds to run an `ll` against. This seems excessive. Each of these bricks is an SSD, so drive access should be relatively fast. Listing each brick directly (distributed-dispersed) is much faster, on the order of 0.05s, with most of the same number of directories.

I've seen this take between 0.01 and 0.055s on a brick, so even with a sequential read of every brick, this shouldn't take more than 2.25 seconds (and, if I understand correctly, there shouldn't be a sequential read; it should happen in parallel). I ran a sequential `ll` against the offending directory on all of the bricks on one of the three machines. Even cached, the `ll` on the GlusterFS mount still takes ~7 seconds (when I run the command a couple of times in a row quickly).

I looked at some previous Gluster issues which seemed similar, such as this one which mentions having to do lots of `getdents`, so I went ahead and ran strace against an `ls` on this problem directory to see if there is any similar behavior (even though that bug is for Gluster 3.x and we're on 10.1 on both client and server). In my case I see a lot of `getdents64` calls - 2241 of them in the strace I captured. So, most of these are returning only 1 entry! This seems crazy (strace provided on request as noted below).

The exact command to reproduce the issue: `ls`

The full output of the command that failed: it didn't fail, it's just slow.

Expected results: I would expect that it wouldn't be this slow.
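A minimal timing sketch of the comparison described above (mount point, brick path, and directory name are placeholders):

```sh
time ls -l /mnt/gvol/problem-dir    > /dev/null   # via the GlusterFS mount (~15s reported above)
time ls -l /data/brick1/problem-dir > /dev/null   # directly on one brick (~0.05s reported above)
```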
Mandatory info:
- The output of the `gluster volume info` command:
- The output of the `gluster volume status` command:
- The output of the `gluster volume heal` command: Error : Request timed out
- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/ : there are a lot of these and they're pretty large. Would a sample be OK?
- Is there any crash ? Provide the backtrace and coredump: I have a large strace - it's 216KB, which seems like it shouldn't be provided in a bug report.

Additional info: Some names/paths have been changed to protect the innocent.

- The operating system / glusterfs version: Both client and server are running Ubuntu 22.04.1 LTS, Gluster version 10.1.