gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Performance issues on backend xfs bricks #1787

Closed: matclayton closed this issue 4 years ago

matclayton commented 4 years ago

strace.tar.gz

Description of problem: We have nginx backed by GlusterFS. We've started to see socket backlog issues on nginx and performance problems serving files. On further investigation we believe this is due to GlusterFS, and probably the backend bricks. Having talked to Pranith on Slack, I'm opening this issue. We observed several bricks taking 250 ms+ to do LOOKUP and READ filesystem ops; attached is an strace of one of the backend bricks.
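As a side note on the nginx symptom: listen-queue pressure of this kind is usually visible with standard tooling; a rough sketch (the commands are generic and not part of the original report):

```
# Listening sockets: Recv-Q is the current accept-queue depth, Send-Q the
# configured backlog. A Recv-Q repeatedly approaching Send-Q means workers
# are stalled (here: blocked on slow reads from the FUSE mount).
ss -ltn

# Cumulative listen-queue overflow counters since boot; a growing value
# confirms connections are being dropped.
netstat -s | grep -i 'listen queue'
```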

The backend bricks are running XFS on LVM2 on a RAID6 array with a hardware RAID controller and SSD cache.

The architecture is Nginx (via nginx_vod) -> glusterfs_client fuse mount -> gluster
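The attached per-thread traces were presumably collected with strace against a brick process; a minimal sketch of such a capture (the PID placeholder and syscall filter are illustrative, not from the original report):

```
# Find the brick daemon PIDs (also listed in `gluster volume status`).
pgrep -af glusterfsd

# Attach to one brick, follow all threads, write one output file per thread
# (strace.<tid>) and record the time spent in each syscall (-T), so slow
# backend operations show up as large <seconds> values at the end of each line.
strace -f -ff -T -e trace=lstat,lgetxattr,openat,pread64 -p <brick_pid> -o strace
```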

The exact command to reproduce the issue: Observed latency from nginx

The full output of the command that failed:

**Expected results:**
**Mandatory info:**

**- The output of the `gluster volume info` command**:

```
root@storage1:/home/team # gluster volume info
Volume Name: storage
Type: Distributed-Replicate
Volume ID: c0307211-cb30-4397-80a8-61c1724fca17
Status: Started
Snapshot Count: 0
Number of Bricks: 14 x (2 + 1) = 42
Transport-type: tcp
Bricks:
Brick1: storage1:/media/diskb/storage
Brick2: storage2:/media/diskb/storage
Brick3: storage3:/media/diskb2/storage (arbiter)
Brick4: storage1:/media/diskc/storage
Brick5: storage2:/media/diskc/storage
Brick6: storage3:/media/diskc2/storage (arbiter)
Brick7: storage3:/media/diskb/storage
Brick8: storage4:/media/diskb/storage
Brick9: storage5:/media/diskb2/storage (arbiter)
Brick10: storage3:/media/diskc/storage
Brick11: storage4:/media/diskc/storage
Brick12: storage5:/media/diskc2/storage (arbiter)
Brick13: storage5:/media/diskb/storage
Brick14: storage6:/media/diskb/storage
Brick15: storage7:/media/diskb2/storage (arbiter)
Brick16: storage5:/media/diskc/storage
Brick17: storage6:/media/diskc/storage
Brick18: storage7:/media/diskc2/storage (arbiter)
Brick19: storage7:/media/diskb/storage
Brick20: storage8:/media/diskb/storage
Brick21: storage9:/media/diskb2/storage (arbiter)
Brick22: storage7:/media/diskc/storage
Brick23: storage8:/media/diskc/storage
Brick24: storage9:/media/diskc2/storage (arbiter)
Brick25: storage9:/media/diskb/storage
Brick26: storage10:/media/diskb/storage
Brick27: storage1:/media/diskb2/storage (arbiter)
Brick28: storage9:/media/diskc/storage
Brick29: storage10:/media/diskc/storage
Brick30: storage1:/media/diskc2/storage (arbiter)
Brick31: storage11:/media/diskb/storage
Brick32: storage12:/media/diskb/storage
Brick33: storage2:/media/diskb2/storage (arbiter)
Brick34: storage11:/media/diskc/storage
Brick35: storage12:/media/diskc/storage
Brick36: storage2:/media/diskc2/storage (arbiter)
Brick37: storage13:/media/diskb/storage
Brick38: storage14:/media/diskb/storage
Brick39: storage4:/media/diskb2/storage (arbiter)
Brick40: storage13:/media/diskc/storage
Brick41: storage14:/media/diskc/storage
Brick42: storage4:/media/diskc2/storage (arbiter)
Options Reconfigured:
performance.cache-refresh-timeout: 1
performance.io-thread-count: 16
client.event-threads: 2
server.event-threads: 2
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.cache-size: 1GB
cluster.min-free-disk: 4%
transport.address-family: inet
nfs.disable: on
performance.readdir-ahead: on
features.barrier: disable
```

**- The output of the `gluster volume status` command**:

```
Status of volume: storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/media/diskb/storage         49152     0          Y       21472
Brick storage2:/media/diskb/storage         49152     0          Y       2826
Brick storage3:/media/diskb2/storage        49156     0          Y       18773
Brick storage1:/media/diskc/storage         49153     0          Y       21481
Brick storage2:/media/diskc/storage         49153     0          Y       2835
Brick storage3:/media/diskc2/storage        49157     0          Y       18784
Brick storage3:/media/diskb/storage         49158     0          Y       18795
Brick storage4:/media/diskb/storage         49156     0          Y       30225
Brick storage5:/media/diskb2/storage        49156     0          Y       15781
Brick storage3:/media/diskc/storage         49159     0          Y       18804
Brick storage4:/media/diskc/storage         49157     0          Y       30234
Brick storage5:/media/diskc2/storage        49157     0          Y       15792
Brick storage5:/media/diskb/storage         49158     0          Y       15802
Brick storage6:/media/diskb/storage         49154     0          Y       10461
Brick storage7:/media/diskb2/storage        49156     0          Y       29665
Brick storage5:/media/diskc/storage         49159     0          Y       15810
Brick storage6:/media/diskc/storage         49155     0          Y       10475
Brick storage7:/media/diskc2/storage        49157     0          Y       29674
Brick storage7:/media/diskb/storage         49158     0          Y       29686
Brick storage8:/media/diskb/storage         49152     0          Y       5718
Brick storage9:/media/diskb2/storage        49156     0          Y       20526
Brick storage7:/media/diskc/storage         49159     0          Y       29694
Brick storage8:/media/diskc/storage         49153     0          Y       5725
Brick storage9:/media/diskc2/storage        49157     0          Y       20535
Brick storage9:/media/diskb/storage         49158     0          Y       20544
Brick storage10:/media/diskb/storage        49154     0          Y       19824
Brick storage1:/media/diskb2/storage        49154     0          Y       21493
Brick storage9:/media/diskc/storage         49159     0          Y       20555
Brick storage10:/media/diskc/storage        49155     0          Y       19831
Brick storage1:/media/diskc2/storage        49155     0          Y       21503
Brick storage11:/media/diskb/storage        49152     0          Y       2957
Brick storage12:/media/diskb/storage        49152     0          Y       2928
Brick storage2:/media/diskb2/storage        49154     0          Y       2846
Brick storage11:/media/diskc/storage        49153     0          Y       2964
Brick storage12:/media/diskc/storage        49153     0          Y       2929
Brick storage2:/media/diskc2/storage        49155     0          Y       2855
Brick storage13:/media/diskb/storage        49154     0          Y       1336
Brick storage14:/media/diskb/storage        49152     0          Y       15747
Brick storage4:/media/diskb2/storage        49158     0          Y       30244
Brick storage13:/media/diskc/storage        49155     0          Y       1343
Brick storage14:/media/diskc/storage        49153     0          Y       15754
Brick storage4:/media/diskc2/storage        49159     0          Y       30254
Self-heal Daemon on localhost               N/A       N/A        Y       25505
Self-heal Daemon on storage2                N/A       N/A        Y       21353
Self-heal Daemon on storage4                N/A       N/A        Y       1868
Self-heal Daemon on storage6                N/A       N/A        Y       13309
Self-heal Daemon on storage5                N/A       N/A        Y       19099
Self-heal Daemon on storage7                N/A       N/A        Y       2134
Self-heal Daemon on storage3                N/A       N/A        Y       22856
Self-heal Daemon on storage9                N/A       N/A        Y       24537
Self-heal Daemon on storage12               N/A       N/A        Y       2948
Self-heal Daemon on storage14               N/A       N/A        Y       15767
Self-heal Daemon on storage8                N/A       N/A        Y       5738
Self-heal Daemon on storage11               N/A       N/A        Y       2977
Self-heal Daemon on storage13               N/A       N/A        Y       4835
Self-heal Daemon on storage10               N/A       N/A        Y       23235

Task Status of Volume storage
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : c353ae2c-f31c-46bf-b5a7-f716efd76552
Status               : completed
```

**- The output of the `gluster volume heal` command**: no bricks are showing any entries

**- Provide logs present on following locations of client and server nodes** - /var/log/glusterfs/

**- Is there any crash? Provide the backtrace and coredump**: See strace info attached

**Additional info:**

**- The operating system / glusterfs version**: Ubuntu 18.04

**Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration**
pranithk commented 4 years ago

As per the strace, these latencies are indeed stemming from XFS:

```
pranith.karampuri@PP-CNPZ5M2 ~/D/strace> grep lgetxattr strace.15* | awk '{print $NF}' | cut -f2 -d'<' | cut -f1 -d'>' | sort -n | tail -5
0.082245
0.084232
0.095902
0.109003
0.216018
pranith.karampuri@PP-CNPZ5M2 ~/D/strace> grep lstat strace.15* | awk '{print $NF}' | cut -f2 -d'<' | cut -f1 -d'>' | sort -n | tail -5
0.108165
0.111372
0.129129
0.153885
0.350215
```
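For context, this assumes strace was run with -T, which appends the time spent in each syscall in angle brackets; the sample line below is illustrative, not taken from the attached traces:

```
# Shape of the lines being parsed (path and timing are made up):
#   lstat("/media/diskb/storage/.glusterfs/ab/cd/<gfid>", {...}) = 0 <0.350215>
# Equivalent per-syscall summary of the five slowest calls:
for op in lgetxattr lstat; do
    echo "== $op =="
    grep -h "$op(" strace.15* | sed -n 's/.*<\([0-9.]*\)>$/\1/p' | sort -n | tail -5
done
```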
matclayton commented 4 years ago

We're also seeing the following distribution on the XFS layer

```
disk = 'sdb'
     usecs               : count
         0 -> 1          : 0
         2 -> 3          : 18
         4 -> 7          : 48
         8 -> 15         : 123
        16 -> 31         : 111
        32 -> 63         : 114
        64 -> 127        : 2940
       128 -> 255        : 6369
       256 -> 511        : 7613
       512 -> 1023       : 917
      1024 -> 2047       : 1128
      2048 -> 4095       : 421
      4096 -> 8191       : 370
      8192 -> 16383      : 2059
     16384 -> 32767      : 2615
     32768 -> 65535      : 108
     65536 -> 131071     : 24
    131072 -> 262143     : 49
    262144 -> 524287     : 12

disk = 'sdc'
     usecs               : count
         0 -> 1          : 0
         2 -> 3          : 25
         4 -> 7          : 50
         8 -> 15         : 16
        16 -> 31         : 3
        32 -> 63         : 78
        64 -> 127        : 2784
       128 -> 255        : 7109
       256 -> 511        : 6120
       512 -> 1023       : 1474
      1024 -> 2047       : 1715
      2048 -> 4095       : 871
      4096 -> 8191       : 504
      8192 -> 16383      : 2558
     16384 -> 32767      : 3485
     32768 -> 65535      : 161
     65536 -> 131071     : 79
    131072 -> 262143     : 61
    262144 -> 524287     : 46
```
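These per-disk histograms look like output from a block-layer latency tracer such as BCC's biolatency; a sketch of a comparable capture on Ubuntu 18.04 (package and tool names assumed from the bpfcc packaging, not stated in the original comment):

```
# BCC tools; Ubuntu ships them with a -bpfcc suffix.
apt-get install bpfcc-tools

# One 60-second histogram of block I/O latency, split per disk (-D),
# so sdb and sdc can be compared directly against the brick-level numbers.
biolatency-bpfcc -D 60 1
```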

hunter86bg commented 4 years ago

@matclayton ,

did you align your HW RAID properly with the PVs, VGs and XFS? The proper alignment is described in chapter 19: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance
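For reference, that chapter's recommendation amounts to matching the LVM data alignment and the XFS stripe geometry to the RAID full stripe; a hedged sketch, assuming a 256 KiB stripe unit and 10 data disks in the RAID6 set (device and volume names are placeholders, substitute the real controller values):

```
# Non-destructive check on an existing brick: sunit/swidth (in filesystem
# blocks) should correspond to the RAID stripe unit and full stripe width.
xfs_info /media/diskb

# When building a new brick: align the PV to the full stripe
# (256 KiB * 10 data disks = 2560 KiB) and tell XFS the same geometry.
pvcreate --dataalignment 2560k /dev/sdX
mkfs.xfs -f -i size=512 -d su=256k,sw=10 /dev/<vg>/<brick_lv>
```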

pranithk commented 4 years ago

@hunter86bg I debugged this along with @matclayton. After debugging this for 2 days, we think this issue is happening because of https://bugzilla.redhat.com/show_bug.cgi?id=1676479. @matclayton will confirm after monitoring their cluster for a while.

matclayton commented 4 years ago

@pranithk Thank you for the help on this, it was a pain to track down. I can confirm that things still appear to be stable on the read path. We've had some complaints about writes, but have yet to confirm these; I suspect it's a separate issue, or delayed reports coming in. We'll continue to monitor.

To confirm, the performance issue was resolved by running

```
gluster volume set <volname> performance.io-cache off
gluster volume set <volname> performance.read-ahead off
```

and since then it has been plain sailing. I'll close this issue, as it doesn't represent the problem we observed.
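For completeness, a quick way to double-check that the two options actually took effect, and to keep watching per-FOP latency (diagnostics.latency-measurement is already enabled on this volume); the volume name is a placeholder:

```
# Confirm the cache translators are now disabled
gluster volume get <volname> performance.io-cache
gluster volume get <volname> performance.read-ahead

# Per-brick FOP statistics (LOOKUP/READ min/avg/max latency)
gluster volume profile <volname> start
gluster volume profile <volname> info
```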