gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Strange issue with permission denied and bizarre mtime #4314

Open Nuitari opened 2 months ago

Nuitari commented 2 months ago

Description of problem:

At random, we start getting permission denied errors accompanied by strange mtimes on the FUSE mount.

We could not find a way to reproduce the problem, and it happens on directories that have been present for multiple years.

The symptoms are always similar in that the modify time of the directory is set to some bizarre, wildly inaccurate year:

From the FUSE mount point:

$ stat folder1
  File: folder1
  Size: 4096            Blocks: 8          IO Block: 131072 directory
Device: 36h/54d Inode: 12741275528126725710  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 1969-12-31 19:00:00.074592330 -0500
Modify: 4455343-06-23 16:55:20.000032721 -0400
Change: 1969-12-31 19:00:00.000000000 -0500
 Birth: -

$ ls -l folder1
ls: cannot open directory 'folder1': Permission denied

$ sudo ls -la folder1
total 10
drwxr-xr-x 2 www-data www-data 4096 Jun 23  4455343 .
drwxr-xr-x 3 www-data www-data 4096 May  6  4455343 ..
-rw-r--r-- 1 www-data www-data  170 Sep 30  2022 skin-bootstrap4.css
-rw-r--r-- 1 www-data www-data  904 Sep 30  2022 skin.css

From the brick directory (the output is the same regardless of which brick):

$ ls -la /var/brick/folder1
total 32
drwxr-xr-x 2 www-data www-data 4096 May 10  2446 .
drwxr-xr-x 3 www-data www-data 4096 May 10  2446 ..
-rw-r--r-- 2 www-data www-data  170 Sep 30  2022 skin-bootstrap4.css
-rw-r--r-- 2 www-data www-data  904 Sep 30  2022 skin.css

$ stat /var/brick/folder1
  File: /var/brick/folder1
  Size: 4096            Blocks: 16         IO Block: 4096   directory
Device: fc03h/64515d    Inode: 3018351     Links: 2
Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 1969-12-31 19:00:00.074592330 -0500
Modify: 2446-05-10 18:38:55.000000000 -0400
Change: 2024-01-31 01:33:04.035612866 -0500
 Birth: -

In the logs we see:

[2024-03-14 22:14:12.193346] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-1: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.193739] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-3: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196071] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-7: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196081] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-6: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196227] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-5: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196258] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-4: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196361] W [fuse-bridge.c:1513:fuse_fd_cbk] 0-glusterfs-fuse: 1842104: OPENDIR() /folder1 => -1 (Permission denied)

Running sudo touch on the directory resets the timestamp, and the directory becomes accessible again.
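
In case it helps with diagnosis: the bogus mtime presumably comes from what is stored on the bricks themselves, so dumping the affected directory's extended attributes on one brick (for example trusted.glusterfs.mdata, where the ctime feature keeps timestamps when it is enabled) might show where the corrupted value lives. A minimal diagnostic sketch, assuming the same brick path as above:

# hedged sketch: dump all xattrs of the affected directory on one brick (path assumed from above)
sudo getfattr -d -m . -e hex /var/brick/folder1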

Expected results: Access as a normal user

Mandatory info:

- The output of the gluster volume info command:

Volume Name: sharedProd
Type: Replicate
Volume ID: 5955f185-5008-42cf-9cf2-aceff041c8f2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 9 = 9
Transport-type: tcp
Bricks:
Brick1: srv1:/var/brick
Brick2: srv2:/var/brick
Brick3: srv3:/var/brick
Brick4: srv4:/var/brick
Brick5: srv5:/var/brick
Brick6: srv6:/var/brick
Brick7: srv7:/var/brick
Brick8: srv8:/var/brick
Brick9: srv9:/var/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: true
storage.fips-mode-rchecksum: on
transport.address-family: inet
auth.allow: (lots of private IPs)
network.ping-timeout: 5
features.cache-invalidation: off
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: false
performance.md-cache-timeout: 1
network.inode-lru-limit: 200000
cluster.shd-max-threads: 8
disperse.shd-wait-qlength: 2048
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

- The output of the gluster volume status command:

Status of volume: sharedProd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick srv1:/var/brick                       49152     0          Y       2287
Brick srv2:/var/brick                       49152     0          Y       2192
Brick srv3:/var/brick                       49152     0          Y       2393
Brick srv4:/var/brick                       49152     0          Y       3053
Brick srv5:/var/brick                       49152     0          Y       4698
Brick srv6:/var/brick                       49152     0          Y       1270
Brick srv7:/var/brick                       49152     0          Y       1920
Brick srv8:/var/brick                       49152     0          Y       1876
Brick srv9:/var/brick                       60618     0          Y       1473668
Self-heal Daemon on localhost               N/A       N/A        Y       2298
Self-heal Daemon on srv9                    N/A       N/A        Y       1473685
Self-heal Daemon on srv3                    N/A       N/A        Y       2404
Self-heal Daemon on srv2                    N/A       N/A        Y       2203
Self-heal Daemon on srv7                    N/A       N/A        Y       1931
Self-heal Daemon on srv4                    N/A       N/A        Y       3064
Self-heal Daemon on srv6                    N/A       N/A        Y       1324
Self-heal Daemon on srv5                    N/A       N/A        Y       4709
Self-heal Daemon on srv8                    N/A       N/A        Y       1887

Task Status of Volume sharedProd
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Usage:
volume heal <VOLNAME> [enable | disable | full |statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [summary | split-brain] |split-brain {bigger-file <FILE> | latest-mtime <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]} |granular-entry-heal {enable | disable}]
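
For reference, the heal status the issue template asks for is normally obtained by naming the volume; a minimal sketch using this volume's name:

gluster volume heal sharedProd info
gluster volume heal sharedProd info summary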

- Is there any crash? Provide the backtrace and coredump: No crash, no coredumps.

- The operating system / glusterfs version: Mix of Ubuntu 20.04 and Ubuntu 22.04; glusterfs 10.1 on Ubuntu 22.04 and glusterfs 7.2 on Ubuntu 20.04.

The issue presents in the same way on both versions.


aravindavk commented 2 months ago

> Number of Bricks: 1 x 9 = 9

The volume type looks wrong. Did you create the volume with replica count 9, or did you want to create a distributed replicate volume with replica count 3?

Please share the Volume create command used here.

Use the command below to create a distributed replicate volume with replica count 3:

gluster volume create sharedProd replica 3 \
    srv1:/var/brick                        \
    srv2:/var/brick                        \
    srv3:/var/brick                        \
    srv4:/var/brick                        \
    srv5:/var/brick                        \
    srv6:/var/brick                        \
    srv7:/var/brick                        \
    srv8:/var/brick                        \
    srv9:/var/brick

Nuitari commented 2 months ago

The goal is to have 9 replicas. There is only about 20 GB of data, but we need high availability.

aravindavk commented 2 months ago

This is not a supported configuration; only replica counts 2 and 3 are tested and supported. You can explore a Disperse volume, which gives you high availability and more storage space with the same number of bricks. For example, create a volume with 6 data bricks and 3 redundancy bricks: the volume size will be 6 x the size of each brick, and the volume will stay available even if 3 nodes/bricks go down.
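
For illustration only (not a command from this thread), a dispersed layout along those lines on the same nine bricks could be created roughly like this, assuming the existing volume were rebuilt:

# hedged sketch: 6 data bricks + 3 redundancy bricks; volume name and brick paths reused from above
gluster volume create sharedProd disperse-data 6 redundancy 3 \
    srv1:/var/brick srv2:/var/brick srv3:/var/brick \
    srv4:/var/brick srv5:/var/brick srv6:/var/brick \
    srv7:/var/brick srv8:/var/brick srv9:/var/brick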

@xhernandez / @pranithk Is it possible to have redundancy count more than data bricks if high availability is more important than storage space?

xhernandez commented 1 month ago

> Is it possible to have redundancy count more than data bricks if high availability is more important than storage space?

No, it's not possible. The number of data bricks is enforced to always be greater than half of the total bricks so that quorum can be guaranteed. In this case the maximum redundancy configuration would be 5 + 4 (5 data bricks + 4 redundancy bricks).
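
For concreteness (again just a sketch, not a command from this thread), that maximum-redundancy layout would differ from the earlier sketch only in the counts:

# hedged sketch: 5 data bricks + 4 redundancy bricks on the same nine bricks
gluster volume create sharedProd disperse-data 5 redundancy 4 \
    srv1:/var/brick srv2:/var/brick srv3:/var/brick \
    srv4:/var/brick srv5:/var/brick srv6:/var/brick \
    srv7:/var/brick srv8:/var/brick srv9:/var/brick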

A thing to consider is that dispersed volumes require more computational power to encode/decode the data, and the performance could differ compared to a replicated volume (in some workloads it could be better and in some slower). Some testing should be done to be sure everything is inside the allowed tolerance if they want to go with dispersed volumes.

Xavi


Nuitari commented 1 month ago

We also have a smaller testing environment:

Volume Name: shared1
Type: Replicate
Volume ID: 2073f548-b89a-4687-92f6-486ac661750b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: testsrv1:/var/brick
Brick2: testsrv2:/var/brick
Brick3: testsrv3:/var/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: true
storage.fips-mode-rchecksum: on
transport.address-family: inet
auth.allow: 10.0.0.0/8

The problem presents in the same way there. All 3 nodes run glusterfs 10.1 on Ubuntu 22.04.