gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Client issue: client-rpc-fops_v2.c:2618:client4_0_lookup_cbk - remote operation failed - error=No data available + periodic loss of mount on native client #4307

Open intelliIT opened 3 months ago


Description of problem: im pretty new to all of this glusterfs and linux server administration stuff, so sorry already.

We are running a distributed-replicated setup across 4 servers, with 4 bricks per volume (2 x 2), backed by 10G connections. The servers host two of these volumes: one on fast SSD storage and a second on slower HDD storage. I applied some tuning options from the official docs and a few other guides, aiming to improve performance and stability and to reduce the size of the log files.

Three other machines mount the volumes natively to use them as shared storage, in addition to the 4 servers themselves, which also mount the volumes natively.

I am seeing a ton of these errors in the client volume logs, but only for the faster volume; the slower one seems to be fine. It is not always the same file path, but it is always this block: a gfid-only line followed by a line with the file path, and only on the 3 client machines. The files in question can be very short-lived; could that be an issue?

[2024-02-12 10:06:36.914704 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2618:client4_0_lookup_cbk] 0-fast_volume_native-client-1: remote operation failed. [{path=(null)}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]
[2024-02-12 10:06:36.915051 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2618:client4_0_lookup_cbk] 0-fast_volume_native-client-1: remote operation failed. [{path=/***-main/****/in/I***/84100123456_B_01_2d21b241-3c5b-4a19-b37d-387dfe545d7d.pdf}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]

repeated 278943 times between [2024-02-12 10:15:36.914792 +0000] and [2024-02-12 10:18:36.914127 +0000]
repeated 92980 times between [2024-02-12 10:15:36.916535 +0000] and [2024-02-12 10:18:36.913836 +0000]

The logs keep growing by several gigabytes in a matter of hours, so I adjusted logrotate. If I set the log level to DEBUG, the same growth happens in a matter of minutes.
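For completeness, this is roughly the logrotate drop-in I mean (file name and thresholds are my own choice, not an official template; the postrotate HUP mirrors what the packaged glusterfs logrotate config does, as far as I can tell):

```
# /etc/logrotate.d/glusterfs-client -- hypothetical drop-in, adjust to taste
/var/log/glusterfs/*.log {
    daily
    rotate 7
    maxsize 500M
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # gluster processes reopen their log files on SIGHUP
        /usr/bin/killall -HUP glusterfs glusterfsd glusterd 2> /dev/null || true
    endscript
}
```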

Also, if I 'ls' the directory on the bricks or on the Gluster mount on the servers themselves, the directory is empty. If I do the same on the 3 native clients, the 'ls' hangs and does not respond to any signal from my tty. An strace of the 'ls' stalls as well:

```
brk(NULL) = 0x555f95321000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd6578cd30) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d4a56e000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=21115, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 21115, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4d4a568000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=166280, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 177672, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4d4a53c000
mprotect(0x7f4d4a542000, 139264, PROT_NONE) = 0
mmap(0x7f4d4a542000, 106496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f4d4a542000
mmap(0x7f4d4a55c000, 28672, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7f4d4a55c000
mmap(0x7f4d4a564000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x27000) = 0x7f4d4a564000
mmap(0x7f4d4a566000, 5640, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4d4a566000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\237\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\302\211\332Pq\2439\235\350\223\322\257\201\326\243\f"..., 68, 896) = 68
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2220400, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2264656, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4d4a313000
mprotect(0x7f4d4a33b000, 2023424, PROT_NONE) = 0
mmap(0x7f4d4a33b000, 1658880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f4d4a33b000
mmap(0x7f4d4a4d0000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f4d4a4d0000
mmap(0x7f4d4a529000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x215000) = 0x7f4d4a529000
mmap(0x7f4d4a52f000, 52816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4d4a52f000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=613064, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 615184, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4d4a27c000
mmap(0x7f4d4a27e000, 438272, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f4d4a27e000
mmap(0x7f4d4a2e9000, 163840, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6d000) = 0x7f4d4a2e9000
mmap(0x7f4d4a311000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x94000) = 0x7f4d4a311000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d4a279000
arch_prctl(ARCH_SET_FS, 0x7f4d4a279800) = 0
set_tid_address(0x7f4d4a279ad0) = 2716694
set_robust_list(0x7f4d4a279ae0, 24) = 0
rseq(0x7f4d4a27a1a0, 0x20, 0, 0x53053053) = 0
mprotect(0x7f4d4a529000, 16384, PROT_READ) = 0
mprotect(0x7f4d4a311000, 4096, PROT_READ) = 0
mprotect(0x7f4d4a564000, 4096, PROT_READ) = 0
mprotect(0x555f93f9d000, 4096, PROT_READ) = 0
mprotect(0x7f4d4a5a8000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f4d4a568000, 21115) = 0
statfs("/sys/fs/selinux", 0x7ffd6578cd70) = -1 ENOENT (No such file or directory)
statfs("/selinux", 0x7ffd6578cd70) = -1 ENOENT (No such file or directory)
getrandom("\x64\x36\xf2\x1c\x21\x75\xe5\x7f", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0x555f95321000
brk(0x555f95342000) = 0x555f95342000
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 421
read(3, "", 1024) = 0
close(3) = 0
access("/etc/selinux/config", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=3048928, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 3048928, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4d49f90000
close(3) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=78, ws_col=286, ws_xpixel=0, ws_ypixel=0}) = 0
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d49f6f000
getdents64(3, 0x7fd9a8b45040 /* 1 entries */, 131072) = 80
getdents64(3, 0x7fd9a8b45040 /* 2 entries */, 131072) = 48
getdents64(3,            <-- stops and hangs here
```
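For anyone trying to reproduce the hang: the stuck 'ls' sits in uninterruptible sleep (state D), which is why it ignores every signal. This is a sketch of what I run to poke at it without losing another shell (the paths are placeholders for the masked ones above, and the gfid check is only my guess at the ENODATA cause):

```shell
# Find processes stuck in uninterruptible sleep (state D); a FUSE lookup
# that never returns will show up here:
ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /^D/'

# Wrap further probes in timeout(1) so they cannot hang this shell too:
timeout 5 ls /mnt/fast_volume/some/dir || echo "ls timed out or failed"

# Directly on a brick (not through the mount), dump the gluster xattrs of the
# affected directory; a missing or diverging trusted.gfid across replicas is
# one possible source of ENODATA ("No data available") lookups:
getfattr -d -m . -e hex <brick-path>/some/dir
```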

In addition, one of the 3 machines loses its mount of the fast volume quite often, which renders the services depending on it unusable; only a restart brings the mount back. This does not happen to any of the other machines. I am struggling to find the corresponding log entries for the event because of the insane number of messages in the log file; if someone can point me to a likely message in any of the logs, I will look for it. This is the fstab entry I use to mount:

node1:/fast_volume_**/** /mnt/fast_volume glusterfs defaults,_netdev,backup-volfile-servers=node4:node2:node3 0 0
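To catch the moment the mount drops, my understanding is that the FUSE client log for this mount (named after the mount path) is the place to look; these are the strings I would grep for (the exact wording is my assumption based on messages GlusterFS clients typically log around a disconnect or shutdown):

```shell
# Client log name is derived from the mount point:
# /mnt/fast_volume -> mnt-fast_volume.log
LOG=/var/log/glusterfs/mnt-fast_volume.log

# Strings that tend to show up around a lost mount:
grep -nE "disconnected from|All subvolumes are down|Transport endpoint is not connected|received signum" "$LOG" | tail -n 40
```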

My volumes seem to be fine, and there is no excessive healing going on either.

We also currently have a hardware issue: one of the servers reboots unexpectedly once a day. Spare parts are ordered.

I also want to note that, overall, everything appears to work well: we almost never have corrupted files or inaccessible data, apart from the example above, and we have had only 3 or 4 similar files/directories to date. The real problems are the enormous size of the client logs and the periodic loss of the native client mount on one machine.

Expected results: Are there any steps I can take to get rid of these log messages, or rather to fix the client/volumes, and to improve the stability of the client(s)?

Mandatory info:

Volume Name: fast_volume_native
Type: Distributed-Replicate
Volume ID: 4ca790c0-0f7f-4233-a4bd-595247884***
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: node1:/data/glustervolume_fast***
Brick2: node2:/data/glustervolume_fast***
Brick3: node3:/data/glustervolume_fast***
Brick4: node4:/data/glustervolume_fast***
Options Reconfigured:
transport.keepalive: 1
performance.qr-cache-timeout: 600
network.ping-timeout: 42
diagnostics.client-log-level: WARNING
diagnostics.client-log-buf-size: 15
diagnostics.client-log-flush-timeout: 180
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.cache-samba-metadata: on
performance.cache-size: 4000MB
performance.cache-max-file-size: 150MB
performance.flush-behind: on
performance.quick-read: off
performance.cache-refresh-timeout: 10
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32

Volume Name: slow_volume_native
Type: Distributed-Replicate
Volume ID: 75730dc9-9734-4106-851a-b81878d7c***
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: node1:/data/glustervolume_slow**
Brick2: node2:/data/glustervolume_slow**
Brick3: node3:/data/glustervolume_slow**
Brick4: node4:/data/glustervolume_slow**
Options Reconfigured:
performance.qr-cache-timeout: 600
network.ping-timeout: 42
transport.keepalive: 1
diagnostics.client-log-level: WARNING
diagnostics.client-log-buf-size: 15
diagnostics.client-log-flush-timeout: 180
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.cache-samba-metadata: on
performance.cache-size: 4000MB
performance.cache-max-file-size: 150MB
performance.flush-behind: on
performance.quick-read: off
performance.cache-refresh-timeout: 10
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
Status of volume: fast_volume_native
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/data/glustervolume_fast***  57698     0          Y       2196209
Brick node2:/data/glustervolume_fast***  52192     0          Y       3054
Brick node3:/data/glustervolume_fast***  57749     0          Y       2997
Brick node4:/data/glustervolume_fast***  51420     0          Y       4790
Self-heal Daemon on localhost               N/A       N/A        Y       2196245
Self-heal Daemon on node4                   N/A       N/A        Y       4814
Self-heal Daemon on node2                   N/A       N/A        Y       3202
Self-heal Daemon on node3                   N/A       N/A        Y       4069

Task Status of Volume fast_volume_native
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: slow_volume_native
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/data/glustervolume_slow***   54731     0          Y       2197122
Brick node2:/data/glustervolume_slow***   58695     0          Y       3179
Brick node3:/data/glustervolume_slow***   49747     0          Y       3948
Brick node4:/data/glustervolume_slow***   57440     0          Y       5014
Self-heal Daemon on localhost               N/A       N/A        Y       2196245
Self-heal Daemon on node2                   N/A       N/A        Y       3202
Self-heal Daemon on node4                   N/A       N/A        Y       4814
Self-heal Daemon on node3                   N/A       N/A        Y       4069

Task Status of Volume slow_volume_native
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Brick node1:/data/glustervolume_fast***
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node2:/data/glustervolume_fast***
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node3:/data/glustervolume_fast***
Status: Connected
Total Number of entries: 5
Number of entries in heal pending: 4
Number of entries in split-brain: 1
Number of entries possibly healing: 0

Brick node4:/data/glustervolume_fast***
Status: Connected
Total Number of entries: 3
Number of entries in heal pending: 2
Number of entries in split-brain: 1
Number of entries possibly healing: 0

- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/

This is the log of the brick on node1; it repeats quite often:

[2024-02-12 11:26:32.971824 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-fast_volume_native-server: disconnecting connection [{client-uid=CTX_ID:5135801a-ce7a-436d-9cb6-c1**-GRAPH_ID:0-PID:3780927-HOST:ryzen3-PC_NAME:fast_volume_native-client-0-RECON_NO:-0}]
[2024-02-12 11:26:32.971920 +0000] I [MSGID: 101055] [client_t.c:396:gf_client_unref] 0-fast_volume_native-server: Shutting down connection CTX_ID:5135801a-ce7a-436d-9cb6-c1**-GRAPH_ID:0-PID:3780927-HOST:ryzen3-PC_NAME:fast_volume_native-client-0-RECON_NO:-0
[2024-02-12 11:32:49.897182 +0000] I [glusterfsd-mgmt.c:77:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2024-02-12 11:32:49.903431 +0000] I [glusterfsd-mgmt.c:77:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2024-02-12 11:32:50.083731 +0000] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: node4:24007 node3:24007 node2:24007
[2024-02-12 11:32:50.083747 +0000] I [MSGID: 101221] [common-utils.c:3836:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File exists}]
[2024-02-12 11:32:50.083763 +0000] I [glusterfsd-mgmt.c:2236:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
[2024-02-12 11:32:50.084047 +0000] I [glusterfsd-mgmt.c:2203:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: node4:24007 node3:24007 node2:24007
[2024-02-12 11:32:50.084060 +0000] I [glusterfsd-mgmt.c:2236:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
The message "I [MSGID: 101221] [common-utils.c:3836:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File exists}]" repeated 5 times between [2024-02-12 11:32:50.083747 +0000] and [2024-02-12 11:32:50.084058 +0000]

In glustershd.log I see this message from time to time (hourly):

[2024-02-12 09:05:07.001382 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2618:client4_0_lookup_cbk] 0-fast_volume_native-client-1: remote operation failed. [{path=<gfid:36b05708-0ec0-4cae-bc58-08f1602e8dae>}, {gfid=36b05708-0ec0-4cae-bc58-08f1602e8dae}, {errno=2}, {error=No such file or directory}]

In glusterd.log I find these hourly, too:

[2024-02-12 10:08:36.620178 +0000] I [MSGID: 106061] [glusterd-utils.c:10725:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Dict get failed [{Key=count}]
[2024-02-12 10:08:36.620328 +0000] I [MSGID: 106499] [glusterd-handler.c:4372:__glusterd_handle_status_volume] 0-management: Received status volume req for volume fast_volume_native
[2024-02-12 10:08:36.621379 +0000] I [MSGID: 106499] [glusterd-handler.c:4372:__glusterd_handle_status_volume] 0-management: Received status volume req for volume slow_volume_native

Additional info:

- The operating system / glusterfs version:

client:
glusterfs-client/jammy-updates,jammy-security,now 10.1-1ubuntu0.2 amd64
Ubuntu 22.04.3 LTS

server:
glusterfs 10.3
glusterfs-client/stable,now 10.3-5 amd64
Debian GNU/Linux 12 (bookworm)