gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

remote operation failed. [{errno=28}, {error=No space left on device}] #4220

Open devinfra1 opened 1 year ago

devinfra1 commented 1 year ago

I am having the below error in my glusterfs. I'm currently using version 11, on Ubuntu 22.04 Jammy.

[2023-09-05 14:31:29.738488 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:31:29.738517 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-2: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:31:29.738746 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-2: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:31:29.738966 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:31:29.741567 +0000] W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 4c7b86f9-927b-47a4-abff-7d3233f063d8: Failing WRITE as quorum is not met [No space left on device]
[2023-09-05 14:31:29.749994 +0000] W [fuse-bridge.c:1969:fuse_err_cbk] 0-glusterfs-fuse: 6087675: FLUSH() ERR => -1 (No space left on device)
The message "W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 4c7b86f9-927b-47a4-abff-7d3233f063d8: Failing WRITE as quorum is not met [No space left on device]" repeated 3 times between [2023-09-05 14:31:29.741567 +0000] and [2023-09-05 14:31:29.748871 +0000]
[2023-09-05 14:31:30.531207 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-2: remote operation failed. [{errno=28}, {error=No space left on device}]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-2: remote operation failed. [{errno=28}, {error=No space left on device}]" repeated 4 times between [2023-09-05 14:31:30.531207 +0000] and [2023-09-05 14:31:30.532133 +0000]
[2023-09-05 14:31:30.532751 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:31:30.533185 +0000] W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 883129e4-ea78-4001-9b21-692d2a358262: Failing WRITE as quorum is not met [No space left on device]
[2023-09-05 14:31:30.533337 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]" repeated 3 times between [2023-09-05 14:31:30.533337 +0000] and [2023-09-05 14:31:30.533810 +0000]
[2023-09-05 14:31:30.536060 +0000] W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 883129e4-ea78-4001-9b21-692d2a358262: Failing WRITE as quorum is not met [No space left on device]
[2023-09-05 14:31:30.541780 +0000] W [fuse-bridge.c:1969:fuse_err_cbk] 0-glusterfs-fuse: 6087709: FLUSH() ERR => -1 (No space left on device)
[2023-09-05 14:30:54.343397 +0000] W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 921e5b98-7a28-4936-af33-27a2729191a1: Failing WRITE as quorum is not met [No space left on device]
[2023-09-05 14:30:55.051990 +0000] W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 721a7b17-fb2d-4c4f-ba71-fd86f1260485: Failing WRITE as quorum is not met [No space left on device]
The message "W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 883129e4-ea78-4001-9b21-692d2a358262: Failing WRITE as quorum is not met [No space left on device]" repeated 3 times between [2023-09-05 14:31:30.536060 +0000] and [2023-09-05 14:31:30.541345 +0000]
[2023-09-05 14:46:12.784348 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-2: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:46:12.784431 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:712:client4_0_writev_cbk] 0-docker-shared-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]
[2023-09-05 14:46:12.784471 +0000] W [MSGID: 108001] [afr-transaction.c:1020:afr_handle_quorum] 0-docker-shared-replicate-0: 978f024f-a06e-43f3-bd41-1179fa24e283: Failing WRITE as quorum is not met [No space left on device]
[2023-09-05 14:46:12.785173 +0000] W [fuse-bridge.c:1969:fuse_err_cbk] 0-glusterfs-fuse: 6088580: FLUSH() ERR => -1 (No space left on device)
[2023-09-05 16:32:29.419586 +0000] I [MSGID: 108031] [afr-common.c:3167:afr_local_discovery_cbk] 0-docker-shared-replicate-0: selecting local read_child docker-shared-client-2
[2023-09-05 17:26:25.888648 +0000] I [glusterfsd-mgmt.c:35:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2023-09-05 17:26:25.916118 +0000] I [glusterfsd-mgmt.c:2336:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile 

I currently have roughly 150 GB available on my servers, which host the 3 gluster bricks, as shown by the `df -h` and `gluster volume info` output below:

df -h

Filesystem                           Size  Used Avail Use% Mounted on
tmpfs                                1.6G  2.2M  1.6G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--root   68G  7.6G   61G  12% /
tmpfs                                7.9G     0  7.9G   0% /dev/shm
tmpfs                                5.0M     0  5.0M   0% /run/lock
/dev/mapper/ubuntu--vg-lv--home       10G  104M  9.9G   2% /home
/dev/sda2                            2.0G  253M  1.6G  14% /boot
/dev/mapper/ubuntu--vg-lv--var--log   20G  596M   20G   3% /var/log
/dev/mapper/docker--vg-lv--docker    299G  172G  128G  58% /var/lib/docker
localhost:docker-shared              299G  179G  121G  60% /var/lib/docker/data
overlay                              299G  172G  128G  58% /var/lib/docker/overlay2/b3ce33e544ec43bf680f05194f38363362c6534c2cc596bee647da3b366a0d93/merged
overlay                              299G  172G  128G  58% /var/lib/docker/overlay2/5d5c2553c57c5a1afc283d7a81dd296887fabf39bf00cf6e20b75d934ab2035f/merged
overlay                              299G  172G  128G  58% /var/lib/docker/overlay2/8c8fc73b445bf41f189d34e10e729722b168e3eb26e597181c76710f771ca5f5/merged
overlay                              299G  172G  128G  58% /var/lib/docker/overlay2/f0cf651666925797c5e50202542b4a97679898092e1cfe44dc049fb0473d7d0d/merged
overlay                              299G  172G  128G  58% /var/lib/docker/overlay2/9497ca22eff1fe769d3e1b867709c86ebe6f668ed9de64d0228e453d654f9ddb/merged
tmpfs                                1.6G  4.0K  1.6G   1% /run/user/0

gluster volume info


Volume Name: docker-shared
Type: Distributed-Replicate
Volume ID: fa50872f-ec54c-gt14-b62c-1drtg3a153b4d1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: srv-servicos-r1:/var/lib/docker/shared
Brick2: srv-servicos-r2:/var/lib/docker/shared
Brick3: srv-servicos-r3:/var/lib/docker/shared
Options Reconfigured:
performance.cache-size: 32
storage.reserve: 1
transport.address-family: inet
storage.fips-mode-rchecksum: on
network.ping-timeout: 5
performance.stat-prefetch: off
performance.client-io-threads: off
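(Not from the report, but a quick diagnostic worth running on each brick host: errno 28 can mean exhausted inodes rather than blocks, so `df -h` alone can look healthy while `df -i` is at 100%. The brick path is the one from the volume info above; `/` is only a placeholder default so the snippet runs anywhere.)

```shell
# Substitute the real brick path, e.g. /var/lib/docker/shared from the
# 'gluster volume info' output; "/" is just a runnable placeholder.
BRICK=${BRICK:-/}

# ENOSPC (errno 28) is returned for inode exhaustion too, so check both:
df -h "$BRICK"   # free blocks
df -i "$BRICK"   # free inodes
```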
icolombi commented 1 year ago

It seems the same problem as https://github.com/gluster/glusterfs/issues/4157 and/or https://github.com/gluster/glusterfs/issues/4135

devinfra1 commented 1 year ago

Yes, I already went through the issues you linked, but neither solved my problem.

I changed the values of 'storage.reserve' and 'cluster.min-free-disk', but we still hit the same problem.

It only clears when I do a stop/start on my gluster volume, and then it comes back after a while. I couldn't find anything in the logs pointing to the cause.
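(A sketch of how to confirm whether gluster's own reserve threshold, rather than the backing filesystem, is the source of the ENOSPC: read the two options back with `gluster volume get`. The volume name is taken from the report; the `check_reserve` wrapper is just for illustration.)

```shell
VOL=docker-shared   # volume name from the 'gluster volume info' output

check_reserve() {
  if command -v gluster >/dev/null 2>&1; then
    # storage.reserve is a percentage of each brick that gluster refuses to
    # write into, returning ENOSPC even while df still shows free space.
    gluster volume get "$VOL" storage.reserve
    gluster volume get "$VOL" cluster.min-free-disk
  else
    echo "gluster CLI not found on this host"
  fi
}
check_reserve

# To rule the reserve out, it can be set to 0 temporarily (diagnostic only;
# keep a small reserve in production):
#   gluster volume set "$VOL" storage.reserve 0
```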

baskinsy commented 1 year ago

The last version without this issue for us was 10.3. We are hitting it constantly on 10.4 (on Ubuntu 20.04), and there is still no update or any comment.