gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Errors creating files via qemu-img / libgfapi #4325

Open resposit opened 1 month ago

resposit commented 1 month ago

Description of problem:

I intermittently get errors when converting a .img file to .qcow2 with qemu-img. The source .img file is on a local disk; the destination .qcow2 is on a Gluster disperse volume with sharding enabled.

root@cloud15:~# du -sh /home/jammy-server-cloudimg-amd64-disk-kvm.img
589M    /home/jammy-server-cloudimg-amd64-disk-kvm.img

root@cloud15:~# qemu-img convert -f raw -O qcow2 /home/jammy-server-cloudimg-amd64-disk-kvm.img gluster://cloud15-gl.na.infn.it/vstor/jammy-server-cloudimg-amd64-disk-kvm.qcow2
[2024-03-27 11:59:02.261553 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vstor: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-27 11:59:14.267260 +0000] I [io-stats.c:4190:fini] 0-vstor: io-stats translator unloaded
[2024-03-27 11:59:15.273044 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vstor: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-27 11:59:27.278484 +0000] I [io-stats.c:4190:fini] 0-vstor: io-stats translator unloaded
[2024-03-27 11:59:27.284414 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vstor: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-27 11:59:29.236878 +0000] E [MSGID: 122037] [ec-common.c:2346:ec_update_size_version_done] 0-vstor-disperse-0: Failed to update version and size. FOP : 'XATTROP' failed on '/.shard' with gfid be318638-e8a0-4c6d-977d-7a937aa84806. Parent FOP: MKNOD [Input/output error]
[2024-03-27 11:59:39.289442 +0000] I [io-stats.c:4190:fini] 0-vstor: io-stats translator unloaded
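
To separate the failure from the informational io-stats messages, the output can be filtered for error-level lines only. This is a minimal sketch, assuming the log layout shown above (a bracketed timestamp followed by a one-letter severity) and assuming the libgfapi log lines arrive on stderr; `gfapi_errors` is a hypothetical helper name, not part of gluster or qemu.

```shell
# Hypothetical helper: read log text on stdin and print only the
# error-level (E) lines. Assumes the "[timestamp] LEVEL [..." layout
# seen in the output above.
gfapi_errors() {
    grep -E '^\[[^]]+\] E \['
}

# Possible usage (assumes the log is written to stderr):
#   qemu-img convert -f raw -O qcow2 SRC DST 2>&1 | gfapi_errors
```

With the failing run above, this would leave only the `ec_update_size_version_done` line.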

Expected behavior:

This would be the expected output:

root@cloud15:~# qemu-img convert -f raw -O qcow2 /home/jammy-server-cloudimg-amd64-disk-kvm.img gluster://cloud15-gl.na.infn.it/vstor/jammy-server-cloudimg-amd64-disk-kvm.qcow2
[2024-03-27 12:20:25.009488 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vstor: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-27 12:20:37.015261 +0000] I [io-stats.c:4190:fini] 0-vstor: io-stats translator unloaded
[2024-03-27 12:20:38.020750 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vstor: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-27 12:20:50.026104 +0000] I [io-stats.c:4190:fini] 0-vstor: io-stats translator unloaded
[2024-03-27 12:20:51.031591 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vstor: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-27 12:21:03.036862 +0000] I [io-stats.c:4190:fini] 0-vstor: io-stats translator unloaded

The errors appear to occur only when sharding is enabled.
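
Because the failure is intermittent, repeating the conversion and counting the runs that log an error may help quantify how often it occurs. The following is a rough sketch, not a tested reproducer: `count_failing_runs` is a hypothetical helper, and it assumes the libgfapi log is written to stderr in the format shown above.

```shell
# Hypothetical helper: run a command N times and report how many runs
# printed at least one error-level (E) log line.
# Usage: count_failing_runs N command [args...]
count_failing_runs() {
    runs=$1; shift
    failed=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Capture stdout and stderr together; the command's exit status is ignored.
        out=$("$@" 2>&1) || true
        if printf '%s\n' "$out" | grep -Eq '^\[[^]]+\] E \['; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$runs runs logged an error"
}

# Possible usage (each run overwrites the destination image):
#   count_failing_runs 10 qemu-img convert -f raw -O qcow2 \
#       /home/jammy-server-cloudimg-amd64-disk-kvm.img \
#       gluster://cloud15-gl.na.infn.it/vstor/jammy-server-cloudimg-amd64-disk-kvm.qcow2
```

Running the same loop against a volume without sharding would give a like-for-like comparison.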

Mandatory info:

My /etc/hosts:

10.10.81.15 cloud15-gl.na.infn.it cloud15-gl
10.10.81.16 cloud16-gl.na.infn.it cloud16-gl
10.10.81.17 cloud17-gl.na.infn.it cloud17-gl

- The output of the gluster volume info command:

Volume Name: vstor
Type: Disperse
Volume ID: c3d389ed-3d51-4484-bdfd-596d2629a0a1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: cloud15-gl:/pool6/vstor/brick
Brick2: cloud16-gl:/pool6/vstor/brick
Brick3: cloud17-gl:/pool6/vstor/brick
Options Reconfigured:
features.shard-block-size: 256MB
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
features.shard: on
user.cifs: off
client.event-threads: 4
server.event-threads: 4
performance.client-io-threads: on
cluster.lookup-optimize: off

- The output of the gluster volume status command:

Status of volume: vstor
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick cloud15-gl:/pool6/vstor/brick         59554     0          Y       3111
Brick cloud16-gl:/pool6/vstor/brick         52485     0          Y       3104
Brick cloud17-gl:/pool6/vstor/brick         52170     0          Y       3116
Self-heal Daemon on localhost               N/A       N/A        Y       3165
Self-heal Daemon on cloud17-gl.na.infn.it   N/A       N/A        Y       3151
Self-heal Daemon on cloud16-gl.na.infn.it   N/A       N/A        Y       3142

Task Status of Volume vstor
------------------------------------------------------------------------------
There are no active volume tasks

Additional info:

- The operating system / glusterfs version: Debian 12.5, GlusterFS 11.1

Bricks are on ZFS file systems:

root@cloud15:/mnt/pve/vstor# zpool status
  pool: pool6
 state: ONLINE
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Mon Mar 18 11:42:02 2024
config:

        NAME                        STATE     READ WRITE CKSUM
        pool6                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            scsi-35000c500f64eeaf7  ONLINE       0     0     0
            scsi-35000c500f64f369b  ONLINE       0     0     0
            scsi-35000c500f65032fb  ONLINE       0     0     0
            scsi-35000c500c2254b77  ONLINE       0     0     0
            scsi-35000c500cf057b2b  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            scsi-358ce38ee22c8e51d  ONLINE       0     0     0
            scsi-358ce38ee22c8e519  ONLINE       0     0     0

errors: No known data errors