Open resposit opened 1 month ago
The GFAPI-based command (qemu-img create gluster:) mounts and unmounts the volume in addition to creating the file. With a FUSE mount, those two steps are skipped and only the file creation is performed.
Please share the gfapi logs so that we can confirm the timing of each step.
Thank you, I increased diagnostics.client-log-level to DEBUG. Here is the log: gluster-log.txt
Can you share the profile output after running a command like the one below?
gluster v profile <vol-name> start; gluster v profile <vol-name> info clear; time qemu-img create /mnt/pve/vstor/10G 10G; gluster v profile <vol-name> info
GFAPI doesn't help with stateless operations like create. qemu-img create has to do init (mount) and fini (unmount) along with the actual file creation, and the mount state is not kept for the next command. If you run qemu-img create 10 times, it has to mount and unmount the volume 10 times.
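To observe the per-invocation init/fini cost described above, one could run the same command repeatedly and time the whole batch. The following is a minimal sketch, not something from the thread: `run_n` is a hypothetical helper, and the image URL in the comment is a placeholder that must be replaced with the real one.

```shell
# run_n N CMD [ARGS...]: run CMD N times, stopping at the first failure.
# Each run is a fresh process, so any per-process setup and teardown
# (such as the gfapi init/fini) is paid N times.
run_n() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
}

# Example usage (placeholder URL, an assumption for illustration only).
# In bash, `time` is a shell keyword, so it can time a function call:
#   time run_n 10 qemu-img create gluster://HOST/VOLUME/test.img 10G
```

Comparing the batch time against the same loop run on a FUSE mount point would show how much of the difference is paid per invocation.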
When the image is booted using the qemu-img command, it is a long-running process and the mount persists as long as the image is running. During this time, all I/O operations use GFAPI, so it can be faster than FUSE.
The FUSE equivalent of the GFAPI code above is:
# qemu_fuse_create.sh
mount -t glusterfs cloud15-gl.na.infn.it:vstor /mnt/pve/vstor
qemu-img create /mnt/pve/vstor/10G 10G
umount /mnt/pve/vstor
time bash qemu_fuse_create.sh
The actual image creation via gfapi is fast: the log shows the connection coming up at around 12:47:24.708074 and the shutdown starting at around 12:47:24.713059, so the total time taken is around 4985 microseconds, which is faster than FUSE. The slowness appears at the application level because the shutdown process is slow; we have to check why it takes so long. I will update here if I find something.
[2024-03-20 12:47:24.708074 +0000] I [MSGID: 122062] [ec.c:335:ec_up] 0-vstor-disperse-0: Going UP : Child UP = 111 Child Notify = 111
[2024-03-20 12:47:24.713059 +0000] I [socket.c:835:__socket_shutdown] 0-gfapi: intentional socket shutdown(11)
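The 4985 microsecond figure can be reproduced from the two log timestamps. This is a minimal sketch; `elapsed_us` is a hypothetical helper, and it assumes both timestamps fall on the same day and use the HH:MM:SS.ffffff form with exactly six fractional digits, as in the log lines above.

```shell
# elapsed_us START END: microseconds between two HH:MM:SS.ffffff
# timestamps on the same day (assumes six fractional digits).
elapsed_us() {
    awk -v a="$1" -v b="$2" '
        # Convert a timestamp to microseconds since midnight.
        function us(t,    p) {
            split(t, p, /[:.]/)
            return (p[1] * 3600 + p[2] * 60 + p[3]) * 1000000 + p[4]
        }
        BEGIN { printf "%d\n", us(b) - us(a) }'
}

# Timestamps taken from the two log lines above.
elapsed_us 12:47:24.708074 12:47:24.713059   # prints 4985
```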
@aravindavk I tried what you suggested:
root@cloud15:~# mkdir vstor-test
root@cloud15:~# cat qemu_fuse_create.sh
mount -t glusterfs cloud15-gl.na.infn.it:vstor ./vstor-test
qemu-img create ./vstor-test/10G 10G
umount ./vstor-test
Execution is almost instantaneous:
root@cloud15:~# time ./qemu_fuse_create.sh
Formatting './vstor-test/10G', fmt=raw size=10737418240
real 0m0.105s
user 0m0.058s
sys 0m0.037s
I think I have found the root cause (RCA) of why qemu-img takes so long during connection shutdown. When the first fop is initiated, we set a call_bail timer of 10s and take a reference on the rpc object. The call_bail function is called by the timer thread every 10s. During disconnect, the client is supposed to cancel the timer event and unref the rpc object, but after this patch (https://review.gluster.org/#/c/glusterfs/+/22087/) gf_timer_call_cancel returns -1 if cleanup has started. Because the function returns -1 to its caller, rpc_clnt_connection_cleanup, the rpc object is not unreffed, and the connection has to wait until call_bail is triggered by the timer thread even though the operation has already finished. I think we have to change the return code to avoid this issue. The patch was implemented as part of the shd multiplex feature; the shd mux feature was later reverted, but this associated patch was not.
@rafikc30 Do you see any issue if we change the return code to 0 while ctx is valid in gf_timer_call_cancel, so that the parent function can unref the rpc object without waiting for call_bail?
Description of problem:
Creating a raw/qcow2 image on a gluster volume via libgfapi appears to be very slow compared to FUSE access.
If I do the same operation directly on the FUSE mount point, the operation is almost instantaneous:
Showing both files:
Mandatory info:
- The output of the gluster volume info command:
- The output of the gluster volume status command:
Additional info:
- The operating system / glusterfs version: Debian 12.5 Glusterfs 11.1
Bricks are on ZFS file systems: