gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

[bug:1751085] Gluster fuse mount crashed during truncate #1022

Closed (gluster-ant closed this issue 3 years ago)

gluster-ant commented 4 years ago

URL: https://bugzilla.redhat.com/1751085 Creator: kdhananj at redhat Time: 20190911T07:16:39

Description of problem:

Gluster fuse mount crashes in the shard translator while truncating a file from a very large size (exabytes) to a very small size.

See bt:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=tendrl25.lab.eng.blr.r'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f01260a2eca in shard_common_resolve_shards (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60,
    post_res_handler=0x7f01260ad770 <shard_post_resolve_truncate_handler>) at shard.c:1030
1030        local->inode_list[i] = inode_ref(res_inode);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libselinux-2.5-14.1.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7_6.8.x86_64
(gdb) bt
#0  0x00007f01260a2eca in shard_common_resolve_shards (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60,
    post_res_handler=0x7f01260ad770 <shard_post_resolve_truncate_handler>) at shard.c:1030
#1  0x00007f01260a3bfd in shard_refresh_internal_dir (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, type=type@entry=SHARD_INTERNAL_DIR_DOT_SHARD) at shard.c:1317
#2  0x00007f01260ad90d in shard_truncate_begin (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60) at shard.c:2596
#3  0x00007f01260b506d in shard_post_lookup_truncate_handler (frame=0x7f0104000a58, this=0x7f0118015b60) at shard.c:2659
#4  0x00007f01260a1f8b in shard_lookup_base_file_cbk (frame=0x7f0104000a58, cookie=<optimized out>, this=0x7f0118015b60, op_ret=<optimized out>, op_errno=<optimized out>,
    inode=<optimized out>, buf=0x7f0104013bb0, xdata=0x7f010c00faf8, postparent=0x7f0104013c48) at shard.c:1702
#5  0x00007f01265e4922 in afr_discover_unwind (frame=0x7f010400bd28, this=<optimized out>) at afr-common.c:3011
#6  0x00007f01265e4eeb in afr_discover_done (frame=<optimized out>, this=<optimized out>) at afr-common.c:3106
#7  0x00007f01265f27fd in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f010400bd28, this=this@entry=0x7f0118010da0) at afr-common.c:2761
#8  0x00007f01265f3608 in afr_discover_cbk (frame=frame@entry=0x7f010400bd28, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>,
    inode=inode@entry=0x7f011805b9d8, buf=buf@entry=0x7f011f7fcb40, xdata=0x7f010c00faf8, postparent=postparent@entry=0x7f011f7fcbe0) at afr-common.c:3147
#9  0x00007f012687c412 in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f010c010838) at client-rpc-fops_v2.c:2641
#10 0x00007f012ee6c021 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f01180510c0, pollin=pollin@entry=0x7f010c001a20) at rpc-clnt.c:755
#11 0x00007f012ee6c387 in rpc_clnt_notify (trans=0x7f0118051380, mydata=0x7f01180510f0, event=<optimized out>, data=0x7f010c001a20) at rpc-clnt.c:922
#12 0x00007f012ee689f3 in rpc_transport_notify (this=this@entry=0x7f0118051380, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f010c001a20) at rpc-transport.c:542
#13 0x00007f0129778875 in socket_event_poll_in (notify_handled=true, this=0x7f0118051380) at socket.c:2522
#14 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f0118051380, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0,
    event_thread_died=0 '\000') at socket.c:2924
#15 0x00007f012f126806 in event_dispatch_epoll_handler (event=0x7f011f7fd130, event_pool=0x55cbb4dc2300) at event-epoll.c:648
#16 event_dispatch_epoll_worker (data=0x7f0118049a60) at event-epoll.c:761
#17 0x00007f012df01dd5 in start_thread (arg=0x7f011f7fe700) at pthread_create.c:307
#18 0x00007f012d7c902d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p local->first_block
$1 = 0
(gdb) p local->last_block
$2 = -1
(gdb) p local->prebuf.ia_size
$3 = 18446744073709547520
(gdb) p local->num_blocks
$4 = 0
(gdb) p local->block_size
$5 = 67108864
(gdb) p (local->prebuf.ia_size - 1)/local->block_size
$6 = 274877906943
(gdb) p (int) $6
$7 = -1

It turns out that the quotient from dividing a very large unsigned 64-bit integer by a relatively small unsigned 64-bit integer is assigned to a signed 32-bit variable, and sometimes this quotient is larger than the largest signed 32-bit integer. In that case, local->last_block gets assigned -1 and the resulting local->num_blocks becomes 0. This leads to a GF_CALLOC with size 0 for local->inode_list[] here in shard_truncate_begin():

2580     local->inode_list = GF_CALLOC(local->num_blocks, sizeof(inode_t *),
2581                                   gf_shard_mt_inode_list);
2582     if (!local->inode_list)
2583         goto err;
2584

When the members of local->inode_list[] beyond its boundary are accessed, there is illegal memory access and the process crashes.
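
For illustration, here is a minimal standalone sketch (not the shard translator code itself) that reproduces the arithmetic from the gdb session above, assuming the usual gcc/x86_64 behaviour where an out-of-range conversion to a signed 32-bit int wraps modulo 2^32:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Values taken from the gdb session above. */
        uint64_t ia_size = 18446744073709547520ULL; /* local->prebuf.ia_size */
        uint64_t block_size = 67108864ULL;          /* 64MB shard block size */

        int first_block = 0;
        /* The uint64 quotient is 274877906943, which does not fit in a
         * signed 32-bit int; on gcc/x86_64 the assignment wraps to -1. */
        int last_block = (ia_size - 1) / block_size;
        int num_blocks = last_block - first_block + 1; /* -1 - 0 + 1 = 0 */

        printf("last_block = %d, num_blocks = %d\n", last_block, num_blocks);
        /* A GF_CALLOC of num_blocks (0) elements then yields a zero-sized
         * inode_list, and indexing into it crashes the process. */
        return 0;
    }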

While this explains the cause of the crash, what it doesn't explain is why the size of the file was as big as 16 EB when the writes on the file in question wouldn't extend its size beyond 4 KB. That is still being investigated. I'll update the BZ once I have the RCA.

Version-Release number of selected component (if applicable):

master

How reproducible:

Steps to Reproduce: 1. 2. 3.

Actual results:

Expected results:

Additional info:

gluster-ant commented 4 years ago

Time: 20190912T06:43:23 bshetty at redhat commented: Seeing a similar crash with the steps below:

  1. Complete the RHHI-V deployment.
  2. Create 30 VMs using a pool.
  3. After an hour or so, the FUSE mount crash could be seen.

(gdb) bt

#0  0x00007ff30b541a1a in shard_common_resolve_shards (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, post_res_handler=0x7ff30b54b2d0 <shard_post_resolve_truncate_handler>) at shard.c:1030
#1  0x00007ff30b5425e5 in shard_refresh_internal_dir (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, type=type@entry=SHARD_INTERNAL_DIR_DOT_SHARD) at shard.c:1321
#2  0x00007ff30b54b46e in shard_truncate_begin (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470) at shard.c:2573
#3  0x00007ff30b551cc8 in shard_post_lookup_truncate_handler (frame=0x7ff2d400eba8, this=0x7ff304013470) at shard.c:2637
#4  0x00007ff30b5409f2 in shard_lookup_base_file_cbk (frame=0x7ff2d400eba8, cookie=<optimized out>, this=0x7ff304013470, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>,
    buf=0x7ff2d402f700, xdata=0x7ff2fc02fd48, postparent=0x7ff2d402f9a0) at shard.c:1705
#5  0x00007ff30b7a8381 in dht_discover_complete (this=this@entry=0x7ff304011cc0, discover_frame=discover_frame@entry=0x7ff2d400c028) at dht-common.c:754
#6  0x00007ff30b7a92d4 in dht_discover_cbk (frame=0x7ff2d400c028, cookie=0x7ff30400f330, this=0x7ff304011cc0, op_ret=<optimized out>, op_errno=117, inode=0x7ff2fc00c578, stbuf=0x7ff2d400aba0,
    xattr=0x7ff2fc02fd48, postparent=0x7ff2d400ac10) at dht-common.c:1133
#7  0x00007ff30ba61315 in afr_discover_done (frame=0x7ff2d4068fe8, this=<optimized out>) at afr-common.c:3027
#8  0x00007ff30ba6c175 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7ff2d4068fe8, this=this@entry=0x7ff30400f330) at afr-common.c:2769
#9  0x00007ff30ba6d089 in afr_discover_cbk (frame=frame@entry=0x7ff2d4068fe8, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7ff2fc00c578,
    buf=buf@entry=0x7ff30c6e3900, xdata=0x7ff2fc0166e8, postparent=postparent@entry=0x7ff30c6e3970) at afr-common.c:3077
#10 0x00007ff30bcacf3d in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ff2d40180a8) at client-rpc-fops.c:2872
#11 0x00007ff313a90ac0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7ff304049ac0, pollin=pollin@entry=0x7ff2fc00b880) at rpc-clnt.c:778
#12 0x00007ff313a90e2b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7ff304049af0, event=<optimized out>, data=0x7ff2fc00b880) at rpc-clnt.c:971
#13 0x00007ff313a8cba3 in rpc_transport_notify (this=this@entry=0x7ff304049e80, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7ff2fc00b880) at rpc-transport.c:557
#14 0x00007ff30eba55e6 in socket_event_poll_in (this=this@entry=0x7ff304049e80, notify_handled=<optimized out>) at socket.c:2322
#15 0x00007ff30eba7c2a in socket_event_handler (fd=11, idx=2, gen=4, data=0x7ff304049e80, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#16 0x00007ff313d498b0 in event_dispatch_epoll_handler (event=0x7ff30c6e3e70, event_pool=0x555e43424750) at event-epoll.c:643
#17 event_dispatch_epoll_worker (data=0x555e4347fca0) at event-epoll.c:759
#18 0x00007ff312b26ea5 in start_thread (arg=0x7ff30c6e4700) at pthread_create.c:307
#19 0x00007ff3123ec8cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) f 0

#0  0x00007ff30b541a1a in shard_common_resolve_shards (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, post_res_handler=0x7ff30b54b2d0 <shard_post_resolve_truncate_handler>) at shard.c:1030
1030        local->inode_list[i] = inode_ref (res_inode);
(gdb) p local->num_blocks
$1 = 0

gluster-ant commented 4 years ago

Time: 20190912T08:46:16 bugzilla-bot at gluster.org commented: REVIEW: https://review.gluster.org/23407 (features/shard: Convert shard block indices to uint64) posted (#1) for review on master by Krutika Dhananjay
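
For context, a minimal sketch of the kind of change the patch title describes (hypothetical, simplified names; not the actual diff): keeping the block indices in unsigned 64-bit variables so the quotient is never truncated and num_blocks can no longer collapse to 0:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical, simplified stand-in for the shard-local block fields;
     * the real ones live in the shard translator's local structure. */
    typedef struct {
        uint64_t first_block; /* previously signed 32-bit integers */
        uint64_t last_block;
        uint64_t num_blocks;
    } shard_blocks_sketch_t;

    int main(void)
    {
        uint64_t ia_size = 18446744073709547520ULL; /* ~16EB size from the bug */
        uint64_t block_size = 67108864ULL;          /* 64MB shard block size   */

        shard_blocks_sketch_t b = {0};
        b.first_block = 0;
        b.last_block = (ia_size - 1) / block_size;       /* stays 274877906943 */
        b.num_blocks = b.last_block - b.first_block + 1; /* non-zero           */

        printf("last_block=%llu num_blocks=%llu\n",
               (unsigned long long)b.last_block,
               (unsigned long long)b.num_blocks);
        return 0;
    }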

gluster-ant commented 4 years ago

Time: 20190918T15:51:57 kdhananj at redhat commented: RCA: The size went "negative" because two consecutive truncates on the DIRECT_IO_TEST file (coming from open with O_TRUNC) happened in the following sequence:

  1. Size of the file at the beginning: 512 bytes.
  2. The first truncate on a given mount truncated the file to size 0. Delta size = final size - initial size = 0 - 512 = -512.
  3. An xattrop is now sent with -512, and the file size had been 512, so 512 + (-512) = 0. The final on-disk size at the end of this truncate is 0, but the shard translator in the truncate fop callback continues to cache 512 as the file size.
  4. Then a second truncate (again to size 0) is sent without a preceding lookup or stat, so the stale cached size is believed to be correct. Delta size = final size - initial size = 0 - 512 = -512 (here the initial size should have been 0, but it is wrongly assumed to be 512). So an xattrop is sent with -512, and the on-disk size becomes 0 - 512 = 0xfffffffffffffe00 (see the arithmetic sketch below).
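
The arithmetic in step 4, as a tiny standalone sketch, assuming the brick applies the signed delta to an unsigned 64-bit on-disk size (which is what the xattr value below shows):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t on_disk_size = 0; /* real size after the first truncate        */
        int64_t delta = 0 - 512;   /* second truncate: 0 minus stale cached 512 */

        /* Adding the bogus -512 delta to an unsigned size of 0 wraps around
         * instead of staying at 0. */
        uint64_t new_size = on_disk_size + (uint64_t)delta;

        printf("new size = 0x%llx\n", (unsigned long long)new_size);
        /* Prints 0xfffffffffffffe00, i.e. 2^64 - 512. */
        return 0;
    }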

And this is what we see in the getfattr output of the file:

[root@rhsqa-grafton8 ~]# getfattr -d -m . -e hex /gluster_bricks/data/data/DIRECT_IO_TEST
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/data/data/DIRECT_IO_TEST
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xe8235ee2ea374f0a9e54853db2781f93
trusted.gfid2path.69693d0e1876710b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f5f5f4449524543545f494f5f544553545f5f
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0xfffffffffffffe00000000000000000000000000000000010000000000000000
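
To read that value back, a small sketch, assuming (as the value above suggests) that the first 8 bytes of trusted.glusterfs.shard.file-size hold the file size in network byte order:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* First 8 bytes of trusted.glusterfs.shard.file-size from the
         * getfattr output above. */
        const unsigned char xattr[8] = {0xff, 0xff, 0xff, 0xff,
                                        0xff, 0xff, 0xfe, 0x00};

        uint64_t size = 0;
        for (int i = 0; i < 8; i++)
            size = (size << 8) | xattr[i]; /* big-endian decode */

        printf("file size = %llu (0x%llx)\n",
               (unsigned long long)size, (unsigned long long)size);
        /* 18446744073709551104 == (uint64_t)0 - 512: the wrapped size. */
        return 0;
    }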

More evidence is in the wireshark output. I will paste it here tomorrow for the sake of completeness (my laptop freezes at some point when I try to load the huge pcap file in wireshark).

-Krutika

stale[bot] commented 4 years ago

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] commented 3 years ago

Closing this issue as there has been no update since my last update on it. If this issue is still valid, feel free to reopen it.