gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Crash on startup of glusterd 11 (heap-buffer-overflow) #4192

Open jengelh opened 1 year ago

jengelh commented 1 year ago

Description of problem:

glusterd from glusterfs-11 just dies on startup. With ASAN enabled I get

==4418==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x613000000190 at pc 0x7f028cd2341c bp 0x7ffd9c5ba7e0 sp 0x7ffd9c5ba7d8
WRITE of size 8 at 0x613000000190 thread T0
    #0 0x7f028cd2341b in mem_get_pool_list ~/libglusterfs/src/mem-pool.c:786
    #1 0x7f028cd235aa in mem_get_from_pool ~/libglusterfs/src/mem-pool.c:820
    #2 0x7f028cd23a98 in mem_get_malloc ~/libglusterfs/src/mem-pool.c:883
    #3 0x7f028cd23985 in mem_get_calloc ~/libglusterfs/src/mem-pool.c:866
    #4 0x7f028cca2577 in _gf_msg_internal ~/libglusterfs/src/logging.c:1843
    #5 0x7f028cca2ed9 in _gf_msg ~/libglusterfs/src/logging.c:1954
    #6 0x7f028cca5fea in _gf_smsg ~/libglusterfs/src/logging.c:2416
    #7 0x413344 in main ~/glusterfsd/src/glusterfsd.c:2872
    #8 0x7f028c943baf in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #9 0x7f028c943c78 in __libc_start_main_impl ../csu/libc-start.c:360
    #10 0x405e34 in _start ../sysdeps/x86_64/start.S:115

0x613000000190 is located 0 bytes after 336-byte region [0x613000000040,0x613000000190)
allocated by thread T0 here:
    #0 0x7f028d0dc04f in malloc (/usr/lib64/libasan.so.8+0xdc04f) (BuildId: 44194dcf14c212b57346030492309d59d5379ae1)
    #1 0x7f028cd211db in __gf_default_malloc glusterfs/mem-pool.h:112
    #2 0x7f028cd23332 in mem_get_pool_list ~/libglusterfs/src/mem-pool.c:778
    #3 0x7f028cd235aa in mem_get_from_pool ~/libglusterfs/src/mem-pool.c:820
    #4 0x7f028cd23a98 in mem_get_malloc ~/libglusterfs/src/mem-pool.c:883
    #5 0x7f028cd23985 in mem_get_calloc ~/libglusterfs/src/mem-pool.c:866
    #6 0x7f028cca2577 in _gf_msg_internal ~/libglusterfs/src/logging.c:1843
    #7 0x7f028cca2ed9 in _gf_msg ~/libglusterfs/src/logging.c:1954
    #8 0x7f028cca5fea in _gf_smsg ~/libglusterfs/src/logging.c:2416
    #9 0x413344 in main ~/glusterfsd/src/glusterfsd.c:2872
    #10 0x7f028c943baf in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: heap-buffer-overflow ~/libglusterfs/src/mem-pool.c:786 in mem_get_pool_list
Shadow bytes around the buggy address:
  0x612fffffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x612fffffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x613000000000: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x613000000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x613000000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x613000000180: 00 00[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x613000000200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x613000000280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x613000000300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x613000000380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x613000000400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4418==ABORTING

(Without ASAN, glibc malloc throws an assert at some stage in the glusterd initialization due to heap corruption.)

The exact command to reproduce the issue:

# ln -s glusterfsd/src/.libs/glusterfsd glusterd
# LD_LIBRARY_PATH=$PWD/api/src/.libs:$PWD/libglusterfs/src/.libs:$PWD/rpc/rpc-lib/src/.libs:$PWD/rpc/xdr/src/.libs:$PWD/xlators/features/changelog/lib/src/.libs ./glusterd --no-daemon

- The operating system / glusterfs version:

openSUSE Tumbleweed 20230701 gcc 13.1.1

jengelh commented 1 year ago

777         if (!pool_list) {
778             pool_list = MALLOC(pool_list_size);
779             if (!pool_list) {
780                 return NULL;
781             }
782
783             INIT_LIST_HEAD(&pool_list->thr_list);
784             (void)pthread_spin_init(&pool_list->lock, PTHREAD_PROCESS_PRIVATE);
785             for (i = 0; i < NPOOLS; ++i) {
786                 pool_list->pools[i].parent = &pools[i];
787                 pool_list->pools[i].hot_list = NULL;
788                 pool_list->pools[i].cold_list = NULL;
789             }
790         }

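NPOOLS is 14. pool_list_size is reported to be 336, which is 14 * sizeof(*pool_list). That is not enough for a struct with a hanging tail (a trailing flexible array member): the header needs its own space in addition to the 14 array elements.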

(gdb) ptyp *pool_list
type = struct per_thread_pool_list {
    struct list_head thr_list;
    pthread_spinlock_t lock;
    _Bool poison;
    per_thread_pool_t pools[];
}

Blech:

    pool_list_size = sizeof(per_thread_pool_list_t) +
                     sizeof(per_thread_pool_t) * (NPOOLS - 1);

mykaul commented 1 year ago

Does it happen without memory pools?

jengelh commented 1 year ago

(mempool=no tcmalloc=yes) startup runs fine. In openSUSE, glusterfs was built without tcmalloc. One dependency less…

mykaul commented 1 year ago

> (mempool=no tcmalloc=yes) startup runs fine. In openSUSE, glusterfs was built without tcmalloc. One dependency less…

Tests have shown to improve performance nicely with tcmalloc (and without memory pools).

jengelh commented 1 year ago

Well then remove --without-tcmalloc?

mohit84 commented 1 year ago

The patch (https://github.com/gluster/glusterfs/pull/4196) is reverted from release-11 to avoid an issue.

panlinux commented 1 year ago

The revert alone isn't enough to fix the crash in the Debian package of 11.0. With the patch from https://github.com/gluster/glusterfs/pull/4193 applied, it no longer crashes, and the DEP8 tests pass.

panlinux commented 1 year ago

> The revert alone isn't enough to fix the crash in the Debian package of 11.0. With the patch from #4193 applied, it no longer crashes, and the DEP8 tests pass.

Let me update that statement: the revert does seem enough in the sense that the crash no longer happens.