gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

glusterd doesn't start (core dumped at startup) #3947

Open Nivekiba opened 1 year ago

Nivekiba commented 1 year ago

Description of problem:

I built gluster from source, and when I try to start the daemon, I get a core-dump error.

The exact command to reproduce the issue:

systemctl start glusterd

The full output of the command that failed:

-- The job identifier is 2393.
janv. 06 10:24:06 nivek-Latitude-5410 polkitd(authority=local)[1102]: Registered Authentication Agent for unix-process:647890:171759755 (system bus name :1.11255 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
janv. 06 10:24:08 nivek-Latitude-5410 polkitd(authority=local)[1102]: Operator of unix-session:3 successfully authenticated as unix-user:nivek to gain TEMPORARY authorization for action org.freedesktop.systemd1.manage-units for system-bus-name::1.11256 [systemctl start glusterd] (owned by unix-user:nivek)
janv. 06 10:24:08 nivek-Latitude-5410 systemd[1]: glusterd.service: Start request repeated too quickly.
janv. 06 10:24:08 nivek-Latitude-5410 systemd[1]: glusterd.service: Failed with result 'core-dump'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit glusterd.service has entered the 'failed' state with result 'core-dump'.
janv. 06 10:24:08 nivek-Latitude-5410 systemd[1]: Failed to start GlusterFS, a clustered file-system server.
-- Subject: A start job for unit glusterd.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit glusterd.service has finished with a failure.
-- 
-- The job identifier is 154301 and the job result is failed.
janv. 06 10:24:08 nivek-Latitude-5410 polkitd(authority=local)[1102]: Unregistered Authentication Agent for unix-process:647890:171759755 (system bus name :1.11255, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

Expected results: glusterd starts successfully.

Mandatory info:

- The output of the gluster volume info command:

$ sudo gluster volume info
malloc(): invalid size (unsorted)
[1]    647948 abort      sudo gluster volume info

- The output of the gluster volume status command:

- The output of the gluster volume heal command:

- Provide logs present in the following locations on client and server nodes: /var/log/glusterfs/

[2023-01-06 09:10:35.929791 +0000] I [cli.c:788:main] 0-cli: Started running gluster with version 12dev
[2023-01-06 09:18:13.011080 +0000] I [cli.c:788:main] 0-cli: Started running gluster with version 12dev
[2023-01-06 09:26:22.650635 +0000] I [cli.c:788:main] 0-cli: Started running gluster with version 12dev

- Is there any crash? Provide the backtrace and coredump.

Additional info:

- The operating system / glusterfs version:

Ubuntu 20.04 / glusterfs from the current devel branch (exact version number unknown; the CLI log above reports 12dev)

Update:

Sometimes I get this error:

$ sudo glusterd
glusterd: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
[1]    748328 abort      sudo glusterd

michaeltraxler commented 1 year ago

Same problem here with glusterfs tag v11.0:

$ glusterd
Fatal glibc error: malloc.c:2593 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
Aborted (core dumped)

openSUSE Tumbleweed, glibc-2.37-1.2.src.rpm

It happens with the RPM shipped by the distribution and also when compiling the release from GitHub with:

$ ./configure --without-tcmalloc --disable-linux-io_uring

michaeltraxler commented 1 year ago

Update: The error goes away after fixing the missing tcmalloc library with

$ ln -s /usr/lib64/libtcmalloc_minimal.so.4 /usr/lib64/libtcmalloc_minimal.so

and then reconfiguring with

$ ./configure --disable-linux-io_uring

It seems that glusterfs needs libtcmalloc_minimal.so.
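A minimal sketch of the same workaround as a guarded shell function. The function name ensure_tcmalloc_devlink is hypothetical, and the default directory /usr/lib64 is taken from the comment (Debian/Ubuntu use a multiarch directory instead). It only creates the link when the versioned runtime library exists and the unversioned name is missing. The unversioned libtcmalloc_minimal.so is the file the linker resolves for -ltcmalloc_minimal, which is presumably why configure needs it; it is typically shipped by a gperftools development package, so installing that package is usually the cleaner fix.

```shell
#!/bin/sh
# Sketch of the manual "ln -s" workaround as a guarded function.
# Hypothetical helper; the default directory /usr/lib64 is an assumption
# taken from the comment (Debian/Ubuntu use a multiarch directory).
ensure_tcmalloc_devlink() {
    libdir=${1:-/usr/lib64}
    versioned="$libdir/libtcmalloc_minimal.so.4"  # runtime library
    devlink="$libdir/libtcmalloc_minimal.so"      # name the linker looks for

    # Nothing to do if the unversioned name already exists (file or symlink).
    if [ -e "$devlink" ] || [ -L "$devlink" ]; then
        echo "already present: $devlink"
        return 0
    fi

    # Refuse to create a dangling link if the runtime library is missing.
    if [ ! -e "$versioned" ]; then
        echo "missing runtime library: $versioned" >&2
        return 1
    fi

    ln -s "$versioned" "$devlink" || return 1
    echo "created: $devlink -> $versioned"
}

# Example (as root), then reconfigure as in the comment:
#   ensure_tcmalloc_devlink /usr/lib64 && ./configure --disable-linux-io_uring
```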

mohit84 commented 1 year ago

> Update: When fixing the missing tcmalloc library with the following command: $ ln -s /usr/lib64/libtcmalloc_minimal.so.4 /usr/lib64/libtcmalloc_minimal.so and then $ ./configure --disable-linux-io_uring the error is gone. Seems, that glusterfs needs libtcmalloc_minimal.so.

From release-10 onwards, gluster uses tcmalloc for malloc/calloc by default. If you don't want to use it, you can compile the code with the --without-tcmalloc configure option.
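Since the allocator is chosen at build time, one quick way to see which one a given glusterd binary ended up with is to inspect its shared-library dependencies. This is a minimal sketch assuming a glibc system where ldd is available; the helper name uses_tcmalloc and the binary path in the example are assumptions. It only detects dynamic linking against libtcmalloc or libtcmalloc_minimal.

```shell
#!/bin/sh
# Sketch: report whether a binary is dynamically linked against tcmalloc.
# Assumes a glibc system with ldd; the helper name is hypothetical.
uses_tcmalloc() {
    # ldd lists one shared-library dependency per line; the pattern matches
    # both libtcmalloc.so and libtcmalloc_minimal.so. Errors (missing file,
    # static binary) simply produce no match, so the function returns false.
    ldd "$1" 2>/dev/null | grep -q 'libtcmalloc'
}

# Example (binary path is an assumption; adjust for your install prefix):
#   if uses_tcmalloc /usr/sbin/glusterd; then
#       echo "glusterd uses tcmalloc"
#   else
#       echo "glusterd uses glibc malloc"
#   fi
```

A binary with no tcmalloc dependency corresponds to the --without-tcmalloc configuration that michaeltraxler reports aborting at startup.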

michaeltraxler commented 1 year ago

> From release-10 onwards by default gluster uses tcmalloc for malloc/calloc, if you don't want to use you can compile the code after providing an option without-tcmalloc.

The problem is that if I skip tcmalloc with the --without-tcmalloc option, the code compiles, but glusterd fails at startup with the message:

Fatal glibc error: malloc.c:2593 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
Aborted (core dumped)

The valgrind output is attached: delme.txt
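The attached valgrind log is not reproduced here. As a hedged sketch, one way to capture a comparable report is to run glusterd in the foreground under valgrind's default memcheck tool, which replaces glibc's malloc and flags invalid reads and writes on heap blocks when they occur rather than at the later sysmalloc assertion. This assumes glusterd's -N (no-daemon) option and usually needs root; the helper name and default log file name are hypothetical.

```shell
#!/bin/sh
# Sketch: run glusterd in the foreground under valgrind (memcheck) and
# write the report to a file. Assumptions: glusterd's -N option keeps it
# in the foreground; the helper and default log name are hypothetical.
run_glusterd_under_valgrind() {
    log=${1:-glusterd-valgrind.txt}
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind is not installed" >&2
        return 2
    fi
    # --track-origins=yes reports where uninitialised values came from;
    # it is slower but makes memcheck reports easier to act on.
    valgrind --track-origins=yes --log-file="$log" glusterd -N
}

# Example (usually needs root):
#   run_glusterd_under_valgrind /tmp/glusterd-valgrind.txt
```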

panlinux commented 1 year ago

I'm seeing the same crash on startup in Ubuntu and Debian:

https://autopkgtest.ubuntu.com/packages/g/glusterfs/mantic/amd64

696s Fatal glibc error: malloc.c:2589 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
699s 
699s Deleting volume gv0
699s /tmp/autopkgtest.o3AzjF/build.T8m/src/debian/tests/create-volume: line 12:  1949 Aborted                 (core dumped) systemctl restart glusterd
699s Fatal glibc error: malloc.c:2589 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
699s /tmp/autopkgtest.o3AzjF/build.T8m/src/debian/tests/create-volume: line 12:  2133 Aborted                 (core dumped) systemctl restart glusterd