gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Failed to start GlusterFS, a clustered file-system server. (on ODROID-HC1 with Ubuntu 22.04) #3911

Open k-van-man opened 1 year ago

k-van-man commented 1 year ago

Description of problem: On a fresh install of Ubuntu 22.04 on an Odroid HC1 (ARM® big.LITTLE™): Failed to start GlusterFS, a clustered file-system server.

The exact command to reproduce the issue: root@hc1:~# systemctl enable --now glusterd

The full output of the command that failed:

root@hc1:~# systemctl status glusterd.service
× glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2022-11-25 11:20:41 UTC; 7s ago
       Docs: man:glusterd(8)
    Process: 838 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=1/FAILURE)
        CPU: 60ms

Nov 25 11:20:41 hc1 glusterd[839]: llistxattr 1
Nov 25 11:20:41 hc1 glusterd[839]: setfsid 1
Nov 25 11:20:41 hc1 glusterd[839]: epoll.h 1
Nov 25 11:20:41 hc1 glusterd[839]: xattr.h 1
Nov 25 11:20:41 hc1 glusterd[839]: st_atim.tv_nsec 1
Nov 25 11:20:41 hc1 glusterd[839]: package-string: glusterfs 10.1
Nov 25 11:20:41 hc1 glusterd[839]: ---------
Nov 25 11:20:41 hc1 systemd[1]: glusterd.service: Control process exited, code=exited, status=1/FAILURE
Nov 25 11:20:41 hc1 systemd[1]: glusterd.service: Failed with result 'exit-code'.
Nov 25 11:20:41 hc1 systemd[1]: Failed to start GlusterFS, a clustered file-system server.
root@hc1:~# journalctl -xeu glusterd.service
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit glusterd.service has begun execution.
░░
░░ The job identifier is 742.
Nov 25 11:20:41 hc1 glusterd[839]: pending frames:
Nov 25 11:20:41 hc1 glusterd[839]: patchset: git://git.gluster.org/glusterfs.git
Nov 25 11:20:41 hc1 glusterd[839]: signal received: 4
Nov 25 11:20:41 hc1 glusterd[839]: time of crash:
Nov 25 11:20:41 hc1 glusterd[839]: 2022-11-25 11:20:41 +0000
Nov 25 11:20:41 hc1 glusterd[839]: configuration details:
Nov 25 11:20:41 hc1 glusterd[839]: argp 1
Nov 25 11:20:41 hc1 glusterd[839]: backtrace 1
Nov 25 11:20:41 hc1 glusterd[839]: dlfcn 1
Nov 25 11:20:41 hc1 glusterd[839]: libpthread 1
Nov 25 11:20:41 hc1 glusterd[839]: llistxattr 1
Nov 25 11:20:41 hc1 glusterd[839]: setfsid 1
Nov 25 11:20:41 hc1 glusterd[839]: epoll.h 1
Nov 25 11:20:41 hc1 glusterd[839]: xattr.h 1
Nov 25 11:20:41 hc1 glusterd[839]: st_atim.tv_nsec 1
Nov 25 11:20:41 hc1 glusterd[839]: package-string: glusterfs 10.1
Nov 25 11:20:41 hc1 glusterd[839]: ---------
Nov 25 11:20:41 hc1 systemd[1]: glusterd.service: Control process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ An ExecStart= process belonging to unit glusterd.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Nov 25 11:20:41 hc1 systemd[1]: glusterd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ The unit glusterd.service has entered the 'failed' state with result 'exit-code'.
Nov 25 11:20:41 hc1 systemd[1]: Failed to start GlusterFS, a clustered file-system server.
░░ Subject: A start job for unit glusterd.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit glusterd.service has finished with a failure.
░░
░░ The job identifier is 742 and the job result is failed.

Expected results: Running gluster daemon

Mandatory info:

- The output of the gluster volume info command: Connection failed. Please check if gluster daemon is operational.
- The output of the gluster volume status command: Connection failed. Please check if gluster daemon is operational.
- The output of the gluster volume heal command: Connection failed. Please check if gluster daemon is operational.
- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/

[2022-11-25 11:20:41.454970 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, {version=10.1}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO}]
[2022-11-25 11:20:41.456669 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 839
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 4
time of crash:
2022-11-25 11:20:41 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.1

- Is there any crash? Provide the backtrace and coredump

root@hc1:~# /usr/sbin/glusterd --debug
[2022-11-25 11:29:21.312302 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, {version=10.1}, {cmdlinestr=/usr/sbin/glusterd --debug}]
[2022-11-25 11:29:21.312417 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 882
[2022-11-25 11:29:21.312448 +0000] D [logging.c:1705:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2022-11-25 11:29:21.318402 +0000] D [MSGID: 0] [gf-io.c:513:gf_io_run] 0-io: Trying I/O engine 'legacy'
[2022-11-25 11:29:21.318444 +0000] D [MSGID: 0] [gf-io.c:517:gf_io_run] 0-io: I/O engine 'legacy' is ready
[2022-11-25 11:29:21.318658 +0000] D [logging.c:1675:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 3 extra log messages
[2022-11-25 11:29:21.318690 +0000] D [logging.c:1681:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 3 extra log messages
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 4
time of crash:
2022-11-25 11:29:21 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.1
---------
Illegal instruction (core dumped)

Additional info:

- The operating system / glusterfs version:

root@hc1:~# /usr/sbin/glusterd --version
glusterfs 10.1
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
root@hc1:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@hc1:~# uname -a
Linux hc1 5.4.224-408 #1 SMP PREEMPT Thu Nov 24 22:46:40 UTC 2022 armv7l armv7l armv7l GNU/Linux

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

k-van-man commented 1 year ago

core.882.zip

xhernandez commented 1 year ago

Gluster crashed because of an "illegal instruction". If I'm not wrong, Odroid HC1 is a 32-bit platform, and Gluster doesn't support it.

The most likely cause of the crash is an attempt to execute a 64-bit atomic instruction, which is not supported on a 32-bit platform.

k-van-man commented 1 year ago

Aha, yes, the Odroid HC1 is old and 32-bit, and it is running Ubuntu 22.04. Is there another way to host a distributed file system on multiple 32-bit ARM systems like the Odroid HC1?

k-van-man commented 1 year ago

I'm not the only one with old 32-bit CPU boards; see this link.

als-git commented 1 year ago

TLDR: glusterfs doesn't support 32bit anymore.

xhernandez: The illegal instruction has nothing to do with your architecture. It is an intentional "can't fix this, crash now" triggered by userspace-rcu being asked to do the impossible (in this case, atomically update a 64bit value on a 32bit arch).

The version of userspace-rcu used by glusterfs 10.3 has this snippet of code:

unsigned long __uatomic_add_return(void *addr, unsigned long val,
                                 int len)
{
        switch (len) {
        case 1:
        {
                unsigned char result = val;

                __asm__ __volatile__(
                "lock; xaddb %1, %0"
                        : "+m"(*__hp(addr)), "+q" (result)
                        :
                        : "memory");
                return result + (unsigned char)val;
        }
        case 2:
        {
                unsigned short result = val;

                __asm__ __volatile__(
                "lock; xaddw %1, %0"
                        : "+m"(*__hp(addr)), "+r" (result)
                        :
                        : "memory");
                return result + (unsigned short)val;
        }
        case 4:
        {
                unsigned int result = val;

                __asm__ __volatile__(
                "lock; xaddl %1, %0"
                        : "+m"(*__hp(addr)), "+r" (result)
                        :
                        : "memory");
                return result + (unsigned int)val;
        }
#if (CAA_BITS_PER_LONG == 64)
        case 8:
        {
                unsigned long result = val;

                __asm__ __volatile__(
                "lock; xaddq %1, %0"
                        : "+m"(*__hp(addr)), "+r" (result)
                        :
                        : "memory");
                return result + (unsigned long)val;
        }
#endif
        }
        /* 
         * generate an illegal instruction. Cannot catch this with
         * linker tricks when optimizations are disabled.
         */
        __asm__ __volatile__("ud2");
        return 0;
}

On a 32bit arch, the len=8 branch is excluded by the preprocessor (because the condition is false), so for 64bit values execution falls straight through to the explicit "ud2" (undefined instruction), which causes the SIGILL. This is an intentional get-out-of-here crash, because there is nothing useful the code can do here except exit noisily (to attract attention to the problem).
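To make the fall-through easier to see, here is a minimal portable sketch of the same size-dispatch pattern. It is not userspace-rcu code: the function name add_return_checked is hypothetical, it uses the GCC/Clang __atomic_add_fetch builtin instead of inline x86 assembly, and it returns false where the quoted snippet would execute "ud2". The ULONG_MAX test stands in for the CAA_BITS_PER_LONG == 64 check.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration, not the userspace-rcu implementation.
 * Atomically adds `val` to the `len`-byte integer at `addr` and stores
 * the new value in `*out`. Returns false for any width that was not
 * compiled in, where the quoted snippet would trap instead.
 */
static bool add_return_checked(void *addr, unsigned long val, int len,
                               unsigned long *out)
{
    assert(addr != NULL && out != NULL);

    switch (len) {
    case 1:
        *out = __atomic_add_fetch((uint8_t *)addr, (uint8_t)val,
                                  __ATOMIC_SEQ_CST);
        return true;
    case 2:
        *out = __atomic_add_fetch((uint16_t *)addr, (uint16_t)val,
                                  __ATOMIC_SEQ_CST);
        return true;
    case 4:
        *out = __atomic_add_fetch((uint32_t *)addr, (uint32_t)val,
                                  __ATOMIC_SEQ_CST);
        return true;
#if ULONG_MAX > 0xFFFFFFFFUL
    case 8:
        /* Only compiled where unsigned long is 64 bits wide. */
        *out = __atomic_add_fetch((uint64_t *)addr, (uint64_t)val,
                                  __ATOMIC_SEQ_CST);
        return true;
#endif
    default:
        /* On a 32bit build, len == 8 ends up here. */
        return false;
    }
}
```

Calling add_return_checked(&counter, 1, 8, &out) on a 64bit counter is expected to return true on a 64bit build and false on a 32bit one such as armv7l, which corresponds to the path that ends in the illegal instruction above.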

Older versions of glusterfs might still work (I had glusterfs 8.2 running on NetBSD/i386).

xhernandez commented 1 year ago

TLDR: glusterfs doesn't support 32bit anymore.

xhernandez: The illegal instruction has nothing to do with your architecture. It is an intentional "can't fix this, crash now" triggered by userspace-rcu being asked to do the impossible (in this case, atomically update a 64bit value on a 32bit arch).

That's exactly why the problem is caused by the architecture of the server. userspace-rcu just replaces an operation it cannot encode (an atomic 64-bit instruction) with a "ud2" instruction, but the underlying reason is the lack of support for 64-bit atomic operations.
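For anyone who wants to see what their own toolchain offers for 64-bit atomics, here is a small sketch using standard C11 <stdatomic.h>. The function names are made up for illustration. It only reports what the compiler's C11 atomics provide; it says nothing about what userspace-rcu's own uatomic layer supports, and a 32-bit ARM toolchain may well report lock-free support here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the C11 lock-free level for long long atomics:
 * 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
 */
static int llong_lock_free_level(void)
{
    return ATOMIC_LLONG_LOCK_FREE;
}

/*
 * Adds `delta` to a 64-bit counter with C11 atomics and returns the new
 * value. Where the target has no native 64-bit atomic instructions the
 * compiler falls back to a library routine (with GCC this may require
 * linking with -latomic).
 */
static uint64_t counter_add(_Atomic uint64_t *counter, uint64_t delta)
{
    assert(counter != NULL);
    return atomic_fetch_add(counter, delta) + delta;
}
```

Printing llong_lock_free_level() on the board in question shows whether the compiler considers 64-bit atomics lock-free there; counter_add works either way, because C11 atomics fall back to a lock rather than trapping.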

Though there isn't much effort going into making sure Gluster works on 32-bit architectures, it's possible that I'll fix this problem soon (though there's no guarantee that everything else will work).

davemuench commented 1 year ago

Subscribing as I have a fleet of 10 HC2's I'd love to keep up to date. I absolutely agree that 32 bit in this day and age is a bit silly, but at the same time I have yet to find a similar 64 bit alternative.

evindunn commented 1 year ago

Here's how I got gluster 6 installed on ubuntu 22.04 armhf. Warning: this is a hack & your mileage may vary.

add-apt-repository ppa:gluster/glusterfs-6

This emits a warning because there's no gluster 6 release for jammy. Then update /etc/apt/sources.list.d/gluster-ubuntu-glusterfs-6-jammy.list to look like

deb https://ppa.launchpadcontent.net/gluster/glusterfs-6/ubuntu/ focal main
# deb-src https://ppa.launchpadcontent.net/gluster/glusterfs-6/ubuntu/ focal main

# For old dependencies
deb http://ports.ubuntu.com/ubuntu-ports focal main
deb http://ports.ubuntu.com/ubuntu-ports focal-security main

Then create /etc/apt/preferences.d/99-gluster with

Package: *gluster* *libgf*
Pin: release o=LP-PPA-gluster-glusterfs-6
Pin-Priority: 999

apt clean all && apt-get update && apt install glusterfs-server

In case this is of interest to the devs, I get a python warning during apt install -y glusterfs-server

Setting up libgfapi0:armhf (6.10-ubuntu1~focal1) ...
Setting up glusterfs-common (6.10-ubuntu1~focal1) ...
Adding group `gluster' (GID 122) ...
Done.
/usr/lib/arm-linux-gnueabihf/glusterfs/python/syncdaemon/syncdutils.py:705: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if dirpath is "/":
Setting up glusterfs-client (6.10-ubuntu1~focal1) ...
Setting up glusterfs-server (6.10-ubuntu1~focal1) ...

But after systemctl enable --now glusterd, glusterd seems to be running fine.