canonical / microceph

Ceph for a one-rack cluster and appliances
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

EL7 :: thread_create failed: Operation not permitted #337

Status: open. adriansev opened this issue 3 months ago

adriansev commented 3 months ago

Issue report

What version of MicroCeph are you using?

18.2.0+snap21acc74fff

What are the steps to reproduce this issue?

  1. snap install microceph
  2. snap refresh --hold microceph
  3. microceph cluster bootstrap

What happens (observed behaviour)?

runtime/cgo: pthread_create failed: Operation not permitted
SIGABRT: abort
PC=0x7fb7f4c319fc m=0 sigcode=18446744073709551610

goroutine 0 gp=0x105df00 m=0 mp=0x105eb00 [idle]:
runtime: g 0 gp=0x105df00: unknown pc 0x7fb7f4c319fc
stack: frame={sp:0x7ffdc8e5e530, fp:0x0} stack=[0x7ffdc8660000,0x7ffdc8e5eac0)

.... snip ...

goroutine 1 gp=0xc0000061c0 m=0 mp=0x105eb00 [running]:
runtime.systemstack_switch()
        runtime/asm_amd64.s:474 +0x8 fp=0xc000184750 sp=0xc000184740 pc=0x476508
runtime.main()
        runtime/proc.go:171 +0x67 fp=0xc0001847e0 sp=0xc000184750 pc=0x445a67
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc0001847e8 sp=0xc0001847e0 pc=0x478521

... snip ...

What were you expecting to happen?

The cluster bootstrap to complete successfully.

Additional comments.

This is CentOS 7 with kernel 6.8.2-1.el7.elrepo.x86_64 and SELinux disabled.

UtkarshBhatthere commented 2 months ago

I will spin up a CentOS VM and try to reproduce this. Also, FWIW, microceph commands are normally issued with sudo, which is missing from the commands in your description.

adriansev commented 2 months ago

Thanks a lot! The commands were run as root, so sudo was unnecessary.

lmlg commented 2 months ago

An EPERM error from pthread_create leads me to think that this is an issue with the running environment and Go's runtime. Essentially, this can only happen when a thread is created with real-time priorities and that isn't allowed.

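Looking around, it appears that some CentOS defaults can cause this. Some possible solutions:

  1. Lift the global real-time throttling limit and move the current shell into the root cpu cgroup (cgroup v1), then re-run the command from that shell:

     echo "-1" > /proc/sys/kernel/sched_rt_runtime_us
     cd /sys/fs/cgroup/cpu
     echo $$ > tasks

  2. Run the program after setting ulimit -r unlimited (or after changing the appropriate setting in /etc/security/limits.conf).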
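If the ulimit route is taken, a minimal sketch of the persistent equivalent in /etc/security/limits.conf could look like the fragment below. It assumes the goal is to lift the limit for the root user who ran the commands; the values are an assumption, not a tested fix. pam_limits applies these entries when a new session starts, so a fresh login is needed before ulimit -r reports the new value. Also note that sched(7) documents that threads holding CAP_SYS_NICE, which root normally does, ignore RLIMIT_RTPRIO, so for a root shell this change may make no difference.

```
# /etc/security/limits.conf (fragment; assumed values, not a verified fix)
# <domain>  <type>  <item>   <value>
# "-" sets both the soft and the hard limit.
# Wildcard (*) entries are not applied to root, so root is listed explicitly.
root        -       rtprio   unlimited
*           -       rtprio   unlimited
```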

adriansev commented 2 months ago

Thanks for looking at this! I tried the above, but I still get the same errors: microceph_errors.txt

adriansev commented 2 months ago

As for ulimits, these are the defaults:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256326
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256326
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
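The listing shows a real-time priority limit (-r) of 0, which is only one of the conditions discussed above. Below is a minimal sketch, assuming a cgroup v1 host like this CentOS 7 machine with the cpu controller at /sys/fs/cgroup/cpu, that prints all of them in one go. The helper names show and check_rt_env are hypothetical, and the chrt probe (from util-linux) only approximates what the Go runtime does when it creates a thread.

```shell
#!/bin/sh
# Hypothetical helper: prints the settings that decide whether this shell
# may run real-time (SCHED_FIFO/SCHED_RR) threads. Paths assume a cgroup v1
# host (such as CentOS 7) with the cpu controller at /sys/fs/cgroup/cpu.

# Print "<file>: <contents>", or "<file>: n/a" if the file is unreadable.
show() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: n/a\n' "$1"
    fi
}

check_rt_env() {
    # Global real-time budget per period; -1 disables RT throttling.
    show /proc/sys/kernel/sched_rt_runtime_us

    # cpu cgroup of this shell, from lines like "4:cpu,cpuacct:/user.slice".
    cpu_cgroup=$(awk -F: '{ n = split($2, c, ",")
                            for (i = 1; i <= n; i++) if (c[i] == "cpu") print $3 }' \
                 "/proc/$$/cgroup" 2>/dev/null)
    printf 'cpu cgroup: %s\n' "${cpu_cgroup:-n/a}"

    # Per-group RT budget (needs CONFIG_RT_GROUP_SCHED); 0 forbids RT threads.
    if [ -n "$cpu_cgroup" ]; then
        show "/sys/fs/cgroup/cpu${cpu_cgroup}/cpu.rt_runtime_us"
    fi

    # RLIMIT_RTPRIO of this shell (the "real-time priority (-r)" line above).
    printf 'ulimit -r: %s\n' "$(ulimit -r 2>/dev/null || echo n/a)"

    # Approximate probe: try to run a trivial command under SCHED_FIFO prio 1.
    if command -v chrt >/dev/null 2>&1; then
        if chrt -f 1 true 2>/dev/null; then
            echo 'SCHED_FIFO probe: allowed'
        else
            echo 'SCHED_FIFO probe: denied'
        fi
    else
        echo 'SCHED_FIFO probe: chrt not found'
    fi
}

check_rt_env
```

Because cgroup membership and resource limits are inherited from the parent process, the script should be run from the same shell, and as the same user, that runs the microceph commands.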