Cacti / spine

Spine C Based Poller for Cacti
GNU Lesser General Public License v2.1

compile on alpine 3.19 & run error #352

Open uzzme opened 2 weeks ago

uzzme commented 2 weeks ago

SPINE: Using spine config file [spine.conf]
Version 1.2.27 starting
Segmentation fault

TheWitness commented 2 weeks ago
gdb ./spine
run -V 5 -R -S
bt
uzzme commented 2 weeks ago

hk:/opt/spine# gdb ./spine
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./spine...
(gdb) run -V 5 -R -S
Starting program: /opt/spine/spine -V 5 -R -S
SPINE: Using spine config file [spine.conf]
Total[0.0027] DEBUG: The path_php_server variable is /web/cacti/script_server.php
Total[0.0028] DEBUG: The path_cactilog variable is /web/cacti/log/cacti.log
Total[0.0028] DEBUG: The version variable is 10.11.6-MariaDB-log
Total[0.0028] DEBUG: The log_destination variable is 4 (STDOUT)
Total[0.0030] DEBUG: The path_php variable is /usr/bin/php
Total[0.0031] DEBUG: The availability_method variable is 2
Total[0.0033] DEBUG: The ping_recovery_count variable is 3
Total[0.0034] DEBUG: The ping_failure_count variable is 3
Total[0.0036] DEBUG: The ping_method variable is 2
Total[0.0038] DEBUG: The ping_retries variable is 3
Total[0.0039] DEBUG: The ping_timeout variable is 1000
Total[0.0041] DEBUG: The snmp_retries variable is 3
Total[0.0042] DEBUG: The log_perror variable is 0
Total[0.0044] DEBUG: The log_pwarn variable is 0
Total[0.0045] DEBUG: The boost_redirect variable is 0
Total[0.0047] DEBUG: The boost_rrd_update_enable variable is 0
Total[0.0049] DEBUG: The log_pstats variable is 0
Total[0.0051] DEBUG: The threads variable is 5
Total[0.0052] DEBUG: The polling interval is 60 seconds
Total[0.0054] DEBUG: The number of concurrent processes is 1
Total[0.0055] DEBUG: The script timeout is 30
Total[0.0057] DEBUG: The selective_device_debug variable is
Total[0.0058] DEBUG: The spine_log_level variable is 0
Total[0.0059] DEBUG: The number of php script servers to run is 1
Total[0.0061] DEBUG: The number of active data source profiles is 2
Total[0.0062] DEBUG: The number of snmp ports on the system is 1
Total[0.0064] DEBUG: StartDevice='-1', EndDevice='-1', TotalPHPScripts='0'
Total[0.0064] DEBUG: The PHP Script Server is Not Required
Total[0.0066] DEBUG: The Maximum SNMP OID Get Size is 60
Total[0.0068] DEBUG: Total Connections made 1
Total[0.0069] DEBUG: Creating Local Connection Pool of 5 threads.
Total[0.0069] DEBUG: Creating Local Connection 0.
Total[0.0070] DEBUG: Total Connections made 2
Total[0.0070] DEBUG: Creating Local Connection 1.
Total[0.0072] DEBUG: Total Connections made 3
Total[0.0072] DEBUG: Creating Local Connection 2.
Total[0.0074] DEBUG: Total Connections made 4
Total[0.0074] DEBUG: Creating Local Connection 3.
Total[0.0075] DEBUG: Total Connections made 5
Total[0.0076] DEBUG: Creating Local Connection 4.
Total[0.0077] DEBUG: Total Connections made 6
Total[0.0080] DEBUG: Version 1.2.27 starting
Total[0.0080] No Device 0 Poller Items found.
Total[0.0081] DEBUG: MySQL is Thread Safe!
Total[0.0081] DEBUG: Spine running as 0 UID, 0 EUID
Total[0.0081] DEBUG: Spine is running as root.
Total[0.0081] DEBUG: Spine has got ICMP
Total[0.0081] DEBUG: Initializing Net-SNMP API
Total[0.0081] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
Total[0.0105] DEBUG: Initializing PHP Script Server(s)
Total[0.0115] DEBUG: Initial Value of Available Threads is 5 (0 outstanding)
[New LWP 4680]
Total[0.0121] DEBUG: Device[60] Valid Thread to be Created (140737346829112)
Total[0.0121] DEBUG: Device[60] Available Threads is 4 (1 outstanding)
[New LWP 4681]

Thread 2 "spine" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 4680]
0x000055555555c43d in spine_log (format=format@entry=0x555555574098 "DEBUG: Device[%i] HT[%i] In Poller, About to Start Polling") at util.c:1257
warning: 1257 util.c: No such file or directory
(gdb) bt
#0  0x000055555555c43d in spine_log (format=format@entry=0x555555574098 "DEBUG: Device[%i] HT[%i] In Poller, About to Start Polling") at util.c:1257
#1  0x0000555555568360 in child (arg=) at poller.c:106
#2  0x00007ffff7fb822e in start (p=0x7ffff7907b00) at src/thread/pthread_create.c:207
#3  0x00007ffff7fba82f in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC

uzzme commented 2 weeks ago
> gdb ./spine
> run -V 5 -R -S
> bt

Please check the information above. Thank you.

TheWitness commented 2 weeks ago

Did you compile spine yourself or download a package?

Please compile spine on your own and please run spine from the source code directory after you've compiled it the same way.

uzzme commented 2 weeks ago

> Did you compile spine yourself or download a package?
>
> Please compile spine on your own and please run spine from the source code directory after you've compiled it the same way.

I compiled it myself.

./bootstrap && ./configure
hk:~/tmp/cacti-spine-1.2.27# make
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT sql.o -MD -MP -MF .deps/sql.Tpo -c -o sql.o sql.c
mv -f .deps/sql.Tpo .deps/sql.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT spine.o -MD -MP -MF .deps/spine.Tpo -c -o spine.o spine.c
mv -f .deps/spine.Tpo .deps/spine.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT util.o -MD -MP -MF .deps/util.Tpo -c -o util.o util.c
mv -f .deps/util.Tpo .deps/util.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT snmp.o -MD -MP -MF .deps/snmp.Tpo -c -o snmp.o snmp.c
In file included from snmp.c:36:
snmp.c: In function 'snmp_host_init':
snmp.c:360:43: warning: format '%d' expects argument of type 'int', but argument 3 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
360 SPINE_LOG_MEDIUM(("SNMP: Using privacy protocol(len): %s(%d)", snmp_priv_protocol, session.securityPrivKeyLen));
^~~~~~~~~~~ ~~~~~~
size_t {aka long unsigned int}
spine.h:126:104: note: in definition of macro 'SPINE_LOG_MEDIUM'
126 #define SPINE_LOG_MEDIUM(format_and_args) (void)(set.log_level >= POLLER_VERBOSITY_MEDIUM && spine_log format_and_args)
^~~~~~~
snmp.c:360:83: note: format string is defined here
360 SPINE_LOG_MEDIUM(("SNMP: Using privacy protocol(len): %s(%d)", snmp_priv_protocol, session.securityPrivKeyLen));
~^
int
%ld
mv -f .deps/snmp.Tpo .deps/snmp.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT locks.o -MD -MP -MF .deps/locks.Tpo -c -o locks.o locks.c
mv -f .deps/locks.Tpo .deps/locks.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT poller.o -MD -MP -MF .deps/poller.Tpo -c -o poller.o poller.c
mv -f .deps/poller.Tpo .deps/poller.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT nft_popen.o -MD -MP -MF .deps/nft_popen.Tpo -c -o nft_popen.o nft_popen.c
mv -f .deps/nft_popen.Tpo .deps/nft_popen.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT php.o -MD -MP -MF .deps/php.Tpo -c -o php.o php.c
mv -f .deps/php.Tpo .deps/php.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT ping.o -MD -MP -MF .deps/ping.Tpo -c -o ping.o ping.c
mv -f .deps/ping.Tpo .deps/ping.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT keywords.o -MD -MP -MF .deps/keywords.Tpo -c -o keywords.o keywords.c
mv -f .deps/keywords.Tpo .deps/keywords.Po
gcc -DHAVE_CONFIG_H -I. -I./config -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT error.o -MD -MP -MF .deps/error.Tpo -c -o error.o error.c
mv -f .deps/error.Tpo .deps/error.Po
/bin/sh ./libtool --tag=CC --mode=link gcc -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -L/usr/lib -L/usr/lib -o spine sql.o spine.o util.o snmp.o locks.o poller.o nft_popen.o php.o ping.o keywords.o error.o -lnetsnmp -lmysqlclient_r -lm -ldl -lmysqlclient -lm -ldl -lcrypto -lz -lpthread -ldl -lm -lpthread -lssl
libtool: link: gcc -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -o spine sql.o spine.o util.o snmp.o locks.o poller.o nft_popen.o php.o ping.o keywords.o error.o -L/usr/lib -lnetsnmp -lmysqlclient_r -lmysqlclient -lcrypto -lz -ldl -lm -lpthread -lssl
/usr/bin/help2man --output=spine.1 --name='Data Collector for Cacti' --no-info --version-option='--version' ./spine

hk:~/tmp/cacti-spine-1.2.27# ./spine --conf=/opt/spine/spine.conf
SPINE: Using spine config file [/opt/spine/spine.conf]
Version 1.2.27 starting
Segmentation fault
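
The `-Wformat=` warning in the build log comes from passing a `size_t` to a `%d` conversion. GCC suggests `%ld`; the C99 `%zu` conversion is a more portable choice because it matches `size_t` on every platform. The following is a minimal sketch only: `format_priv_protocol` is a hypothetical helper written for illustration, not a function in spine, and nothing here establishes that this mismatch is related to the segfault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of spine): formats a protocol name and a
 * size_t key length using the same message text as the flagged log call. */
static int format_priv_protocol(char *out, size_t out_len,
                                const char *protocol, size_t key_len)
{
    assert(out != NULL && protocol != NULL);
    /* %zu is the C99 conversion for size_t. Using %d here would make
     * printf read the argument as an int, which is undefined behavior
     * when size_t is wider than int. */
    return snprintf(out, out_len,
                    "SNMP: Using privacy protocol(len): %s(%zu)",
                    protocol, key_len);
}
```

Calling `format_priv_protocol(buf, sizeof buf, "AES", 16)` writes the message into `buf` and returns the number of characters that were needed, following the usual `snprintf` convention.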

uzzme commented 2 weeks ago

(gdb) run -V 5 -R -S
Starting program: /root/tmp/cacti-spine-1.2.27/spine -V 5 -R -S
SPINE: Using spine config file [spine.conf]
Total[0.0027] DEBUG: The path_php_server variable is /web/cacti/script_server.php
Total[0.0027] DEBUG: The path_cactilog variable is /web/cacti/log/cacti.log
Total[0.0027] DEBUG: The version variable is 10.11.6-MariaDB-log
Total[0.0027] DEBUG: The log_destination variable is 4 (STDOUT)
Total[0.0029] DEBUG: The path_php variable is /usr/bin/php
Total[0.0030] DEBUG: The availability_method variable is 2
Total[0.0032] DEBUG: The ping_recovery_count variable is 3
Total[0.0034] DEBUG: The ping_failure_count variable is 3
Total[0.0035] DEBUG: The ping_method variable is 2
Total[0.0037] DEBUG: The ping_retries variable is 3
Total[0.0039] DEBUG: The ping_timeout variable is 1000
Total[0.0040] DEBUG: The snmp_retries variable is 3
Total[0.0042] DEBUG: The log_perror variable is 0
Total[0.0043] DEBUG: The log_pwarn variable is 0
Total[0.0045] DEBUG: The boost_redirect variable is 0
Total[0.0046] DEBUG: The boost_rrd_update_enable variable is 0
Total[0.0048] DEBUG: The log_pstats variable is 0
Total[0.0049] DEBUG: The threads variable is 5
Total[0.0051] DEBUG: The polling interval is 60 seconds
Total[0.0053] DEBUG: The number of concurrent processes is 1
Total[0.0054] DEBUG: The script timeout is 30
Total[0.0055] DEBUG: The selective_device_debug variable is
Total[0.0057] DEBUG: The spine_log_level variable is 0
Total[0.0058] DEBUG: The number of php script servers to run is 1
Total[0.0060] DEBUG: The number of active data source profiles is 2
Total[0.0061] DEBUG: The number of snmp ports on the system is 1
Total[0.0063] DEBUG: StartDevice='-1', EndDevice='-1', TotalPHPScripts='0'
Total[0.0063] DEBUG: The PHP Script Server is Not Required
Total[0.0065] DEBUG: The Maximum SNMP OID Get Size is 60
Total[0.0067] DEBUG: Total Connections made 1
Total[0.0068] DEBUG: Creating Local Connection Pool of 5 threads.
Total[0.0068] DEBUG: Creating Local Connection 0.
Total[0.0069] DEBUG: Total Connections made 2
Total[0.0069] DEBUG: Creating Local Connection 1.
Total[0.0071] DEBUG: Total Connections made 3
Total[0.0071] DEBUG: Creating Local Connection 2.
Total[0.0073] DEBUG: Total Connections made 4
Total[0.0073] DEBUG: Creating Local Connection 3.
Total[0.0075] DEBUG: Total Connections made 5
Total[0.0075] DEBUG: Creating Local Connection 4.
Total[0.0077] DEBUG: Total Connections made 6
Total[0.0080] DEBUG: Version 1.2.27 starting
Total[0.0080] No Device 0 Poller Items found.
Total[0.0080] DEBUG: MySQL is Thread Safe!
Total[0.0080] DEBUG: Spine running as 0 UID, 0 EUID
Total[0.0080] DEBUG: Spine is running as root.
Total[0.0080] DEBUG: Spine has got ICMP
Total[0.0080] DEBUG: Initializing Net-SNMP API
Total[0.0081] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
Total[0.0104] DEBUG: Initializing PHP Script Server(s)
Total[0.0115] DEBUG: Initial Value of Available Threads is 5 (0 outstanding)
[New LWP 26450]
Total[0.0121] DEBUG: Device[60] Valid Thread to be Created (140737346829112)
Total[0.0121] DEBUG: Device[60] Available Threads is 4 (1 outstanding)
[New LWP 26451]

Thread 2 "spine" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 26450]
0x000055555555c43d in spine_log (format=format@entry=0x555555574098 "DEBUG: Device[%i] HT[%i] In Poller, About to Start Polling") at util.c:1257
1257 va_start(args, format);
(gdb) bt
#0  0x000055555555c43d in spine_log (format=format@entry=0x555555574098 "DEBUG: Device[%i] HT[%i] In Poller, About to Start Polling") at util.c:1257
#1  0x0000555555568360 in child (arg=) at poller.c:106
#2  0x00007ffff7fb822e in start (p=0x7ffff7907b00) at src/thread/pthread_create.c:207
#3  0x00007ffff7fba82f in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC

TheWitness commented 2 weeks ago

This really looks like a kernel issue. I guess I should boot an Alpine Linux box. Can you report it upstream?

uzzme commented 2 weeks ago

> This really looks like a kernel issue. I guess I should boot an Alpine Linux box. Can you report it upstream?

I've tested it on several kernel versions (from Alpine 3.12 with kernel 5 to Alpine 3.19 with kernel 6) and the problem persists.

TheWitness commented 2 weeks ago

The problem is that this section of the code is relatively stable and isolated from other threads, so the fact that it's segfaulting there tells me something else is going on. And the fact that I can't reproduce it on CentOS 7.9, Rocky 8, or Rocky 9.4 tells me it's something within either the kernel or dependent modules.

Can you provide the exact tool chain you used to prepare the dev environment? I could not even build correctly. The configure script was throwing errors trying to verify the gcc version.

Provide all the apk commands you used from the base build.

uzzme commented 2 weeks ago

> The problem is that this section of the code is relatively stable and isolated from other threads, so the fact that it's segfaulting there tells me something else is going on. And the fact that I can't reproduce it on CentOS 7.9, Rocky 8, or Rocky 9.4 tells me it's something within either the kernel or dependent modules.
>
> Can you provide the exact tool chain you used to prepare the dev environment? I could not even build correctly. The configure script was throwing errors trying to verify the gcc version.
>
> Provide all the apk commands you used from the base build.

`apk add php81-fpm php81-gmp php81-pecl-mailparse php81-mysqli php81-gd php81-cli php81-cgi php81-opcache \
    php81-mbstring php81-intl php81-openssl php81-posix php81-curl php81-json php81-pdo php81-pdo_mysql php81-session \
    php81-simplexml php81-sockets php81-xml php81-iconv php81-imap php81-soap php81-pdo_mysql php81-pdo php81-mysqli \
    php81-pecl-imagick php81-zip php81-ctype php81-ldap php81-snmp php81-gettext php81-pcntl

apk add --virtual .build-deps linux-headers openssl-dev geoip-dev expat-dev mariadb-dev net-snmp-dev \
    zlib-dev bsd-compat-headers lua-dev luajit-dev brotli-dev autoconf cmake make gcc g++ zlib-dev pcre-dev \
    git file udns-dev help2man dos2unix automake libtool`

TheWitness commented 2 weeks ago

Perfect.

TheWitness commented 2 weeks ago

Trying on 3.20-extended and not finding the php packages, brotli-dev, or dos2unix.

TheWitness commented 2 weeks ago

dos2unix is installed by default, but I cannot find php*.

TheWitness commented 2 weeks ago

Figured it out.

TheWitness commented 2 weeks ago

Please log a bug with the Alpine team. Provide them a copy of this image. Here is the call stack:

pthread_create() calls clone(). clone() calls start(). start() calls child(). child() calls poll_host(). poll_host() segfaults.

I don't see anything wrong with the call stack. See if they have any clues, because I don't see any. There are several posts about the BusyBox implementation of pthreads that lead me to believe that there are likely some "tweaks" that need to be made to Alpine to make it work better with pthreads.

Keep us posted on what they tell you. We can customize our configure script to do such things, but you will have to do the research as I don't have the cycles. I hope that is clear.

image
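
One data point that may be worth attaching to an upstream report is the default thread stack size on each platform. The gdb banner earlier in the thread shows an `x86_64-alpine-linux-musl` target, and musl's default thread stack is considerably smaller than the default glibc usually derives from the stack ulimit. The sketch below is a hypothetical diagnostic, not spine code; `default_thread_stack_size` is a name made up for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical diagnostic helper (not part of spine): returns the stack
 * size, in bytes, that a thread created with default attributes would
 * get, or 0 if the attribute object could not be queried. */
static size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    /* A freshly initialized attribute object carries the defaults. */
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing this value on the Alpine box and on one of the Rocky boxes would show whether the two environments hand spine's worker threads very different amounts of stack.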

TheWitness commented 2 weeks ago

This does not look good. There are many, many reports of segfaults with several versions of Alpine Linux. Search for this on Google; DuckDuckGo is not so good at these types of searches:

"alpine linux" segmentation fault at the start of pthread_create

uzzme commented 2 weeks ago

It seems Alpine doesn't support it. I can use cmd.php instead.

> This does not look good. There are many, many reports of segfaults with several versions of Alpine Linux. Search for this on Google; DuckDuckGo is not so good at these types of searches:
>
> "alpine linux" segmentation fault at the start of pthread_create

TheWitness commented 2 weeks ago

BusyBox is such a thin overlay on the kernel that these things are bound to happen. I read many reports about stack size issues, which for this ticket actually makes sense, but none of the suggested solutions actually worked.

Sorry about that. I did find an issue with string formatting, which can lead to segfaults, but I doubt it will help. In fact, I know it won't, as I fixed it and got the same result.
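
For readers following the stack size angle, the fix those reports usually describe is to request a larger stack explicitly through the thread attributes. This is a minimal sketch under stated assumptions: `worker`, `run_worker_with_stack`, and both size constants are hypothetical and chosen for illustration, and this is not spine's thread setup. As noted above, the suggested solutions did not resolve this particular crash.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumed values for illustration only; tune for the real workload. */
#define WORKER_STACK_BYTES ((size_t)2 * 1024 * 1024)
#define SCRATCH_BYTES ((size_t)256 * 1024)

/* Hypothetical worker that keeps a large buffer on its own stack, the
 * kind of frame that can overflow a small default thread stack. */
static void *worker(void *arg)
{
    volatile char scratch[SCRATCH_BYTES];

    scratch[SCRATCH_BYTES - 1] = 'y';
    scratch[0] = 'x';
    *(int *)arg = (scratch[0] == 'x' && scratch[SCRATCH_BYTES - 1] == 'y');
    return NULL;
}

/* Creates one worker with an explicit stack size and waits for it.
 * Returns 1 if the worker ran, 0 if its check failed, -1 on setup errors. */
static int run_worker_with_stack(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ok = 0;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, &ok);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return ok;
}
```

Calling `run_worker_with_stack(WORKER_STACK_BYTES)` gives the worker 2 MiB regardless of the C library's default, which makes the behavior the same across libc implementations.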

uzzme commented 2 weeks ago

> BusyBox is such a thin overlay on the kernel that these things are bound to happen. I read many reports about stack size issues, which for this ticket actually makes sense, but none of the suggested solutions actually worked.
>
> Sorry about that. I did find an issue with string formatting, which can lead to segfaults, but I doubt it will help. In fact, I know it won't, as I fixed it and got the same result.

You're right, BusyBox is just a simplified version of the kernel. Anyway, thank you!