centreon / centreon-engine

Extremely fast monitoring scheduler, forked from Nagios
GNU General Public License v2.0

Re: Bug#948491: centengine crashes regularly #338

Open swoopla opened 4 years ago

swoopla commented 4 years ago

I created a [Debian Bug](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948491):

Package: centreon-engine
Version: 18.10.0-4
Debian-version: Buster
Severity: grave

I get a segfault in the centengine process:

sudo systemctl status centengine.service
● centengine.service - Centreon Engine
Loaded: loaded (/lib/systemd/system/centengine.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Thu 2020-01-09 10:28:37 CET; 9min ago
Process: 26832 ExecStart=/usr/sbin/centengine /etc/centreon-engine/centengine.cfg (code=killed, signal=SEGV)
Main PID: 26832 (code=killed, signal=SEGV)

janv. 09 10:09:22 centreon-01 centreon-engine[26832]: [1578560962] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.45: rta 0,749ms, lost 0%01
janv. 09 10:11:52 centreon-01 centreon-engine[26832]: [1578561112] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.43: rta 0,850ms, lost 0%
janv. 09 10:14:27 centreon-01 centreon-engine[26832]: [1578561267] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.45: rta 0,922ms, lost 0%
janv. 09 10:16:57 centreon-01 centreon-engine[26832]: [1578561417] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.43: rta 0,754ms, lost 0%
janv. 09 10:19:32 centreon-01 centreon-engine[26832]: [1578561572] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.45: rta 0,663ms, lost 0%
janv. 09 10:22:02 centreon-01 centreon-engine[26832]: [1578561722] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.43: rta 0,741ms, lost 0%
janv. 09 10:24:37 centreon-01 centreon-engine[26832]: [1578561877] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.45: rta 0,567ms, lost 0%
janv. 09 10:27:07 centreon-01 centreon-engine[26832]: [1578562027] [26832] HOST ALERT: srv.local.lan;UP;HARD;1;OK - 192.168.1.43: rta 0,819ms, lost 0%
janv. 09 10:28:37 centreon-01 systemd[1]: centengine.service: Main process exited, code=killed, status=11/SEGV
janv. 09 10:28:37 centreon-01 systemd[1]: centengine.service: Failed with result 'signal'.

sudo dmesg
(...)
[ 78.330643] random: 7 urandom warning(s) missed due to ratelimiting
[ 5321.507438] centengine[26832]: segfault at 0 ip 000055c4d2403868 sp 00007ffd605fa390 error 4 in centengine[55c4d22e0000+134000]
[ 5321.507456] Code: 89 c2 4c 89 ee 4c 89 e7 e8 35 d0 ed ff ba 01 00 00 00 48 8d 35 03 94 01 00 4c 89 e7 e8 21 d0 ed ff 48 8b 5b 18 48 85 db 74 58 <48> 83 3b 00 74 f1 ba 01 00 00 00 48 8d 35 07 9a 01 00 48 89 ef e8

And centengine crashes regularly:

[75240.343141] centengine[26832]: segfault at 0 ip 00005581d940b868 sp 00007ffc5c825490 error 4 in centengine[5581d92e8000+134000]
[75240.343165] Code: 89 c2 4c 89 ee 4c 89 e7 e8 35 d0 ed ff ba 01 00 00 00 48 8d 35 03 94 01 00 4c 89 e7 e8 21 d0 ed ff 48 8b 5b 18 48 85 db 74 58 <48> 83 3b 00 74 f1 ba 01 00 00 00 48 8d 35 07 9a 01 00 48 89 ef e8
[78230.179697] centengine[29957]: segfault at 0 ip 000056273cc7b868 sp 00007ffc746b24b0 error 4 in centengine[56273cb58000+134000]
[78230.179718] Code: 89 c2 4c 89 ee 4c 89 e7 e8 35 d0 ed ff ba 01 00 00 00 48 8d 35 03 94 01 00 4c 89 e7 e8 21 d0 ed ff 48 8b 5b 18 48 85 db 74 58 <48> 83 3b 00 74 f1 ba 01 00 00 00 48 8d 35 07 9a 01 00 48 89 ef e8
[81829.339926] centengine[31868]: segfault at 0 ip 000056389f77b868 sp 00007ffef4389f80 error 4 in centengine[56389f658000+134000]
[81829.339944] Code: 89 c2 4c 89 ee 4c 89 e7 e8 35 d0 ed ff ba 01 00 00 00 48 8d 35 03 94 01 00 4c 89 e7 e8 21 d0 ed ff 48 8b 5b 18 48 85 db 74 58 <48> 83 3b 00 74 f1 ba 01 00 00 00 48 8d 35 07 9a 01 00 48 89 ef e8
[85290.785865] centengine[2182]: segfault at 0 ip 000055e3ab7cf868 sp 00007fff3c33f800 error 4 in centengine[55e3ab6ac000+134000]
[85290.785882] Code: 89 c2 4c 89 ee 4c 89 e7 e8 35 d0 ed ff ba 01 00 00 00 48 8d 35 03 94 01 00 4c 89 e7 e8 21 d0 ed ff 48 8b 5b 18 48 85 db 74 58 <48> 83 3b 00 74 f1 ba 01 00 00 00 48 8d 35 07 9a 01 00 48 89 ef e8
[87850.778995] centengine[6300]: segfault at 0 ip 0000555b4c86d868 sp 00007fffbdbfe0f0 error 4 in centengine[555b4c74a000+134000]
[87850.779023] Code: 89 c2 4c 89 ee 4c 89 e7 e8 35 d0 ed ff ba 01 00 00 00 48 8d 35 03 94 01 00 4c 89 e7 e8 21 d0 ed ff 48 8b 5b 18 48 85 db 74 58 <48> 83 3b 00 74 f1 ba 01 00 00 00 48 8d 35 07 9a 01 00 48 89 ef e8
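The kernel segfault lines above can be compared more easily once the instruction pointer is expressed relative to the start of the mapping, since the absolute addresses change on every restart. The following is a minimal sketch, not part of the original report: the function name `parse_segfault` and the returned field names are made up for illustration, and the regular expression only assumes the x86 `segfault at ... ip ... sp ... error ... in name[base+size]` layout seen in this dmesg output. The `error` value is decoded using the x86 page-fault bits (bit 0: page was present, bit 1: write access, bit 2: user mode).

```python
import re

# Matches kernel lines such as:
#   centengine[26832]: segfault at 0 ip 000055c4d2403868 sp 00007ffd605fa390
#   error 4 in centengine[55c4d22e0000+134000]
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\s\[\]]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Parse one dmesg segfault line; return a dict, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    lines = [
        "[ 5321.507438] centengine[26832]: segfault at 0 ip 000055c4d2403868 "
        "sp 00007ffd605fa390 error 4 in centengine[55c4d22e0000+134000]",
        "[87850.778995] centengine[6300]: segfault at 0 ip 0000555b4c86d868 "
        "sp 00007fffbdbfe0f0 error 4 in centengine[555b4c74a000+134000]",
    ]
    for line in lines:
        info = parse_segfault(line)
        print(info["pid"], hex(info["offset"]), info)
```

Applied to the six crashes listed here, every line gives the same offset, 0x123868, with a user-mode read of address 0, which suggests the same instruction dereferences a null pointer each time. With matching debug symbols, that offset may be usable with `addr2line -e /usr/sbin/centengine`, provided the mapping reported by the kernel starts at the beginning of the file image; that is an assumption not verified in this thread.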

swoopla commented 4 years ago

In the Debian bug ticket, Bernhard Übelacker tried to get some more information:

https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=948491;msg=20

swoopla commented 4 years ago

janv. 10 08:54:55 x-centreon-loc-01 systemd-coredump[21519]: Process 18792 (centengine) of user 108 dumped core.

Stack trace of thread 18792:
#0 0x000055e5b739a868 _ZN3com8centreon6engine9retention4dump15customvariablesERSoRK28customvariablesmember_struct (centengine)
#1 0x000055e5b739d29b _ZN3com8centreon6engine9retention4dump7serviceERSoRK14service_struct (centengine)
#2 0x000055e5b739d653 _ZN3com8centreon6engine9retention4dump8servicesERSo (centengine)
#3 0x000055e5b736426b _exec_event_retention_save (centengine)
#4 0x000055e5b7363b5d handle_timed_event (centengine)
#5 0x000055e5b735eecb _ZN3com8centreon6engine6events4loop12_dispatchingEv (centengine)
#6 0x000055e5b735f9e9 _ZN3com8centreon6engine6events4loop3runEv (centengine)
#7 0x000055e5b7287774 main (centengine)
#8 0x00007f561602709b __libc_start_main (libc.so.6)
#9 0x000055e5b72882aa _start (centengine)

Stack trace of thread 18793:
#0 0x00007f56160f1819 __poll (libc.so.6)
#1 0x00007f5616500778 _ZN3com8centreon15process_manager4_runEv (libcentreon_clib.so)
#2 0x00007f5616506fff _ZN3com8centreon11concurrency6thread8_executeEPv (libcentreon_clib.so)
#3 0x00007f561652afa3 start_thread (libpthread.so.0)
#4 0x00007f56160fc4cf __clone (libc.so.6)

Stack trace of thread 18795:
#0 0x00007f56160c9720 __nanosleep (libc.so.6)
#1 0x00007f56160f4874 usleep (libc.so.6)
#2 0x00007f56157742d5 _ZN3com8centreon6broker10processing8failover3runEv (cbmod.so)
#3 0x00007f5614f7d726 n/a (libQtCore.so.4)
#4 0x00007f561652afa3 start_thread (libpthread.so.0)
#5 0x00007f56160fc4cf __clone (libc.so.6)

Stack trace of thread 18794:
#0 0x00007f56160f1819 __poll (libc.so.6)
#1 0x00007f5612b8c4cb poll (externalcmd.so)
#2 0x00007f561652afa3 start_thread (libpthread.so.0)
#3 0x00007f56160fc4cf __clone (libc.so.6)
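The symbols in the stack traces are Itanium C++ ABI mangled names, so the crashing frame is easier to read once decoded. Below is a small sketch, not a full demangler: the helper name `demangle_nested_name` is hypothetical, and it only handles the simple `_ZN<len><name>...E` nested-name form that appears in this trace, ignoring the parameter list that follows the `E`.

```python
import string


def demangle_nested_name(symbol):
    """Decode the scope of a simple '_ZN...E' mangled name.

    Returns the '::'-joined qualified name, or None when the symbol is not
    in the simple nested-name form (plain C symbols, templates, etc.).
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        return None
    pos = len(prefix)
    parts = []
    # Each component is encoded as <decimal length><identifier>.
    while pos < len(symbol) and symbol[pos] in string.digits:
        end = pos
        while end < len(symbol) and symbol[end] in string.digits:
            end += 1
        length = int(symbol[pos:end])
        name = symbol[end:end + length]
        if len(name) != length:
            return None  # truncated symbol
        parts.append(name)
        pos = end + length
    # The nested name must be closed by 'E'.
    if not parts or pos >= len(symbol) or symbol[pos] != "E":
        return None
    return "::".join(parts)


if __name__ == "__main__":
    frame0 = ("_ZN3com8centreon6engine9retention4dump15customvariables"
              "ERSoRK28customvariablesmember_struct")
    print(demangle_nested_name(frame0))
```

For frame 0 of the crashing thread this yields `com::centreon::engine::retention::dump::customvariables`. The trailing `RSoRK28customvariablesmember_struct` encodes the parameters `std::ostream&` and `const customvariablesmember_struct&`. Read together with frames 1 to 3, the crash happens while the retention save event dumps the custom variables of a service. For complete demangling, the standard `c++filt` tool from binutils is the better choice.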

swoopla commented 4 years ago

In the Debian bug ticket, I ran the debug program; the result is in the centengine-debug.txt file.

swoopla commented 4 years ago

Bernhard Übelacker (bernhardu@mailbox.org) created a patch for this bug in the Debian bug ticket.