Closed mwaldmueller closed 3 years ago
"debug"-output:
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=info msg="Icinga 2 sent no heartbeat for 15 seconds. Pausing sync" UUID=92352a36-985f-4166-b266-d3b83dd5ab9d context=HA environment=da39a3ee5e6b4b0d3255bfef95601890afd80709
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="servicegroup_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="checkcommand: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="checkcommand_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="checkcommand_argument: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="checkcommand_envvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="eventcommand: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="eventcommand_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="eventcommand_argument: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="eventcommand_envvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notificationcommand: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notificationcommand_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notificationcommand_argument: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notificationcommand_envvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="hostgroup: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="hostgroup_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="zone: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="timeperiod: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="timeperiod_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="timeperiod_override_include: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="timeperiod_override_exclude: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="timeperiod_range: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="comment: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="action_url: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="customvar_flat: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notes_url: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="icon_image: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="usergroup: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="usergroup_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notification: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notification_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notification_user: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notification_usergroup: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="notification_recipient: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="endpoint: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="host_state: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="host: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="host_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="hostgroup_member: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="user_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="user: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="usergroup_member: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="downtime: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="service_customvar: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="servicegroup_member: Lost responsibility"
Apr 23 11:27:01 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:01+02:00" level=debug msg="servicegroup: Lost responsibility"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Insert preparation worker started"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Update preparation worker started"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Delete execution worker started"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Lost responsibility"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Insert execution worker started"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Insert preparation worker stopped"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Update comparison worker started"
Apr 23 11:27:42 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:42+02:00" level=debug msg="service_state: Update execution worker started"
Apr 23 11:27:49 idb3.localdomain icingadb[17817]: time="2020-04-23T11:27:49+02:00" level=fatal msg="read tcp 127.0.0.1:51912->127.0.0.1:6380: i/o timeout"
Apr 23 11:27:49 idb3.localdomain systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Apr 23 11:27:49 idb3.localdomain systemd[1]: icingadb.service: Failed with result 'exit-code'.```
Apropos system resources: How much %CPU does our Redis consume?
Thanks, can you please provide packages that also include #168?
I tested with those packages:
ii icingadb 1.0.0+rc1.21.ge7ef0ce.20200430.1045+buster-0 amd64 database connector for Icinga 2
ii icingadb-redis 1.0.0+rc1.21.ge7ef0ce.20200430.1045+buster-0 amd64 Redis service for Icinga DB
Resources from my prior tests:
# ps -C icinga2,mysqld,icingadb,icinga-redis-se -o %cpu,%mem,cmd
%CPU %MEM CMD
40.3 8.2 /usr/bin/icinga-redis-server *:6380
15.6 8.7 /usr/sbin/mysqld
6.0 9.4 /usr/sbin/icingadb --config /etc/icingadb/icingadb.ini
0.0 0.1 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e /var/log/icinga2/error.log
96.4 42.8 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e /var/log/icinga2/error.log
60.2 13.1 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e /var/log/icinga2/error.log
I still got the error:
Apr 30 15:31:04 idb3.localdomain icingadb[12940]: time="2020-04-30T15:31:04+02:00" level=fatal msg="read tcp 127.0.0.1:48348->127.0.0.1:6380: i/o timeout"
Apr 30 15:31:04 idb3.localdomain systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Apr 30 15:31:04 idb3.localdomain systemd[1]: icingadb.service: Failed with result 'exit-code'
Versions: icinga2: v2.12.0-rc1 icingadb: v1.0.0+rc1.21.ge7ef0ce.20200430.1045+buster-0 icingadb-redis: v1.0.0+rc1.21.ge7ef0ce.20200430.1045+buster-0 icingaweb2: v2.8.0-rc1 icingadb-web: v1.0.0-rc1
@N-o-X Why does IcingaDB's Redis error handling differ from the MySQL one?
@Al2Klimov I think some things were added to the MySQL part after the first implementation and I just didn't think of also adding those things to Redis. Would be nice to have that 👍
Just discovered that our Redis lib provides configurable auto-retrying which covers net I/O errors by itself. Shall we use that instead?
If that's easier to use, then to build such a thing from ground up, sure why not 👍
Closing this issue, because our RC2 changes completely reworked our Redis client implementation.
I played around with huge environments (e.g. 20.000 Hosts/200.000 Services) and IcingaDB fails with:
I don't think it's related to system resources. A restart of icingadb.service solves the problem.
Versions: icinga2: v2.12.0-rc1 icingadb: v1.0.0-rc1 icingaweb2: v2.8.0-rc1 icingadb-web: v1.0.0-rc1