Closed dbu1986 closed 5 years ago
I'm down with cutting off the plugin output, but cutting off the perfdata will lead to problems with other tools trying to read it. Maybe an all or nothing solution is better in this case: If the perfdata is too long, we log that as a Warning and write nothing into the DB.
Sounds good. I would consider applying this to both columns, long_output & perfdata. I've had ~936553 characters to be saved into long_output, and ~14629 characters to be saved into perfdata. In this case the non-fitting problem applies to long_output.
I would fix the plugin in the first place. 900K characters is nothing you would like to render in Icinga Web 2 nor read in a notification email in case of emergency. I suppose that's a custom written plugin, which can be modified. If you need inspiration for good formatted plugin outputs, here's a talk from a past Icinga Camp: https://www.youtube.com/watch?v=Ey_APqSCoFQ
Thanks for linking this valuable video! Sure, already did so. My point is to harden Icinga2 against such an impact. It took a while for me to identify the dependency between a crappy plugin output and seemingly system-wide overdue checks in Icingaweb2.
This is something taken into account with the new IcingaDB backend. For the IDO schema, there won't be any efforts to fix this unfortunately.
Expected Behavior
The Output of a check plugin should not affect Icinga2's database connection health, which results in all services to look overdue in Icingaweb2. Icinga2 should take care of submitting database-compatible sql queries regarding the max length of table columns.
Current Behavior
As soon as Icinga2 is beginning to handle a very large plugin output in one service, the list of overdue check results in Icingaweb2 gets longer, until all active services seem to be overdue in Icingaweb2. The debuglog shows database exceptions e.g. as follows:
It looks like the executed UPDATE query into icinga_servicestatus (with the very long output) causes Icinga2 to time out and close it's database connection.
Possible Solution
Icinga2 makes sure that the plugin output will fit into the column 'long_output' in table icinga_servicestatus (column type = TEXT => 65535 Chars), e.g. by cutting off the plugin output string while building the sql query. Same might be relevant to column 'perfdata' as well.
Steps to Reproduce (for bugs)
Context
I am deploying services executing a new check plugin (custom made perl script). Due to different reasons the plugin returns 945635 characters of output, including lots of host-based errors. I know, it might not be Icinga2's job to take care of buggy outputs of custom plugins, but it seems to allow pretty extensive impacts, which should IMHO get limited.
As Icinga2 begins to execute checks, I notice that all services are getting high last check times. The list of overdue check results is increasing until all ~8800 services are affected. Hitting the service's "Check now" link does not change this behaviour. The debuglog shows exceptions regarding the database connection (see above). I think that's the reason why Icingaweb2 cannot retreive the latest check results from the database. The checks are definitely executed freshly and in time, as traced via debuglog.
I assume that the large sql query is responsible for the situation, so limiting the input values to the maximum column size of the destination table might be an appropriate way to avoid it.
Your Environment
icinga2 --version
):Copyright (c) 2012-2018 Icinga Development Team (https://www.icinga.com/) License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
Application information: Installation root: /usr Sysconf directory: /etc Run directory: /run Local state directory: /var Package data directory: /usr/share/icinga2 State path: /var/lib/icinga2/icinga2.state Modified attributes path: /var/lib/icinga2/modified-attributes.conf Objects path: /var/cache/icinga2/icinga2.debug Vars path: /var/cache/icinga2/icinga2.vars PID path: /run/icinga2/icinga2.pid
System information: Platform: Red Hat Enterprise Linux Server Platform version: 7.5 (Maipo) Kernel: Linux Kernel version: 3.10.0-862.9.1.el7.x86_64 Architecture: x86_64
Build information: Compiler: GNU 4.8.5 Build host: unknown
Enabled features: api checker command debuglog graphite ido-mysql mainlog notification
[2018-08-16 14:38:08 +0200] information/cli: Icinga application loader (version: r2.9.1-1) [2018-08-16 14:38:08 +0200] information/cli: Loading configuration file(s). [2018-08-16 14:38:08 +0200] information/ConfigItem: Committing config item(s). [2018-08-16 14:38:09 +0200] information/ApiListener: My API identity: fravm007010.os-fra.local [2018-08-16 14:38:18 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 3.25/s (195/min 195/5min 195/15min); [2018-08-16 14:38:19 +0200] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 0, rate: 0/s (0/min 0/5min 0/15min); [2018-08-16 14:38:19 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min); [2018-08-16 14:38:19 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min); [2018-08-16 14:38:19 +0200] information/WorkQueue: #8 (GraphiteWriter, graphite) items: 0, rate: 0/s (0/min 0/5min 0/15min); [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 8809 Services. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 37 ServiceGroups. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 70 HostGroups. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 EventCommand. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 2 FileLoggers. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 NotificationComponent. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 9 NotificationCommands. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 9421 Notifications. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 IcingaApplication. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 612 Hosts. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 ApiListener. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 407 Downtimes. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 GraphiteWriter. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 140 Comments. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 CheckerComponent. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 499 Zones. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 501 Endpoints. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 7 ApiUsers. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 UserGroup. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 IdoMysqlConnection. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 298 CheckCommands. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 19 TimePeriods. [2018-08-16 14:38:25 +0200] information/ConfigItem: Instantiated 1 User. [2018-08-16 14:38:26 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars' [2018-08-16 14:38:26 +0200] information/cli: Finished validating the configuration file(s).