Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Sysconfig limits and settings are not respected #6215

Closed akqopensystems closed 6 years ago

akqopensystems commented 6 years ago

Expected Behavior

On RHEL/CentOS7, process limits as seen by systemctl show and cat /proc/PID/limits should provide consistent information.

Current Behavior

On RHEL7:

[root@osshplpmo06 ~]# systemctl cat icinga2 
# /usr/lib/systemd/system/icinga2.service
[...]
# /etc/systemd/system/icinga2.service.d/limits.conf
[Service]
LimitNOFILE=50000
[...]
[root@osshplpmo06 ~]# systemctl show -p LimitNOFILE icinga2
LimitNOFILE=50000
[root@osshplpmo06 ~]# cat /proc/55402/limits |grep "Max open"
Max open files            16384                16384                files

This is confusing, because the system administrator can't easily determine which limits are in force for Icinga2. We've already discussed this with RH support, and they think it may be related to the "--no-stack-rlimit" option passed on the command line.

Possible Solution

If Icinga2 sets its own limits (to 16384), this should be explicitly documented. Better still, the setting should be configurable by the user.

Steps to Reproduce (for bugs)

  1. Create a limits file for the service in /etc/systemd/system/icinga2.service.d/limits.conf:
    [Service]
    LimitNOFILE=50000
  2. systemctl daemon-reload && systemctl restart icinga2.service
  3. systemctl show -p LimitNOFILE icinga2.service
  4. cat /proc/Icinga2-PID/limits
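The comparison in steps 3 and 4 can be scripted. The following is a minimal sketch, assuming a Linux /proc layout with the column format shown in the output above; the function names soft_nofile_limit and check_nofile_limit are made up for illustration and are not part of Icinga 2 or systemd.

```shell
# Print the soft "Max open files" limit from a /proc/<PID>/limits style file.
# Hypothetical helper; assumes the column layout shown earlier in this report.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Compare a configured value with the limit a running process actually has.
# Prints OK or MISMATCH and returns non-zero on a mismatch.
check_nofile_limit() {
    configured=$1
    limits_file=$2
    actual=$(soft_nofile_limit "$limits_file")
    if [ "$actual" = "$configured" ]; then
        echo "OK: $actual"
    else
        echo "MISMATCH: configured=$configured actual=$actual"
        return 1
    fi
}

# Usage sketch (PID file path taken from the environment details below):
# configured=$(systemctl show -p LimitNOFILE icinga2 | cut -d= -f2)
# check_nofile_limit "$configured" "/proc/$(cat /run/icinga2/icinga2.pid)/limits"
```

With the values from this report, the check would print MISMATCH: configured=50000 actual=16384.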

Context

At the moment we are looking into an issue with checks that sometimes forward no performance metrics for Graphite. In order to rule out resource exhaustion, we are checking the configured limits of the Icinga2 processes.

Your Environment

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Red Hat Enterprise Linux Server
  Platform version: 7.4 (Maipo)
  Kernel: Linux
  Kernel version: 3.10.0-693.17.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

Object 'ossnplpmo03.xxxx.de' of type 'Endpoint': % declared in '/etc/icinga2/zones.conf', lines 11:1-11:41

Object 'osszplpmo02.xxxx.de' of type 'Endpoint': % declared in '/etc/icinga2/zones.conf', lines 6:1-6:41

dnsmichi commented 6 years ago

You can set these limits in the sysconfig file. See the "advanced" table in this chapter: https://www.icinga.com/docs/icinga2/latest/doc/17-language-reference/#constants

Crunsher commented 6 years ago

Our way of doing this may not be standard. For RHEL-specific changes to init scripts and the like, please see https://github.com/Icinga/rpm-icinga2

akqopensystems commented 6 years ago

Thanks for the clarification! I think these options would be better documented in the configuration chapter, maybe under a topic "Advanced configuration": https://www.icinga.com/docs/icinga2/latest/doc/04-configuring-icinga-2/

Unfortunately, this does not seem to work as expected. On a test system with only slight differences from production:

[root@ossztlvmo12 icinga2]# cat /etc/sysconfig/icinga2 
DAEMON=/usr/sbin/icinga2
ICINGA2_CONFIG_FILE=/etc/icinga2/icinga2.conf
ICINGA2_RUN_DIR=/run
ICINGA2_STATE_DIR=/var
ICINGA2_PID_FILE=$ICINGA2_RUN_DIR/icinga2/icinga2.pid
ICINGA2_LOG_DIR=/var/log/icinga2
ICINGA2_ERROR_LOG=$ICINGA2_LOG_DIR/error.log
ICINGA2_STARTUP_LOG=$ICINGA2_LOG_DIR/startup.log
ICINGA2_LOG=$ICINGA2_LOG_DIR/icinga2.log
ICINGA2_CACHE_DIR=$ICINGA2_STATE_DIR/cache/icinga2
ICINGA2_USER=icinga
ICINGA2_GROUP=icinga
ICINGA2_COMMAND_GROUP=icingacmd
ICINGA2_RLIMIT_FILES=50000
ICINGA2_RLIMIT_PROCESSES=62883
[root@ossztlvmo12 icinga2]# systemctl cat icinga2
# /usr/lib/systemd/system/icinga2.service
[Unit]
Description=Icinga host/service/network monitoring system
After=syslog.target network-online.target postgresql.service mariadb.service carbon-cache.service carbon-relay.service

[Service]
Type=forking
EnvironmentFile=/etc/sysconfig/icinga2
ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/sysconfig/icinga2
ExecStart=/usr/sbin/icinga2 daemon -d -e ${ICINGA2_ERROR_LOG}
PIDFile=/run/icinga2/icinga2.pid
ExecReload=/usr/lib/icinga2/safe-reload /etc/sysconfig/icinga2
TimeoutStartSec=30m

# Systemd >228 enforces a lower process number for services.
# Depending on the distribution and Systemd version, this must
# be explicitly raised. Packages will set the needed values
# into /etc/systemd/system/icinga2.service.d/limits.conf
#
# Please check the troubleshooting documentation for further details.
# The values below can be used as examples for customized service files.

#TasksMax=infinity
#LimitNPROC=62883

[Install]
WantedBy=multi-user.target
[root@ossztlvmo12 icinga2]# systemctl show icinga2
[...]
LimitCPU=18446744073709551615
LimitFSIZE=18446744073709551615
LimitDATA=18446744073709551615
LimitSTACK=18446744073709551615
LimitCORE=18446744073709551615
LimitRSS=18446744073709551615
LimitNOFILE=4096
[...]
[root@ossztlvmo12 icinga2]# systemctl stop icinga2
[root@ossztlvmo12 icinga2]# systemctl start icinga2
[root@ossztlvmo12 icinga2]# ps -ef |grep icinga2|grep -v plugin
icinga   14980     1  0 12:51 ?        00:00:00 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon -d -e $ICINGA2_LOG_DIR/error.log
icinga   14985     1 66 12:51 ?        00:00:04 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon -d -e $ICINGA2_LOG_DIR/error.log
[root@ossztlvmo12 icinga2]# cat /proc/14980/limits |grep "Max open"
Max open files            16384                16384                files     
[root@ossztlvmo12 icinga2]# cat /proc/14985/limits |grep "Max open"
Max open files            16384                16384                files

We are using the Icinga2 rpm repository.

dnsmichi commented 6 years ago

You cannot go lower than the default of 16k open files. That is a sane default, and it is the minimum Icinga 2 requires to run and work.

Read-write. Defines the resource limit for RLIMIT_NOFILE that should be set at start-up. Value cannot be set lower than the default 16 * 1024. 0 disables the setting. Set in Icinga 2 sysconfig.
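The semantics quoted above can be illustrated with a short sketch. This is not Icinga 2's actual code: the function name effective_rlimit_files is hypothetical, and treating a too-small value by falling back to the default (rather than rejecting it) is an assumption made for illustration.

```shell
# Sketch of the documented RLimitFiles semantics. NOT Icinga 2's real code.
# Assumption: a value below the default is replaced by the default.
DEFAULT_RLIMIT_FILES=$((16 * 1024))

effective_rlimit_files() {
    requested=$1
    if [ "$requested" -eq 0 ]; then
        # 0 disables the setting, so no limit is applied by Icinga 2
        echo "disabled"
    elif [ "$requested" -lt "$DEFAULT_RLIMIT_FILES" ]; then
        # cannot be set lower than the default of 16 * 1024
        echo "$DEFAULT_RLIMIT_FILES"
    else
        echo "$requested"
    fi
}
```

Under these assumptions, effective_rlimit_files 4096 prints 16384 and effective_rlimit_files 50000 prints 50000, which is the behavior one would expect from a sysconfig value that is actually read.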

Mikesch-mp commented 6 years ago

He is referring to ICINGA2_RLIMIT_FILES=50000 set in /etc/sysconfig/icinga2. It should raise the max open files to 50k, but it is still at 16k, so icinga2 is ignoring it. I have the same problem on SLES: icinga2 ignores the settings. Even in older versions (2.7.2), setting RLimitFiles in init.conf does not work.

akqopensystems commented 6 years ago

Thanks, @Mikesch-mp . Yes, the problem is that we want to increase the maximum number of open files to 50000, but the icinga2 processes ignore this change and stay at the default of 16 * 1024.

dnsmichi commented 6 years ago

Ah ok, thanks, shouldn't comment here when I am tired after giving a training. Then I am out of ideas and one needs to reproduce the problem.

akqopensystems commented 6 years ago

At the moment, our checks are running later and later because ICINGA2_RLIMIT_FILES cannot be raised. On all checking systems, we run into the file limit from time to time, with service checks delayed by as much as 10 minutes (on a 5-minute schedule).

[screenshot of a delayed check]

The check above was eventually executed with a 9-minute delay. Also, the icinga2 Graphite writer module is not able to send the performance metrics to Graphite in this situation:

cat /var/log/icinga2/icinga2.log
[2018-04-12 16:54:50 +0200] critical/GraphiteWriter: Cannot write to TCP socket on host '127.0.0.1' port '2013'.
[2018-04-12 16:55:00 +0200] critical/GraphiteWriter: Cannot write to TCP socket on host '127.0.0.1' port '2013'.
[2018-04-12 16:55:09 +0200] critical/GraphiteWriter: Cannot write to TCP socket on host '127.0.0.1' port '2013'.
[2018-04-12 16:55:20 +0200] critical/GraphiteWriter: Cannot write to TCP socket on host '127.0.0.1' port '2013'.

This leads to large gaps in the Graphite performance graphs:

[screenshot of Graphite graphs with gaps]

Here's a sample number of open files from an icinga2 satellite at the time the checks are late:

[root@xxxxxmo03 ~]# lsof |grep -c icinga
17771
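Note that lsof | grep -c counts every lsof line mentioning icinga across all processes, including entries such as cwd, txt and mem that are not file descriptors, while RLIMIT_NOFILE applies per process. A per-process count of /proc/PID/fd entries is closer to what the limit measures. A minimal sketch, assuming a Linux /proc layout; count_open_fds is a hypothetical helper name:

```shell
# Count the entries in a /proc/<PID>/fd style directory.
# Each entry there corresponds to one open file descriptor of that process.
count_open_fds() {
    # ls prints one entry per line when its output is a pipe
    ls -A "$1" | wc -l | tr -d ' '
}

# Usage sketch (run as root so /proc/<PID>/fd is readable):
# for p in $(pidof icinga2); do
#     echo "$p: $(count_open_fds "/proc/$p/fd") open file descriptors"
# done
```

Comparing each count with the "Max open files" value in /proc/PID/limits shows how close an individual process is to its limit.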

dnsmichi commented 6 years ago

Confirmed, it is a bug. Tested inside the Icinga Vagrant box standalone.

[root@icinga2 ~]# grep -ri files /etc/sysconfig/icinga2
ICINGA2_RLIMIT_FILES=50000

[root@icinga2 ~]# systemctl restart icinga2

[root@icinga2 ~]# for p in $(pidof icinga2); do cat /proc/$p/limits | grep "Max open"; done
Max open files            16384                16384                files
Max open files            16384                16384                files

[root@icinga2 ~]# icinga2 console --connect 'https://root:icinga@localhost:5665/' --eval 'RLimitFiles'
16384.0

pogii123 commented 6 years ago

It seems that none of the changes made in /etc/sysconfig/icinga2 take effect. Even if I put random characters in there, nothing changes.

[root@icinga2 ]# grep -i user /etc/sysconfig/icinga2
ICINGA2_USER=ici12345nga
[root@icinga2 ]# systemctl restart icinga2
[root@icinga2 ]# icinga2 variable get RunAsUser
icinga
[root@icinga2 ]# ps -ef | grep icinga2
icinga   13628     1  0 17:47 ?        00:00:00 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon -d -e $ICINGA2_LOG_DIR/error.log
icinga   13633     1  0 17:47 ?        00:00:00 /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon -d -e $ICINGA2_LOG_DIR/error.log

Crunsher commented 6 years ago

We narrowed down the bug to the file not being read at all, i.e. the path in some instances is not set.

dnsmichi commented 6 years ago

The path is compiled into the binary, and under specific circumstances it is an empty string. We've observed this with builds inside Docker and other variants. The patch requires a new tagged release including proper tests.

dnsmichi commented 6 years ago

The referenced package fixes are not part of this ticket; they are only related by topic, for 2.8.3.

dnsmichi commented 6 years ago

@Crunsher I've cherry-picked 11853cb36339920729bcaa5fcd461b5a288ba4cb for better logging into the coming PR for this issue. feature/rlimit-errno can be deleted.

dnsmichi commented 6 years ago

mbmif /usr/local/icinga2 (master *) # icinga2 daemon
[2018-04-19 10:07:12 +0200] warning/icinga-app: Sysconfig file '/usr/local/icinga2/etc/sysconfig/icinga2' cannot be read. Using default values.
[2018-04-19 10:07:12 +0200] information/cli: Icinga application loader (version: v2.8.2-637-g081988a0d; debug)

dnsmichi commented 6 years ago

[root@icinga2-elastic ~]# vim /etc/sysconfig/icinga2
[root@icinga2-elastic ~]# systemctl restart icinga2
[root@icinga2-elastic ~]# for p in $(pidof icinga2); do cat /proc/$p/limits | grep "Max open"; done
Max open files            50000                50000                files
Max open files            50000                50000                files
[root@icinga2-elastic ~]# icinga2 console --connect 'https://root:icinga@localhost:5665/' --eval 'RLimitFiles'
50000.0

sebastic commented 6 years ago

/etc/sysconfig is not applicable to the Debian family, which uses /etc/default for init script variables. The warnings cause users to file bugs such as Debian Bug #898703.

Ideally the sysconfig directory is not checked for the Debian family, or /etc/default is checked instead.
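The second suggestion could look roughly like the sketch below. It only illustrates the idea of trying several candidate locations and is not how Icinga 2 resolves its sysconfig path; first_readable is a hypothetical helper, and the presence of /etc/default/icinga2 on a given Debian system is an assumption.

```shell
# Print the first candidate that is a readable regular file.
# Returns non-zero and prints nothing if no candidate qualifies.
# Empty candidates (such as an unset compiled-in path) are skipped.
first_readable() {
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -f "$candidate" ] && [ -r "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage sketch: prefer the RHEL/SUSE location, fall back to the Debian one.
# env_file=$(first_readable /etc/sysconfig/icinga2 /etc/default/icinga2) \
#     || echo "no environment file found, using default values" >&2
```

With such a lookup, a warning would only be needed when none of the candidate files can be read.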

dnsmichi commented 6 years ago

We're dealing with this in #6255 scheduled for CW 21.