Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Satellite has problems with expired CRL #9487

Open xschlef opened 2 years ago

xschlef commented 2 years ago

Describe the bug

A very simple icinga2 satellite configuration was unable to connect to our icinga2-master because the CRL had expired. The daemon had not been reloaded for 30 days, which is our maximum CRL age. This is basically the same issue we faced in #8501.

The main purpose of this satellite is to check master reachability and health. We update our CRL every 6 hours, and a restart/reload fixes the issue, so I think a running daemon is not correctly reloading the changed CRL. We only see this issue with this icinga2 instance! Other hosts reload correctly, but they run a more complex configuration.

Aug 06 08:39:39 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:39:49 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:39:59 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:39:59 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:39:59 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:40:09 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:40:19 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:40:19 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:40:19 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:40:29 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired

To Reproduce

Start icinga2 and wait until the CRL expires. Connections then start to fail as soon as the master drops the connection because of a config reload.

# zones.conf
object Endpoint "icinga2-master" {
        host = "icinga2-master"
        port = "5665"
}
object Zone "master" {
        endpoints = [ "icinga2-master" ]
}
object Endpoint "icinga2-satellite" {

}

object Zone "icinga2-satellite" {
        endpoints = [ "icinga2-satellite" ]
        parent = "master"
}

Expected behavior

The daemon periodically reloads the CRL or monitors the CRL for changes.
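
As a rough illustration of the second option (watching the file for changes), the sketch below polls the file's modification time and calls a reload callback when it changes. It is written in Python purely for illustration and is not Icinga 2 code; `CrlWatcher`, `reload_cb` and `check` are hypothetical names.

```python
import os
from typing import Callable, Optional


class CrlWatcher:
    """Call a reload callback whenever the CRL file's mtime changes.

    Hypothetical helper for illustration only; not part of Icinga 2.
    """

    def __init__(self, path: str, reload_cb: Callable[[str], None]) -> None:
        self.path = path
        self.reload_cb = reload_cb
        self._last_mtime: Optional[int] = None  # nothing loaded yet

    def check(self) -> bool:
        """Reload if the file changed since the last check.

        Returns True if the callback was invoked.
        """
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False  # keep using the previously loaded CRL
        if mtime == self._last_mtime:
            return False
        self.reload_cb(self.path)
        self._last_mtime = mtime
        return True
```

Calling `check()` from a periodic timer, or before every incoming and outgoing connection, would also cover a daemon that only ever dials out.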

Your Environment

icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.4-1)

System information:
  Platform: Debian GNU/Linux
  Platform version: 10 (buster)
  Kernel: Linux
  Kernel version: 4.19.0-21-amd64
  Architecture: x86_64

Enabled features: api checker command mainlog notification syslog

Config validation:

[2022-08-15 14:48:59 +0200] information/cli: Icinga application loader (version: r2.13.4-1)
[2022-08-15 14:48:59 +0200] information/cli: Loading configuration file(s).
[2022-08-15 14:48:59 +0200] information/ConfigItem: Committing config item(s).
[2022-08-15 14:48:59 +0200] information/ApiListener: My API identity: icinga2-satellite
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 2 Notifications.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 Host.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 EventCommand.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 SyslogLogger.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 2 Zones.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 159 CheckCommands.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 9 UserGroups.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 7 TimePeriods.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 28 Users.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 Service.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 7 NotificationCommands.
[2022-08-15 14:48:59 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2022-08-15 14:48:59 +0200] information/cli: Finished validating the configuration file(s).

Additional context

We are rolling out our own certificate infrastructure and are not relying on icinga2 pki.

Al2Klimov commented 2 years ago

Hello @xschlef!

> icinga2 satellite was unable to connect to our icinga2-master

Have you tried to configure both connection directions?

master -> sat
master <- sat

Best, A/K

Al2Klimov commented 2 years ago

refs #8515

xschlef commented 2 years ago

Hi,

the problem is the connection sat -> master. Our master does not initiate any connections. To be honest, I have no idea why only this instance is having problems with CRL expiry. All other agents are working fine and are reloading their CRL.

Thanks!

Al2Klimov commented 2 years ago

Do all the other agents also go 30 days without being reloaded?

xschlef commented 2 years ago

Exactly. We have agents that have been running for 90+ days without any issues. They are never reloaded and only lose the connection to the master because of config reloads. This is the only server that is facing this issue.

But you made me check twice. We only set the CRL for this sat and the master. All other agents do not use the CRL (we really should fix that). My guess is that the issue will show up on all our agents if I set the CRL correctly...

The features-enabled/api.conf for sat and master is the following:

object ApiListener "api" {
  bind_host = "::"
  accept_commands = true
  accept_config = true
  crl_path = "/etc/ssl/crl/server-ca-2017.0.crl.pem"
}

Al2Klimov commented 2 years ago

> Our master does not initiate any connections.

Just because it's configured so or due to network design?

xschlef commented 2 years ago

>> Our master does not initiate any connections.
>
> Just because it's configured so or due to network design?

Network design. It makes firewall rules a lot easier...

julianbrost commented 2 years ago

If you replace the CRL file with a newer version, there is code to update it:

https://github.com/Icinga/icinga2/blob/7d64fbf8f6245d811df765674b2c9d876ca62597/lib/remote/apilistener.cpp#L485-L493

If that doesn't work, that's a bug for sure. However, keep in mind that Icinga doesn't attempt to download a CRL file if there's a URL specified in the certificate.

xschlef commented 2 years ago

The CRL is automatically downloaded to the local filesystem every 6 hours by a fetch-crl cron job on every server, including the sat and the master.

Current state: -rw-r--r-- 1 root root 18114 Aug 17 06:44 /etc/ssl/crl/server-ca-2017.0.crl.pem
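
To confirm that the file on disk is still valid, and that only the running daemon holds a stale copy, the CRL's nextUpdate field can be compared with the current time. A minimal sketch, assuming the `openssl` command-line tool is installed; `parse_next_update` and `crl_is_expired` are hypothetical helper names, and the date in the docstring is only an example value.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def parse_next_update(line: str) -> datetime:
    """Parse a line such as 'nextUpdate=Sep 16 06:44:00 2022 GMT'."""
    key, _, value = line.strip().partition("=")
    if key != "nextUpdate":
        raise ValueError(f"unexpected openssl output: {line!r}")
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    return parsed.replace(tzinfo=timezone.utc)  # openssl prints GMT


def crl_is_expired(path: str, now: Optional[datetime] = None) -> bool:
    """Return True if the PEM CRL at `path` is past its nextUpdate time."""
    out = subprocess.run(
        ["openssl", "crl", "-in", path, "-noout", "-nextupdate"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    now = now or datetime.now(timezone.utc)
    return now > parse_next_update(out)
```

If `crl_is_expired("/etc/ssl/crl/server-ca-2017.0.crl.pem")` returns False while the daemon still logs "CRL has expired", the daemon is using an older copy than the one on disk.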

julianbrost commented 2 years ago

So your satellite never receives any incoming connections? The issue probably is that the CRL is only updated once per accept(). Therefore, just requesting https://localhost:5665/ periodically should work as a workaround.

But yes, the CRL should also be updated when performing outgoing connections.
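
The workaround could be scripted roughly as follows. This is a sketch, not a tested fix: `poke_listener` is a hypothetical helper, certificate verification is switched off on the assumption that the request is a throwaway loopback call whose only purpose is to make the listener accept a connection, and proxies are bypassed for the same reason.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def poke_listener(url: str = "https://localhost:5665/", timeout: float = 5.0) -> bool:
    """Make one request to the local API listener so it accepts a connection.

    Returns True if any HTTP response came back (even an error status),
    False if the connection could not be established.
    """
    # Assumption: skipping certificate checks is acceptable for a loopback
    # request that carries no credentials and whose response is discarded.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # never route the probe via a proxy
        urllib.request.HTTPSHandler(context=ctx),
    )
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an error status still means the connection was accepted
    except (OSError, http.client.HTTPException):
        return False  # refused, timed out, or TLS handshake failed
```

Running this from cron at an interval shorter than the CRL lifetime, for example right after each fetch-crl run, should be enough if the accept() theory above holds.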