Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

[dev.icinga.com #10688] Endpoint not created #3688

Closed icinga-migration closed 8 years ago

icinga-migration commented 8 years ago

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10688

Created by bert2002 on 2015-11-20 10:00:36 +00:00

Assignee: (none)
Status: Rejected (closed on 2016-03-18 16:11:17 +00:00)
Target Version: (none)
Last Update: 2016-03-18 16:11:17 +00:00 (in Redmine)

Icinga Version: 2.3.10
Backport?: Not yet backported
Include in Changelog: 1

Hi,

We run Icinga 2.3.10, and our clients use the API to connect and push check results. We just created a new environment from a staging environment, and no client in the new environment gets an endpoint, zone, or host created on the Icinga server.

The only difference between the two environments is the host names:

stage: stage-app-stats-1.srv.company.com
prod: app-stats-1.srv.company.com

The staging clients have no problems. The production clients can't connect and are refused by the Icinga server with this error (debug log):

[2015-11-20 09:54:14 +0100] notice/ApiClient: Received 'icinga::Hello' message from 'app-stats-1.srv.company.com'
[2015-11-20 09:54:14 +0100] notice/ApiClient: Received 'event::SetNextCheck' message from 'app-stats-1.srv.company.com'
[2015-11-20 09:54:14 +0100] notice/ApiEvents: Discarding 'next check changed' message from 'app-stats-1.srv.company.com': Invalid endpoint origin (client not allowed).
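To see whether only some client names are affected, the "Discarding ... Invalid endpoint origin" lines can be counted per client identity. The following is a minimal sketch that assumes the message format shown in the log excerpt above; the function name `rejected_identities` is hypothetical and not part of Icinga 2.

```python
import re
from collections import Counter

# Matches debug log lines of the form shown above:
#   ... Discarding '<kind>' message from '<identity>': Invalid endpoint origin ...
DISCARD_RE = re.compile(
    r"Discarding '(?P<kind>[^']+)' message from '(?P<identity>[^']+)': "
    r"Invalid endpoint origin"
)


def rejected_identities(lines):
    """Count discarded cluster messages per client identity."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[match.group("identity")] += 1
    return counts
```

It can be fed an open file handle for the debug log (commonly /var/log/icinga2/debug.log, though the path depends on your setup), for example `rejected_identities(open(path))`, and the resulting counts compared between staging and production names.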

Apparently this is a problem with the certificates, but according to the troubleshooting guide they are okay:

root@app-stats-1:~# openssl verify -verbose -CAfile /etc/icinga2/pki/ca.crt /etc/icinga2/pki/app-stats-1.srv.company.com.crt
/etc/icinga2/pki/app-stats-1.srv.company.com.crt: OK
root@app-stats-1:~#

root@app-stats-1:~# md5sum /etc/icinga2/pki/ca.crt 
082c8ecd93e7c44e3825e2ebe08eba06 /etc/icinga2/pki/ca.crt
root@app-stats-1:~#

root@icinga ~# md5sum /etc/icinga2/pki/ca.crt 
082c8ecd93e7c44e3825e2ebe08eba06 /etc/icinga2/pki/ca.crt
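Note that `openssl verify` and matching checksums only confirm that the client certificate chains to the same CA. They do not show which name the certificate carries. As an additional check, the sketch below reads the subject with `openssl x509 -noout -subject` and extracts the CN so it can be compared with the expected endpoint name. The helper names `extract_cn` and `certificate_cn` are hypothetical, and the regular expression is an assumption that covers the two common subject output styles (`/CN=name` and `CN = name`).

```python
import re
import subprocess

# Captures the CN value in both "subject= /CN=name" and "subject=CN = name"
CN_RE = re.compile(r"CN\s*=\s*([^,/\n]+)")


def extract_cn(subject_line):
    """Return the CN from an `openssl x509 -subject` line, or None."""
    match = CN_RE.search(subject_line)
    return match.group(1).strip() if match else None


def certificate_cn(cert_path):
    """Ask the openssl CLI for the certificate subject and return its CN."""
    output = subprocess.run(
        ["openssl", "x509", "-in", cert_path, "-noout", "-subject"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return extract_cn(output)
```

For the certificate above, one would expect `certificate_cn("/etc/icinga2/pki/app-stats-1.srv.company.com.crt")` to return the same name that the master uses for the endpoint; a mismatch would be worth investigating.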

When I try to create a new endpoint with the name of the production host, I am told that the endpoint already exists:

root@icinga repository.d# icinga2 repository endpoint add name=app-stats-1.srv.company.com 
information/Utility: Loading library 'libicinga.so'
warning/cli: Endpoint 'app-stats-1.srv.company.com' already exists. Skipping creation.

But I can't find it anywhere on the system (status file, etc.).

root@icinga repository.d# find . | grep stats
./endpoints/stage-app-stats-1.srv.company.com.conf
./zones/stage-app-stats-1.srv.company.com.conf
./hosts/stage-app-stats-1.srv.company.com.conf
./hosts/stage-app-stats-1.srv.company.com
./hosts/stage-app-stats-1.srv.company.com/ping6.conf
./hosts/stage-app-stats-1.srv.company.com/icinga.conf
./hosts/stage-app-stats-1.srv.company.com/disk.conf
./hosts/stage-app-stats-1.srv.company.com/apt.conf
./hosts/stage-app-stats-1.srv.company.com/load.conf
./hosts/stage-app-stats-1.srv.company.com/procs.conf
./hosts/stage-app-stats-1.srv.company.com/swap.conf
./hosts/stage-app-stats-1.srv.company.com/ping4.conf
./hosts/stage-app-stats-1.srv.company.com/ssh.conf
./hosts/stage-app-stats-1.srv.company.com/disk %2F.conf
root@icinga repository.d#
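The same lookup can be scripted to show, per client name, which parts of repository.d exist. This is a minimal sketch that assumes the directory layout visible in the `find` output above (endpoints/, zones/, and hosts/ with one .conf file per client, plus a hosts/<name>/ directory for services); the function name `repository_entries` is hypothetical.

```python
from pathlib import Path


def repository_entries(repo_dir, name):
    """Report which repository.d sections contain an entry for `name`."""
    repo = Path(repo_dir)
    return {
        "endpoint": (repo / "endpoints" / f"{name}.conf").is_file(),
        "zone": (repo / "zones" / f"{name}.conf").is_file(),
        "host": (repo / "hosts" / f"{name}.conf").is_file(),
        # Service definitions live in a directory named after the host
        "services": (repo / "hosts" / name).is_dir(),
    }
```

Calling `repository_entries("/etc/icinga2/repository.d", "app-stats-1.srv.company.com")` and comparing the result with the staging name makes a partially created client (for example, services without a zone) easy to spot.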

I don't understand the problem or what I am missing (we have an automated deployment for the clients, and it works for other naming schemes and new servers without any problem). Any idea where or what I could check? Maybe dump some data from Icinga, the database, etc.?

Any help appreciated.

bert2002

icinga-migration commented 8 years ago

Updated by jflach on 2015-11-20 10:33:11 +00:00

icinga2 object list --type Endpoint

Lists all your endpoints and the files they come from.

icinga-migration commented 8 years ago

Updated by bert2002 on 2015-11-20 11:27:44 +00:00

Hi,

the endpoint for the hosts is not created.

[root@icinga repository.d]# icinga2 object list --type Endpoint |grep app-stats-1.srv.company.com  |grep -v stage
[root@icinga repository.d]#

Running update-config, etc. does not help either. :(

icinga-migration commented 8 years ago

Updated by jflach on 2015-11-20 11:43:53 +00:00

Please disregard the earlier version of my comment.

Run icinga2 object list --type Endpoint --name app-stats-1.srv.company.com

Don't grep; you need the whole output.

icinga-migration commented 8 years ago

Updated by bert2002 on 2015-11-20 12:45:50 +00:00

Hi,

Same issue: no output for the production host, only for the staging servers.

icinga-migration commented 8 years ago

Updated by gbeutner on 2015-11-21 03:37:40 +00:00

FWIW I'm unable to reproduce this on a new installation.

icinga-migration commented 8 years ago

Updated by mfriedrich on 2015-11-25 14:16:34 +00:00

Please re-test this issue with the latest 2.4.0 release.

icinga-migration commented 8 years ago

Updated by bert2002 on 2015-12-01 14:26:00 +00:00

dnsmichi wrote:

Please re-test this issue with the latest 2.4.0 release.

Okay, I updated to 2.4.0 (CentOS), but the problem still exists.

icinga-migration commented 8 years ago

Updated by bert2002 on 2015-12-08 08:51:28 +00:00

When a non-working client connects, I get the following error:

[2015-12-08 09:41:20 +0100] warning/JsonRpcConnection: Error while reading JSON-RPC message for identity 'app-stats-1.srv.company.com': Error: std::exception

        (0) libbase.so: void boost::throw_exception(icinga::openssl_error const&) (+0x97) [0x3fbf130a57]
        (1) libbase.so: void boost::exception_detail::throw_exception_(icinga::openssl_error const&, char const*, char const*, int) (+0x40) [0x3fbf130b00]
        (2) libbase.so: icinga::TlsStream::HandleError() const (+0xbc) [0x3fbf0cf46c]
        (3) libbase.so: icinga::TlsStream::Read(void*, unsigned long, bool) (+0x7e) [0x3fbf0cf5de]
        (4) libbase.so: icinga::StreamReadContext::FillFromStream(boost::intrusive_ptr const&, bool) (+0x55) [0x3fbf0de6e5]
        (5) libbase.so: icinga::NetString::ReadStringFromStream(boost::intrusive_ptr const&, icinga::String*, icinga::StreamReadContext&, bool) (+0xce) [0x3fbf0ecdde]
        (6) libremote.so: icinga::JsonRpc::ReadMessage(boost::intrusive_ptr const&, boost::intrusive_ptr*, icinga::StreamReadContext&, bool) (+0x3d) [0x3fbd4c24ad]
        (7) libremote.so: icinga::JsonRpcConnection::ProcessMessage() (+0x65) [0x3fbd4e6555]
        (8) libremote.so: icinga::JsonRpcConnection::DataAvailableHandler() (+0x38) [0x3fbd502e68]
        (9) libbase.so: boost::signals2::detail::signal_impl const&), boost::signals2::optional_last_value, int, std::less, boost::function const&)>, boost::function const&)>, boost::signals2::mutex>::operator()(boost::intrusive_ptr const&) (+0x1bb) [0x3fbf1630ab]
        (10) libbase.so: icinga::Stream::SignalDataAvailable() (+0x27) [0x3fbf113a17]
        (11) libbase.so: icinga::TlsStream::OnEvent(int) (+0x3af) [0x3fbf113e9f]
        (12) libbase.so: icinga::SocketEvents::ThreadProc() (+0x23a) [0x3fbf110aca]
        (13) /usr/lib64/libboost_thread.so.1.53.0() [0x3fbcc0c5c3]
        (14) /lib64/libpthread.so.0() [0x30fa407a51]
        (15) libc.so.6: clone (+0x6d) [0x30fa0e893d]

I double checked the certificates and they are identical. Installed openssl versions:

icinga server:

# rpm -qa |grep openssl
openssl-1.0.1e-42.el6.x86_64
openssl-devel-1.0.1e-42.el6.x86_64

client:

# dpkg -l openssl
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version              Architecture         Description
+++-===============================-====================-====================-====================================================================
ii  openssl                         1.0.1f-1ubuntu2.15   amd64                Secure Sockets Layer toolkit - cryptographic utility

Other clients (with a different naming scheme) work with the same openssl version.

icinga-migration commented 8 years ago

Updated by bert2002 on 2015-12-08 10:17:00 +00:00

Okay, something "funny" happened. We configured some custom checks on the client and they are getting created on the icinga server. Of course that does not work, because the zone for the host does not exist. I would have expected the custom checks to be ignored like all the other checks.

critical/config: Error: Validation failed for object 'app-stats-1.srv.company.com!http_cdh_namenode_2' of type 'Service'; Attribute 'zone': Object 'app-stats-1.srv.company.com' of type 'Zone' does not exist.
Location: in /etc/icinga2/repository.d/hosts/app-stats-1.srv.company.com/http_cdh_namenode_2.conf: 5:2-5:38
/etc/icinga2/repository.d/hosts/app-stats-1.srv.company.com/http_cdh_namenode_2.conf(3):  check_command = "dummy"
/etc/icinga2/repository.d/hosts/app-stats-1.srv.company.com/http_cdh_namenode_2.conf(4):  host_name = "app-stats-1.srv.company.com"
/etc/icinga2/repository.d/hosts/app-stats-1.srv.company.com/http_cdh_namenode_2.conf(5):  zone = "app-stats-1.srv.company.com"
                                                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/repository.d/hosts/app-stats-1.srv.company.com/http_cdh_namenode_2.conf(6): }
/etc/icinga2/repository.d/hosts/app-stats-1.srv.company.com/http_cdh_namenode_2.conf(7):

icinga-migration commented 8 years ago

Updated by mfriedrich on 2016-02-24 21:55:28 +00:00

We configured some custom checks on the client and they are getting created on the icinga server.

I can't follow here. How are these config objects created on the master? Is the client node's zone configured with the master zone being its parent?

icinga-migration commented 8 years ago

Updated by mfriedrich on 2016-03-18 16:11:17 +00:00

Since we cannot reproduce the issue here and it is not obvious which cli command causes the error, I'll close the issue.