Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Invalid Perfdata causing Segmentation fault with ElasticsearchWriter #6191

Closed bengelberth-shadowsoft closed 6 years ago

bengelberth-shadowsoft commented 6 years ago

Performance data that contains an invalid unit of measure is causing a segmentation fault with the Elasticsearch feature when enable_send_perfdata is true.

Expected Behavior

I would expect Icinga2 to write an error to the icinga2 log file and continue to operate without a segmentation fault when incorrect units are found in perfdata.

Current Behavior

When a check returns incorrectly formatted performance data, Icinga2 can crash with a segmentation fault if the Elasticsearch feature is enabled and enable_send_perfdata is true.

Possible Solution

Steps to Reproduce (for bugs)

This can be reproduced with the following configuration. I also saw this with checks that received an error and put a string in place of a number in the performance data. It does not require a valid Elasticsearch connection: testing was performed before standing up an Elasticsearch ingest process.

  1. Check Command with bad performance data:
#!/bin/bash

echo "Output | crash=0cs"
exit 0
  2. CheckCommand and Service object definitions:
object CheckCommand "crash" {
  command = [ PluginDir + "/check_crash" ]

}

apply Service "crash" {
  check_command = "crash"
  assign where host.name
}

  3. elasticsearch.conf:

library "perfdata"

object ElasticsearchWriter "elasticsearch" {
  //host = "127.0.0.1"
  //port = 9200
  //index = "icinga2"
  enable_send_perfdata = true
  //flush_threshold = 1024
  //flush_interval = 10s
}

4.

Context

Not all third-party check commands produce valid performance data. When they return incorrectly formatted data, Icinga2 crashes.
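
To make "invalid unit of measure" concrete, here is a minimal C++ sketch that checks the unit suffix of a single `label=value[UOM]` token. It is only an illustration: `ExtractKnownUnit` and `kKnownUnits` are hypothetical names, and the unit list is assumed from the Monitoring Plugins guidelines (plus `GB`), so the set accepted by Icinga2's own parser may differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Assumed unit list (lower-cased); the empty string means "no unit".
// This is NOT taken from the Icinga 2 source.
constexpr std::array<std::string_view, 11> kKnownUnits = {
    "", "s", "us", "ms", "%", "b", "kb", "mb", "gb", "tb", "c"};

// Hypothetical helper: returns the lower-cased unit of one perfdata token,
// or std::nullopt when the token is malformed or the unit is unknown.
std::optional<std::string> ExtractKnownUnit(std::string_view token)
{
    const auto eq = token.rfind('=');
    if (eq == std::string_view::npos || eq == 0)
        return std::nullopt;

    // Only the value field is checked; warn/crit/min/max follow after ';'.
    std::string_view value = token.substr(eq + 1);
    value = value.substr(0, value.find(';'));

    // Consume the numeric part: optional sign, digits, at most one dot.
    std::size_t pos = 0;
    if (pos < value.size() && (value[pos] == '-' || value[pos] == '+'))
        ++pos;
    bool seenDigit = false;
    bool seenDot = false;
    while (pos < value.size()) {
        const auto ch = static_cast<unsigned char>(value[pos]);
        if (std::isdigit(ch))
            seenDigit = true;
        else if (ch == '.' && !seenDot)
            seenDot = true;
        else
            break;
        ++pos;
    }
    if (!seenDigit)
        return std::nullopt; // e.g. a string where a number was expected
    assert(pos <= value.size());

    // Whatever follows the number is treated as the unit.
    std::string unit(value.substr(pos));
    std::transform(unit.begin(), unit.end(), unit.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string_view unitView(unit);
    if (std::find(kKnownUnits.begin(), kKnownUnits.end(), unitView) == kKnownUnits.end())
        return std::nullopt;
    return unit;
}
```

With this sketch, `ExtractKnownUnit("crash=0cs")` yields no value because `cs` is not in the assumed list, while a token such as `time=0.5s` yields `s`.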

Your Environment

icinga2 - The Icinga 2 network monitoring daemon (version: r2.8.2-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-693.21.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

This can be reproduced on a single master.

dnsmichi commented 6 years ago

Confirmed, even without a running ES.

dnsmichi commented 6 years ago
mbmif /usr/local/icinga2/etc/icinga2/tests (master *) # cat check_es_crash
#!/bin/bash

echo "Output | crash=0cs"
exit 0

mbmif /usr/local/icinga2/etc/icinga2/tests (master *) # cat es.conf
object CheckCommand "es-crash" {
  command = [ SysconfDir + "/icinga2/tests/check_es_crash" ]
}

object Host "es-crash" {
  check_command = "es-crash"
  check_interval = 1s
}
[2018-04-03 14:33:09 +0200] warning/ElasticsearchWriter: Ignoring invalid perfdata value: 'crash=0cs' for object 'es-crash'.
Context:
    (0) Elasticwriter processing check result for 'es-crash'

Assertion failed: (px != 0), function operator->, file /usr/local/include/boost/smart_ptr/intrusive_ptr.hpp, line 199.
Process 42418 stopped
* thread #12, stop reason = signal SIGABRT
    frame #0: 0x00007fff5a6eae3e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff5a6eae3e <+10>: jae    0x7fff5a6eae48            ; <+20>
    0x7fff5a6eae40 <+12>: movq   %rax, %rdi
    0x7fff5a6eae43 <+15>: jmp    0x7fff5a6e20b8            ; cerror_nocancel
    0x7fff5a6eae48 <+20>: retq
Target 0: (icinga2) stopped.
(lldb) up
frame #1: 0x00007fff5a829150 libsystem_pthread.dylib`pthread_kill + 333
libsystem_pthread.dylib`pthread_kill:
    0x7fff5a829150 <+333>: movl   %eax, %r15d
    0x7fff5a829153 <+336>: cmpl   $-0x1, %r15d
    0x7fff5a829157 <+340>: jne    0x7fff5a829161            ; <+350>
    0x7fff5a829159 <+342>: callq  0x7fff5a82c19c            ; symbol stub for: __error
(lldb)
frame #2: 0x00007fff5a647312 libsystem_c.dylib`abort + 127
libsystem_c.dylib`abort:
    0x7fff5a647312 <+127>: movl   $0x2710, %edi             ; imm = 0x2710
    0x7fff5a647317 <+132>: callq  0x7fff5a6198e4            ; usleep$NOCANCEL
    0x7fff5a64731c <+137>: callq  0x7fff5a647321            ; __abort

libsystem_c.dylib`__abort:
    0x7fff5a647321 <+0>:   cmpq   $0x0, 0x38f78e1f(%rip)    ; gCRAnnotations + 7
(lldb)
frame #3: 0x00007fff5a60f368 libsystem_c.dylib`__assert_rtn + 320
libsystem_c.dylib`basename_r:
    0x7fff5a60f368 <+0>: pushq  %rbp
    0x7fff5a60f369 <+1>: movq   %rsp, %rbp
    0x7fff5a60f36c <+4>: pushq  %r15
    0x7fff5a60f36e <+6>: pushq  %r14
(lldb)
frame #4: 0x0000000100a28ce9 icinga2`boost::intrusive_ptr<icinga::PerfdataValue>::operator->(this=0x000070000430c800) const at intrusive_ptr.hpp:199
   196
   197      T * operator->() const BOOST_SP_NOEXCEPT_WITH_ASSERT
   198      {
-> 199          BOOST_ASSERT( px != 0 );
   200          return px;
   201      }
   202
(lldb)
frame #5: 0x0000000100e67a63 icinga2`icinga::ElasticsearchWriter::AddCheckResult(this=0x0000000104806c00, fields=0x000070000430d060, checkable=0x000000010559e330, cr=0x000000010559e338) at elasticsearchwriter.cpp:151
   148                  }
   149              }
   150
-> 151              String escapedKey = pdv->GetLabel();
   152              boost::replace_all(escapedKey, " ", "_");
   153              boost::replace_all(escapedKey, ".", "_");
   154              boost::replace_all(escapedKey, "\\", "_");
(lldb) p pdv
(Ptr) $0 = {
  px = 0x0000000000000000
}
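
The trace shows the warning being logged and then `pdv` being dereferenced while it is still null, which suggests that processing continues after the parse failure. A minimal, self-contained C++ sketch of the guard pattern that avoids this is shown here. It does not use Icinga2's types: `PerfValue`, `ParsePerfValue` and `CollectLabels` are hypothetical stand-ins, `std::shared_ptr` replaces `boost::intrusive_ptr`, and the toy parser rejects only the `cs` suffix from the report. It is not necessarily how the project fixed the issue.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for icinga::PerfdataValue; only the label is modelled.
struct PerfValue {
    std::string label;
};

// Hypothetical toy parser: throws on a missing '=' or on the "cs" suffix
// used in the reproduction, and returns a non-null pointer otherwise.
std::shared_ptr<PerfValue> ParsePerfValue(const std::string& token)
{
    const auto eq = token.find('=');
    if (eq == std::string::npos)
        throw std::invalid_argument("missing '=' in perfdata token");
    if (token.size() >= 2 && token.compare(token.size() - 2, 2, "cs") == 0)
        throw std::invalid_argument("invalid performance data unit");
    return std::make_shared<PerfValue>(PerfValue{token.substr(0, eq)});
}

// Collects the labels of all tokens that parse, skipping the rest.
std::vector<std::string> CollectLabels(const std::vector<std::string>& tokens)
{
    std::vector<std::string> labels;
    for (const auto& token : tokens) {
        std::shared_ptr<PerfValue> pdv;
        try {
            pdv = ParsePerfValue(token);
        } catch (const std::exception&) {
            // A real writer would log a warning here. The important part is
            // to skip this token instead of falling through with a null pdv.
            continue;
        }
        assert(pdv != nullptr); // holds because failed parses never reach this line
        labels.push_back(pdv->label);
    }
    return labels;
}
```

Calling `CollectLabels({"load=1", "crash=0cs", "time=5ms"})` in this sketch returns the labels `load` and `time`; the invalid token is dropped rather than dereferenced.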