collectd / collectd

The system statistics collection daemon. Please send Pull Requests here!
http://collectd.org
Other
3.17k stars 1.23k forks source link

possible issue with libvirt/kvm on RHEL 7.4 #2410

Closed thewiwi closed 5 years ago

thewiwi commented 7 years ago

Actual behavior

collectd crashes. seems to be related to libvirt and/or kvm. system reboot after a few hours with this problem. systems without libvirt/kvm do not seem to be affected.

(Description of the behavior / output that you observed)

Aug 27 12:29:18 turing.progon.net systemd: Started Session c2 of user root.
Aug 27 12:29:18 turing.progon.net systemd: Starting Session c2 of user root.
Aug 27 12:29:43 turing.progon.net systemd: Started statistics collection daemon.
Aug 27 12:29:43 turing.progon.net systemd: Starting statistics collection daemon...
Aug 27 12:29:43 turing.progon.net collectd: Created new plugin context.
Aug 27 12:29:43 turing.progon.net collectd: [2017-08-27 12:29:43] plugin_load: plugin "logfile" successfully loaded.
Aug 27 12:29:43 turing.progon.net collectd: [2017-08-27 12:29:43] type = logfile, key = LogLevel, value = info
Aug 27 12:29:43 turing.progon.net kernel: reader#0[45219]: segfault at 1 ip 00007f0722f2bc8f sp 00007f07165fbbf8 error 4 in libc-2.17.so[7f0722e97000+1b8000]
Aug 27 12:29:43 turing.progon.net abrt-hook-ccpp: Process 45213 (collectd) of user 0 killed by SIGSEGV - dumping core
Aug 27 12:29:43 turing.progon.net abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Aug 27 12:29:43 turing.progon.net libvirtd: 2017-08-27 10:29:43.877+0000: 1813: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 27 12:29:43 turing.progon.net systemd: as34288.collectd.service: main process exited, code=killed, status=11/SEGV
Aug 27 12:29:43 turing.progon.net systemd: Unit as34288.collectd.service entered failed state.
Aug 27 12:29:43 turing.progon.net systemd: as34288.collectd.service failed.
Aug 27 12:29:44 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:29:44 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:29:45 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:29:54 turing.progon.net systemd: as34288.collectd.service holdoff time over, scheduling restart.
Aug 27 12:29:54 turing.progon.net systemd: Started statistics collection daemon.
Aug 27 12:29:54 turing.progon.net systemd: Starting statistics collection daemon...
Aug 27 12:29:54 turing.progon.net collectd: Created new plugin context.
Aug 27 12:29:54 turing.progon.net collectd: [2017-08-27 12:29:54] plugin_load: plugin "logfile" successfully loaded.
Aug 27 12:29:54 turing.progon.net collectd: [2017-08-27 12:29:54] type = logfile, key = LogLevel, value = info
Aug 27 12:29:54 turing.progon.net kernel: reader#0[46012]: segfault at 1 ip 00007f82375d1c8f sp 00007f822aca1bf8 error 4 in libc-2.17.so[7f823753d000+1b8000]
Aug 27 12:29:54 turing.progon.net abrt-hook-ccpp: Process 46006 (collectd) of user 0 killed by SIGSEGV - ignoring (repeated crash)
Aug 27 12:29:54 turing.progon.net libvirtd: 2017-08-27 10:29:54.114+0000: 1813: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 27 12:29:54 turing.progon.net systemd: as34288.collectd.service: main process exited, code=killed, status=11/SEGV
Aug 27 12:29:54 turing.progon.net systemd: Unit as34288.collectd.service entered failed state.
Aug 27 12:29:54 turing.progon.net systemd: as34288.collectd.service failed.
Aug 27 12:30:01 turing.progon.net systemd: Started Session 85 of user root.
Aug 27 12:30:01 turing.progon.net systemd: Starting Session 85 of user root.
Aug 27 12:30:04 turing.progon.net systemd: as34288.collectd.service holdoff time over, scheduling restart.
Aug 27 12:30:04 turing.progon.net systemd: Started statistics collection daemon.
Aug 27 12:30:04 turing.progon.net systemd: Starting statistics collection daemon...
Aug 27 12:30:04 turing.progon.net collectd: Created new plugin context.
Aug 27 12:30:04 turing.progon.net collectd: [2017-08-27 12:30:04] plugin_load: plugin "logfile" successfully loaded.
Aug 27 12:30:04 turing.progon.net collectd: [2017-08-27 12:30:04] type = logfile, key = LogLevel, value = info
Aug 27 12:30:04 turing.progon.net kernel: reader#3[46596]: segfault at 1 ip 00007f7f7b9c0c8f sp 00007f7f6d88dbf8 error 4 in libc-2.17.so[7f7f7b92c000+1b8000]
Aug 27 12:30:04 turing.progon.net abrt-hook-ccpp: Process 46587 (collectd) of user 0 killed by SIGSEGV - dumping core
Aug 27 12:30:04 turing.progon.net abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Aug 27 12:30:05 turing.progon.net libvirtd: 2017-08-27 10:30:05.627+0000: 1813: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 27 12:30:05 turing.progon.net systemd: as34288.collectd.service: main process exited, code=killed, status=11/SEGV
Aug 27 12:30:05 turing.progon.net systemd: Unit as34288.collectd.service entered failed state.
Aug 27 12:30:05 turing.progon.net systemd: as34288.collectd.service failed.
Aug 27 12:30:07 turing.progon.net systemd: Stopped statistics collection daemon.
Aug 27 12:30:34 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:30:35 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:30:35 turing.progon.net abrt-server: Generating core_backtrace
Aug 27 12:30:36 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:30:36 turing.progon.net abrt-server: Duplicate: core backtrace
Aug 27 12:30:36 turing.progon.net abrt-server: DUP_OF_DIR: /var/spool/abrt/ccpp-2017-08-27-00:33:47-20171
Aug 27 12:30:36 turing.progon.net abrt-server: Deleting problem directory ccpp-2017-08-27-12:29:43-45213 (dup of ccpp-2017-08-27-00:33:47-20171)
Aug 27 12:30:36 turing.progon.net abrt-server: Email address of sender was not specified. Would you like to do so now? If not, 'user@localhost' is to be used [y/N] 
Aug 27 12:30:36 turing.progon.net abrt-server: Email address of receiver was not specified. Would you like to do so now? If not, 'root@localhost' is to be used [y/N] 
Aug 27 12:30:36 turing.progon.net abrt-server: Sending an email...
Aug 27 12:30:36 turing.progon.net abrt-server: Sending a notification email to: root@localhost
Aug 27 12:30:36 turing.progon.net abrt-server: Email was sent to: root@localhost
Aug 27 12:30:37 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:30:37 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
Aug 27 12:30:38 turing.progon.net abrt-server: '/var/spool/abrt/ccpp-2017-08-25-13:32:40-37231' is not a problem directory
thewiwi commented 7 years ago

Build options:

Configuration: Build: Platform . . . . . . Linux CC . . . . . . . . . gcc -std=gnu99 CFLAGS . . . . . . . -Wall -Werror -g -O2 CXXFLAGS . . . . . . -Wall -Werror -g -O2 CPP . . . . . . . . . gcc -E CPPFLAGS . . . . . . GRPC_CPP_PLUGIN . . . LD . . . . . . . . . /bin/ld -m elf_x86_64 LDFLAGS . . . . . . . PROTOC . . . . . . . YACC . . . . . . . . bison -y YFLAGS . . . . . . .

Libraries: intel mic . . . . . . no (MicAccessApi not found) libaquaero5 . . . . . no (libaquaero5.h not found) libatasmart . . . . . no (atasmart.h not found) libcurl . . . . . . . yes libdbi . . . . . . . no (dbi/dbi.h not found) libdpdk . . . . . . . no (rte_config.h not found) libesmtp . . . . . . no (libesmtp not found) libganglia . . . . . no (gm_protocol.h not found) libgcrypt . . . . . . yes libgps . . . . . . . no (gps.h not found) libgrpc++ . . . . . . no (pkg-config could not find libgrpc++) libhal . . . . . . . no libhiredis . . . . . no (hiredis.h not found) libi2c-dev . . . . . no (symbol i2c_smbus_read_i2c_block_data not found - have you installed libi2c-dev ?) libiokit . . . . . . no libiptc . . . . . . . no (pkg-config doesn't know libiptc) libjvm . . . . . . . no (jar not found) libkstat . . . . . . no (Solaris only) libkvm . . . . . . . no libldap . . . . . . . no ('ldap.h' not found) liblua . . . . . . . no (pkg-config cannot find liblua) liblvm2app . . . . . no (lvm2app.h not found) libmemcached . . . . no (libmemcached/memcached.h not found) libmicrohttpd . . . . no (pkg-config could not find libmicrohttpd) libmnl . . . . . . . no (symbol 'mnl_nlmsg_get_payload' not found) libmodbus . . . . . . no (pkg-config doesn't know libmodbus) libmongoc . . . . . . no ('mongo.h' not found) libmosquitto . . . . no (mosquitto.h not found) libmysql . . . . . . no (mysql_config failed) libnetapp . . . . . . no (netapp_api.h not found) libnetsnmp . . . . . yes libnotify . . . . . . no (pkg-config doesn't know libnotify) liboconfig . . . . . yes (shipped version) libopenipmi . . . . . no (pkg-config doesn't know OpenIPMIpthread) liboping . . . . . . no (oping.h not found) libowcapi . . . . . . no (owcapi.h not found) libpcap . . . . . . . no (pcap.h misses PCAP_ERROR_IFACE_NOT_UP) libperfstat . . . . . no (AIX only) libperl . . . . . . . no libpq . . . . . . . . no (libpq-fe.h not found) libpqos . . . . . . . no (pqos.h not found) libprotobuf . . . . . no (libprotobuf not found) libprotobuf-c . . . . no (libprotobuf-c not found) libpython . . . . . . no librabbitmq . . . . . no (amqp.h not found) libriemann-client . . no (pkg-config doesn't know libriemann-client) librdkafka . . . . . no (librdkafka/rdkafka.h not found) librouteros . . . . . no (routeros_api.h not found) librrd . . . . . . . no (rrd.h not found) libsensors . . . . . yes libsigrok . . . . . no (pkg-config could not find libsigrok) libstatgrab . . . . . no (pkg-config doesn't know libstatgrab) libtokyotyrant . . . no (tcrdb.h not found) libudev . . . . . . . yes libupsclient . . . . no (pkg-config doesn't know libupsclient) libvarnish . . . . . no (pkg-config doesn't know varnishapi) libvirt . . . . . . . yes libxenctrl . . . . . no (xenctrl.h not found) libxml2 . . . . . . . yes libxmms . . . . . . . no libyajl . . . . . . . no (yajl/yajl_parse.h not found) oracle . . . . . . . no (ORACLE_HOME is not set) protobuf-c . . . . . no (protoc-c compiler not found) protoc 3 . . . . . . no

Features: daemon mode . . . . . yes debug . . . . . . . . yes

Bindings: perl . . . . . . . . no

Modules: aggregation . . . . . yes amqp . . . . . . . no (amqp.h not found) apache . . . . . . . yes apcups . . . . . . . yes apple_sensors . . . . no aquaero . . . . . . . no (libaquaero5.h not found) ascent . . . . . . . yes barometer . . . . . . no battery . . . . . . . yes bind . . . . . . . . yes ceph . . . . . . . . no cgroups . . . . . . . yes chrony. . . . . . . . yes conntrack . . . . . . yes contextswitch . . . . yes cpu . . . . . . . . . yes cpufreq . . . . . . . yes cpusleep . . . . . . yes csv . . . . . . . . . yes curl . . . . . . . . yes curl_json . . . . . . no curl_xml . . . . . . yes dbi . . . . . . . . . no (dbi/dbi.h not found) df . . . . . . . . . yes disk . . . . . . . . yes dns . . . . . . . . . no (pcap.h misses PCAP_ERROR_IFACE_NOT_UP) dpdkstat . . . . . . no drbd . . . . . . . . yes email . . . . . . . . yes entropy . . . . . . . yes ethstat . . . . . . . yes exec . . . . . . . . yes fhcount . . . . . . . yes filecount . . . . . . yes fscache . . . . . . . yes gmond . . . . . . . . no (gm_protocol.h not found) gps . . . . . . . . . no grpc . . . . . . . . no hddtemp . . . . . . . yes hugepages . . . . . . yes intel_rdt. . . . . . no (pqos.h not found) interface . . . . . . yes ipc . . . . . . . . . yes ipmi . . . . . . . . no iptables . . . . . . no (pkg-config doesn't know libiptc) ipvs . . . . . . . . yes irq . . . . . . . . . yes java . . . . . . . . no (jar not found) load . . . . . . . . yes logfile . . . . . . . yes log_logstash . . . . no lpar . . . . . . . . no (AIX only) lua . . . . . . . . . no (pkg-config cannot find liblua) lvm . . . . . . . . . no (lvm2app.h not found) madwifi . . . . . . . yes match_empty_counter . yes match_hashed . . . . yes match_regex . . . . . yes match_timediff . . . yes match_value . . . . . yes mbmon . . . . . . . . yes md . . . . . . . . . yes memcachec . . . . . . no (libmemcached/memcached.h not found) memcached . . . . . . yes memory . . . . . . . yes mic . . . . . . . . . no (MicAccessApi not found) modbus . . . . . . . no (pkg-config doesn't know libmodbus) mqtt . . . . . . . . no (mosquitto.h not found) multimeter . . . . . yes mysql . . . . . . . . no (mysql_config failed) netapp . . . . . . . no (netapp_api.h not found) netlink . . . . . . . no (symbol 'mnl_nlmsg_get_payload' not found) network . . . . . . . yes nfs . . . . . . . . . yes nginx . . . . . . . . yes notify_desktop . . . no (pkg-config doesn't know libnotify) notify_email . . . . no (libesmtp not found) notify_nagios . . . . yes ntpd . . . . . . . . yes numa . . . . . . . . yes nut . . . . . . . . . no (pkg-config doesn't know libupsclient) olsrd . . . . . . . . yes onewire . . . . . . . no (owcapi.h not found) openldap . . . . . . no ('ldap.h' not found) openvpn . . . . . . . yes oracle . . . . . . . no (ORACLE_HOME is not set) perl . . . . . . . . no (needs libperl) pf . . . . . . . . . no pinba . . . . . . . . no ping . . . . . . . . no (oping.h not found) postgresql . . . . . no (libpq-fe.h not found) powerdns . . . . . . yes processes . . . . . . yes protocols . . . . . . yes python . . . . . . . no redis . . . . . . . . no (hiredis.h not found) routeros . . . . . . no (routeros_api.h not found) rrdcached . . . . . . no rrdtool . . . . . . . no (rrd.h not found) sensors . . . . . . . yes serial . . . . . . . yes sigrok . . . . . . . no (pkg-config could not find libsigrok) smart . . . . . . . . no snmp . . . . . . . . yes statsd . . . . . . . yes swap . . . . . . . . yes syslog . . . . . . . yes table . . . . . . . . yes tail_csv . . . . . . yes tail . . . . . . . . yes tape . . . . . . . . no target_notification . yes target_replace . . . yes target_scale . . . . yes target_set . . . . . yes target_v5upgrade . . yes tcpconns . . . . . . yes teamspeak2 . . . . . yes ted . . . . . . . . . yes thermal . . . . . . . yes threshold . . . . . . yes tokyotyrant . . . . . no (tcrdb.h not found) turbostat . . . . . . yes unixsock . . . . . . yes uptime . . . . . . . yes users . . . . . . . . yes uuid . . . . . . . . yes varnish . . . . . . . no (pkg-config doesn't know varnishapi) virt . . . . . . . . yes vmem . . . . . . . . yes vserver . . . . . . . yes wireless . . . . . . yes write_graphite . . . yes write_http . . . . . yes write_kafka . . . . . no (librdkafka/rdkafka.h not found) write_log . . . . . . yes write_mongodb . . . . no ('mongo.h' not found) write_prometheus. . . no write_redis . . . . . no (hiredis.h not found) write_riemann . . . . no (pkg-config doesn't know libriemann-client) write_sensu . . . . . yes write_tsdb . . . . . yes xencpu . . . . . . . no xmms . . . . . . . . no zfs_arc . . . . . . . yes zone . . . . . . . . no zookeeper . . . . . . yes

collected logs:

[root@turing tmp]# cat /var/log/collectd/collectd.log [2017-08-27 12:29:43] [info] plugin_load: plugin "cpu" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "df" successfully loaded. [2017-08-27 12:29:43] [warning] Plugin df' did not register for valueReportReserved'. [2017-08-27 12:29:43] [info] plugin_load: plugin "disk" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "entropy" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "interface" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "virt" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "load" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "memory" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "network" successfully loaded. [2017-08-27 12:29:43] [info] plugin_load: plugin "swap" successfully loaded. [2017-08-27 12:29:43] [info] Initialization complete, entering read-loop. [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:43] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [info] plugin_load: plugin "cpu" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "df" successfully loaded. [2017-08-27 12:29:54] [warning] Plugin df' did not register for valueReportReserved'. [2017-08-27 12:29:54] [info] plugin_load: plugin "disk" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "entropy" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "interface" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "virt" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "load" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "memory" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "network" successfully loaded. [2017-08-27 12:29:54] [info] plugin_load: plugin "swap" successfully loaded. [2017-08-27 12:29:54] [info] Initialization complete, entering read-loop. [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:29:54] [error] plugin_dispatch_values: plugin_write_enqueue failed with status 12 (Cannot allocate memory). [2017-08-27 12:30:04] [info] plugin_load: plugin "cpu" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "df" successfully loaded. [2017-08-27 12:30:04] [warning] Plugin df' did not register for valueReportReserved'. [2017-08-27 12:30:04] [info] plugin_load: plugin "disk" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "entropy" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "interface" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "virt" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "load" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "memory" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "network" successfully loaded. [2017-08-27 12:30:04] [info] plugin_load: plugin "swap" successfully loaded. [2017-08-27 12:30:04] [info] Initialization complete, entering read-loop. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0. [2017-08-27 12:30:04] [error] plugin_dispatch_values: Invalid value list from plugin zeuz-vm0.

thewiwi commented 7 years ago
[root@turing ccpp-2017-08-27-19:21:06-9429]# gdb /opt/as34288/collectd/sbin/collectd coredump 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/as34288/collectd/sbin/collectd...(no debugging symbols found)...done.
[New LWP 9438]
[New LWP 9434]
[New LWP 9430]
[New LWP 9433]
[New LWP 9432]
[New LWP 9436]
[New LWP 9431]
[New LWP 9439]
[New LWP 9435]
[New LWP 9437]
[New LWP 9429]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/opt/as34288/collectd/sbin/collectd -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f294839dc8f in __strncpy_sse2_unaligned () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install as34288-collectd-5.7.2-0.el7.x86_64
(gdb) backtrace full
#0  0x00007f294839dc8f in __strncpy_sse2_unaligned () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000414f91 in sstrncpy ()
No symbol table info available.
#2  0x00007f294621e36b in memory_stats_submit (tag_index=9, dom=0x7f29280009a0, memory=1539877686272) at virt.c:262
        values = {{counter = 4789130143270436864, gauge = 1539877686272, derive = 4789130143270436864, absolute = 4789130143270436864}}
        vl = {values = 0x7f293aa6bc90, values_len = 1, time = 0, interval = 10737418240, host = "turing.progon.net:zeuz-vm0", '\000' <repeats 37 times>, 
          plugin = "virt", '\000' <repeats 59 times>, plugin_instance = "zeuz-vm0", '\000' <repeats 55 times>, type = "memory", '\000' <repeats 57 times>, 
          type_instance = '\000' <repeats 63 times>, meta = 0x0}
        tags = {0x7f294621f1f5 "swap_in", 0x7f294621f1fd "swap_out", 0x7f294621f206 "major_fault", 0x7f294621f212 "minor_fault", 0x7f294621f21e "unused", 0x7f294621f225 "available", 
          0x7f294621f22f "actual_balloon", 0x7f294621f23e "rss"}
#3  lv_read () at virt.c:597
        info = {state = 1 '\001', maxMem = 4194304, memory = 4194304, nrVirtCpu = 4, cpuTime = 2137920000000}
        minfo = 0x7f29280041f0
        status = <optimized out>
        vinfo = 0x0
        j = <optimized out>
        t = 1503854466
        i = <optimized out>
#4  0x00000000004102da in plugin_read_thread ()
No symbol table info available.
#5  0x00007f29488d7e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00007f294840134d in clone () from /lib64/libc.so.6
No symbol table info available.
thewiwi commented 7 years ago

Prepending ./configure $OTHER_FLAGS CFLAGS="-g -O0" ... seems to make the crash go away.

collectd worked fine before upgrading to RHEL 7.4 ("yum update -y"). Both 5.6.1 and 5.7.2 show the same behavior.

rubenk commented 7 years ago

@fromanirh, @pszczerx, would you be so kind as to have a look?

octo commented 7 years ago

FYI, the above gdb output doesn't align with the 5.7.2 code. @thewiwi, do you happen to have a stack trace of an non-optimized (-O0) binary?

dchepishev commented 7 years ago

Hello, I see that communication here does not go fast and I get the same behavior on CentOS7.4, Is it ok if I provide stacktrace from CentOS7.4 ?

collectd-5.7.2 Here it is a stack trace with of binary compiled with: CFLAGS="-g -O0" ./configure --prefix=/usr/local/collectd

gdb /usr/local/collectd/sbin/collectd -f /usr/local/collectd/var/lib/collectd/core.17433 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/collectd/sbin/collectd...done.
[New LWP 17443]
[New LWP 17434]
[New LWP 17433]
[New LWP 17438]
[New LWP 17436]
[New LWP 17442]
[New LWP 17435]
[New LWP 17440]
[New LWP 17437]
[New LWP 17441]
[New LWP 17439]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/collectd/sbin/collectd -f'.
Program terminated with signal 11, Segmentation fault.
#0  __strncpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296
▒▒/usr/src/debug/glibc-2.17-c758a686/sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296:6571:beg:0x7fb0e7c74c8f
(gdb) backtrace full
#0  __strncpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296
No locals.
#1  0x0000000000419681 in sstrncpy (dest=0x7fb0d9dd1d60 "", src=0x1 <Address 0x1 out of bounds>, n=128) at common.c:85
No locals.
#2  0x00007fb0e6bbf3cd in memory_stats_submit (tag_index=9, dom=0x7fb0d40209e0, memory=1543356812288) at virt.c:274
        values = {{counter = 4789144393770598400, gauge = 1543356812288, derive = 4789144393770598400, absolute = 4789144393770598400}}
        vl = {values = 0x7fb0d9dd1b00, values_len = 1, time = 0, interval = 32212254720, host = "xxxx1", '\000' <repeats 116 times>, plugin = "virt", '\000' <repeats 123 times>, 
          plugin_instance = "xxxx2", '\000' <repeats 116 times>, type = "memory", '\000' <repeats 121 times>, type_instance = '\000' <repeats 127 times>, meta = 0x0}
        tags = {0x7fb0e6bbfe4d "swap_in", 0x7fb0e6bbfe55 "swap_out", 0x7fb0e6bbfe5e "major_fault", 0x7fb0e6bbfe6a "minor_fault", 0x7fb0e6bbfe76 "unused", 0x7fb0e6bbfe7d "available", 
          0x7fb0e6bbfe87 "actual_balloon", 0x7fb0e6bbfe96 "rss"}
#3  lv_read () at virt.c:626
        j = <optimized out>
        info = {state = 1 '\001', maxMem = 125829120, memory = 125829120, nrVirtCpu = 18, cpuTime = 638493170000000}
        vinfo = 0x0
        minfo = 0x7fb0d4017c30
        status = <optimized out>
        i = <optimized out>
        t = 1507377089
#4  0x000000000040e6b3 in plugin_read_thread (args=0x0) at plugin.c:530
        callback = 0x7fb0e6bbe800 <lv_read>
        rf = 0x1618360
        old_ctx = {interval = 0, flush_interval = 0, flush_timeout = 0}
        status = 0
        rf_type = 0
        start = 1618533825843413520
        now = 1618533825843413033
        elapsed = 124713
        rc = 0
        __PRETTY_FUNCTION__ = "plugin_read_thread"
#5  0x00007fb0e81aee25 in start_thread (arg=0x7fb0d9dd2700) at pthread_create.c:308
        __res = <optimized out>
        pd = 0x7fb0d9dd2700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140397546120960, -1281031162991288975, 0, 140397546121664, 140397546120960, 0, 1250152743678696817, 1250044408001781105}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fb0e7cd834d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
No locals.
(gdb) 
ffromani commented 7 years ago

@rubenk ack, I got both RHEL 7.4 and CentOS 7.4 boxes and will look into this as soon as I can.

ffromani commented 7 years ago

Just looking at the provided stack traces, the first suspect seems to be tag_index exceeding the bounds of the known tags array in memory_stats_submit. But we do have protection for out of the bounds access, skipping tags reported by libvirt that the plugin doesn't know about. I'll try to reproduce locally. @dchepishev are you running the last snapshot of the 5.7 branch, right?

dchepishev commented 7 years ago

Hello, The source I am testing with is the one which is the release of 5.7.2 downloadable from here: https://storage.googleapis.com/collectd-tarballs/collectd-5.7.2.tar.bz2 If you want me to try with some git version, please let me know which one you want me to test with and I will do it ASAP. Regards, Deyan

octo commented 7 years ago

Deleted

Misleading comment ranting about a missing range check. The range is checked and I have no explanation why a sstrncpy() would be unhappy.

dchepishev commented 6 years ago

Hello,

I think I found the problem, and you probably do not need to fix anything

During the upgrade of centos from 7.3 to 7.4 libvirt is being updated from version: libvirt-2.0.0 to libvirt-3.2.0

This version change is probably causing collectd plugin virt.so to crash and act weirdly because it is linked against the old libvirt lib.

The issue which I was having was because virt.so was not being recompiled due to missing deps (which I missed, sorry for that) and collectd was loading the old virt.so which is linked against libvirt-2.0.0

With complete fresh install (only the conf file copied after the installation), all is good with the difference that on every collect loop, the log is being filled with messages like this

virt plugin: Array index out of bounds: tag_index = 9

for every VM running on the hypervisor.

I do not if this is normal.

ffromani commented 6 years ago

With complete fresh install (only the conf file copied after the installation), all is good with the difference that on every collect loop, the log is being filled with messages like this

virt plugin: Array index out of bounds: tag_index = 9 for every VM running on the hypervisor.

I do not if this is normal.

Hi, this is normal in the sense that the virt plugin is discarding the data it doesn't know about. The plugin in master was updated, and it could handle tag_index = 9 (and other fields) while remaining backward compatible.

I think this code will be available only in the new collectd release, I'm not sure the new virt plugin will be backported to the collectd 5.7.z branch.

rpv-tomsk commented 5 years ago

I think, this issue is resolved (and outdated, too). Feel free to (re)open if needed. Thanks for using Collectd.