collectd / collectd

The system statistics collection daemon. Please send Pull Requests here!
http://collectd.org
Other
3.02k stars 1.23k forks source link

rrdtool plugin generates warnings when cache is enabled #3963

Open iosonogio opened 2 years ago

iosonogio commented 2 years ago

Expected behavior

No warnings in the log and rrdtool plugin functioning normally even with cache enabled.

Actual behavior

This happens when CacheTimeout is set. Rrdtool plugin complains about illegal attempt to update due to wrong time and logs warnings for each configured plugin. A sample log follows:

[2022-01-16 08:54:42] [warning] rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/localhost/cpu-0/cpu-user.rrd) failed: /var/lib/collectd/rrd/localhost/cpu-0/cpu-user.rrd: illegal attempt to update using time 1642319562 when last update time is 1642319676 (minimum one second step)

Steps to reproduce

  1. Configure the rrdtool plugin to use the cash by setting CacheTimeout and CacheFlush, as per the following config. Here I set also the WritePerSeconds but it does not matter.
<Plugin rrdtool>
    DataDir "/var/lib/collectd/rrd"
    CacheTimeout 90
    CacheFlush 900
#    WritesPerSecond 30
#    CreateFilesAsync false
#    RandomTimeout 0
</Plugin>
  1. Stop collectd.
  2. Delete the RRD files in /var/lib/collectd/rrd/...
  3. Start collectd.
  4. Wait at most a couple minutes for those warnings to appear in the log.

I left the other settings to their defaults as well as I set them to suggested values, but the behavior does not change.

My config is the following:

Hostname "localhost"
FQDNLookup false
#BaseDir "/var/lib/collectd"
#PluginDir "/usr/lib/collectd"
TypesDB "/usr/share/collectd/types.db" "/etc/collectd/my_types.db"

#AutoLoadPlugin false

CollectInternalStats true

#Interval 10

#MaxReadInterval 86400
#Timeout         2
#ReadThreads     5
#WriteThreads    5

# Limit the size of the write queue. Default is no limit. Setting up a limit
# is recommended for servers handling a high volume of traffic.
#WriteQueueLimitHigh 1000000
#WriteQueueLimitLow   800000

LoadPlugin logfile
<Plugin logfile>
    LogLevel "info"
    File "/var/log/collectd.log"
    Timestamp true
    PrintSeverity true
</Plugin>

LoadPlugin apache
LoadPlugin cpu
LoadPlugin curl_json
LoadPlugin dbi
LoadPlugin df
LoadPlugin disk
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin processes
LoadPlugin rrdtool
LoadPlugin swap
LoadPlugin tcpconns
LoadPlugin tail

Include "/etc/collectd/plugins.conf.d/*.conf"

The time on my machine is correctly set via NTP.

iosonogio commented 2 years ago

Anyone experiencing this same issue?

cpaelzer commented 1 year ago

Hi @iosonogio - I have myself seen the same on Ubuntu (once) and have a report of others seeing it.

So far I've only seen it myself on the shutdown part of the service and sadly I have no 100% reproduction rate of the issue (not with your steps, nor with mine which are quite similar).

TL;DR: you are not alone, but without a better reproducer I'm unsure if anyone can do something about it. Does anyone affected have the details to make this occur 100%?