Josef-Friedrich / check_systemd

This systemd check for Nagios-compatible monitoring systems will report a degraded systemd to your monitoring solution. It can also be used to monitor individual systemd service and timer units.
https://check-systemd.readthedocs.io
GNU Lesser General Public License v2.1

timers check on rhel7 failing on readahead service #20

Open martinrm77 opened 2 years ago

martinrm77 commented 2 years ago

Using check_systemd 2.3.1. A fully patched RHEL 7.9 server has this issue. It is a virtualised server on a VMware platform. The check keeps failing on a timer, which it should not, as the timer shows n/a in the NEXT field, correct?

[root@server1 plugins]# systemctl status systemd-readahead-done.timer
● systemd-readahead-done.timer - Stop Read-Ahead Data Collection 10s After Completed Startup
   Loaded: loaded (/usr/lib/systemd/system/systemd-readahead-done.timer; indirect; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Mon 2022-01-03 10:45:35 CET; 22h ago
           ConditionVirtualization=no was not met
     Docs: man:systemd-readahead-replay.service(8)
[root@server1 plugins]# systemctl list-timers --all
NEXT                         LEFT          LAST                         PASSED  UNIT                         ACTIVATES
Tue 2022-01-04 10:58:02 CET  1h 48min left Mon 2022-01-03 10:58:02 CET  22h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
n/a                          n/a           n/a                          n/a     systemd-readahead-done.timer systemd-readahead-done.service

2 timers listed.
[root@server1 plugins]# /usr/lib64/nagios/plugins/check_systemd.py -t -n -vvv
SYSTEMD CRITICAL - systemd-readahead-done.timer critical: systemd-readahead-done.timer | count_units=319 data_source=cli units_activating=0 units_active=215 units_failed=0 units_inactive=104

Josef-Friedrich commented 2 years ago

There was a feature request that this inactive (dead) timer should be reported. Would you please take a look at this issue: https://github.com/Josef-Friedrich/check_systemd/issues/6#issuecomment-630389306

martinrm77 commented 2 years ago

OK, I read the request, but I don't understand why a dead timer should be treated as an error. If the NEXT field is set to n/a, it is because the timer is not enabled or, as in my case, a prerequisite is not met, so the service is not supposed to be running. After experimenting I can confirm this, but I find it difficult to pin down where the systemd documentation explicitly states what the n/a value in the NEXT field signifies.

Josef-Friedrich commented 2 years ago

I must admit that I am very unsure how to deal with this issue. Your reasoning sounds perfectly understandable. I believe, however, that there are situations in which someone wants to check that the NEXT field is never n/a. Maybe we should introduce a new option, for example -T, that checks only the LAST field?
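To make the two candidate behaviours concrete, here is a minimal sketch. It is not check_systemd's actual code: the names `parse_field` and `timer_is_reported` and the `check_next` parameter are hypothetical, and the sample rows are the NEXT and LAST values from the list-timers output earlier in this thread.

```python
from typing import Optional


def parse_field(value: str) -> Optional[str]:
    """Return None for the 'n/a' placeholder, otherwise the stripped value."""
    value = value.strip()
    return None if value.lower() == "n/a" else value


def timer_is_reported(next_field: str, last_field: str,
                      check_next: bool = True) -> bool:
    """Hypothetical rule deciding whether a timer row gets reported.

    check_next=True  -> report when NEXT is n/a.
    check_next=False -> report only when LAST is n/a (the -T idea).
    """
    field = next_field if check_next else last_field
    return parse_field(field) is None


# (NEXT, LAST) pairs copied from the list-timers output in this thread.
tmpfiles = ("Tue 2022-01-04 10:58:02 CET", "Mon 2022-01-03 10:58:02 CET")
readahead = ("n/a", "n/a")

print(timer_is_reported(*tmpfiles))                     # False
print(timer_is_reported(*readahead))                    # True
print(timer_is_reported(*readahead, check_next=False))  # True
```

Note that for systemd-readahead-done.timer both fields are n/a, so under this sketch a LAST-only rule would still report it.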

martinrm77 commented 2 years ago

I am OK with an option to skip disabled timers, since that won't break existing configs, but I feel that not checking disabled timers should be the default. The same seems to be the case when a service is not enabled: check_systemd.py -u will not find it. That is just how systemd works.

martinrm77 commented 2 years ago

I have now seen a situation where the service is enabled but NEXT is still n/a, which happens when the service is enabled but has not yet run successfully once. So the question is how to skip timers that are disabled without also skipping timers that are enabled but have not yet run.

Example of a service that is enabled but whose first run failed:

[root@servera ~]# systemctl list-timers -a
NEXT                         LEFT      LAST                         PASSED       UNIT                         ACTIVATES
Wed 2022-02-09 14:50:00 CET  7min left Wed 2022-02-09 14:40:29 CET  1min 44s ago sysstat-collect.timer        sysstat-collect.service
Thu 2022-02-10 00:00:00 CET  9h left   Wed 2022-02-09 00:00:52 CET  14h ago      mlocate-updatedb.timer       mlocate-updatedb.service
Thu 2022-02-10 00:00:00 CET  9h left   Wed 2022-02-09 00:00:52 CET  14h ago      unbound-anchor.timer         unbound-anchor.service
Thu 2022-02-10 00:07:00 CET  9h left   Wed 2022-02-09 00:07:42 CET  14h ago      sysstat-summary.timer        sysstat-summary.service
Thu 2022-02-10 00:54:44 CET  10h left  Wed 2022-02-09 02:22:52 CET  12h ago      insights-client.timer        insights-client.service
Thu 2022-02-10 14:10:52 CET  23h left  Wed 2022-02-09 14:10:52 CET  31min ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
n/a                          n/a       n/a                          n/a          dnf-makecache.timer          dnf-makecache.service

7 timers listed.
[root@servera ~]# systemctl status dnf-makecache.timer
● dnf-makecache.timer - dnf makecache --timer
   Loaded: loaded (/usr/lib/systemd/system/dnf-makecache.timer; enabled; vendor preset: enabled)
   Active: active (elapsed) since Wed 2022-02-02 13:51:41 CET; 1 weeks 0 days ago
  Trigger: n/a

Josef-Friedrich commented 2 years ago

Thank you for reporting this. Would it be possible to share the output of the command systemctl show dnf-makecache.timer?

martinrm77 commented 2 years ago

I have had problems putting a server in the same state as this again, but I think I have one now.

The unit is now systemd-readahead-done.timer, and it has the following properties:

[root@serverb]# systemctl show systemd-readahead-done.timer
Unit=systemd-readahead-done.service
NextElapseUSecRealtime=infinity
NextElapseUSecMonotonic=infinity
LastTriggerUSec=0
LastTriggerUSecMonotonic=0
Result=success
AccuracyUSec=1s
RandomizedDelayUSec=0
Persistent=no
WakeSystem=no
Id=systemd-readahead-done.timer
Names=systemd-readahead-done.timer
WantedBy=systemd-readahead-collect.service
Conflicts=shutdown.target
Before=shutdown.target systemd-readahead-done.service
After=multi-user.target
Triggers=systemd-readahead-done.service
Documentation=man:systemd-readahead-replay.service(8)
Description=Stop Read-Ahead Data Collection 10s After Completed Startup
LoadState=loaded
ActiveState=inactive
SubState=dead
FragmentPath=/usr/lib/systemd/system/systemd-readahead-done.timer
UnitFileState=indirect
UnitFilePreset=enabled
InactiveExitTimestampMonotonic=0
ActiveEnterTimestampMonotonic=0
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=no
OnFailureJobMode=replace
IgnoreOnIsolate=no
IgnoreOnSnapshot=no
NeedDaemonReload=no
JobTimeoutUSec=0
JobTimeoutAction=none
ConditionResult=no
AssertResult=no
ConditionTimestamp=Sat 2022-05-21 05:07:08 CEST
ConditionTimestampMonotonic=139495216
AssertTimestampMonotonic=0
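The properties above seem to carry enough information to tell a timer that never started because a start condition failed from one that simply has no next run. A minimal sketch, assuming the output is read as one KEY=VALUE pair per line, as `systemctl show` prints it. `parse_show` and `timer_state` are hypothetical helper names, the labels they return are my own and not part of check_systemd, and the "infinity" values are taken from this RHEL 7 host; other systemd versions may print these properties differently.

```python
from typing import Dict


def parse_show(text: str) -> Dict[str, str]:
    """Parse `systemctl show` output (one KEY=VALUE per line) into a dict."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first '=' only, since values may contain '=' too.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def timer_state(props: Dict[str, str]) -> str:
    """Hypothetical classification of a timer unit from its properties."""
    no_next = (props.get("NextElapseUSecRealtime") == "infinity"
               and props.get("NextElapseUSecMonotonic") == "infinity")
    if not no_next:
        return "scheduled"
    inactive = props.get("ActiveState") == "inactive"
    # Assumption: an inactive timer with ConditionResult=no was skipped
    # because a Condition*= check (here ConditionVirtualization) failed.
    if inactive and props.get("ConditionResult") == "no":
        return "condition-not-met"
    if inactive:
        return "not-started"
    return "active-without-next-run"


# A subset of the properties shown above.
sample = """\
NextElapseUSecRealtime=infinity
NextElapseUSecMonotonic=infinity
LastTriggerUSec=0
ActiveState=inactive
SubState=dead
UnitFileState=indirect
ConditionResult=no
"""

print(timer_state(parse_show(sample)))  # condition-not-met
```

With such a distinction, a check could ignore the "condition-not-met" case while still reporting an enabled, active timer that has no next run, which is the dnf-makecache.timer situation described earlier.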

Josef-Friedrich commented 2 years ago

Thank you very much!

martinrm77 commented 2 years ago

I also found the correct service; just after boot it shows the issue:

NEXT                          LEFT       LAST                          PASSED      UNIT                         ACTIVATES
Tue 2022-05-31 13:50:00 CEST  8min left  Tue 2022-05-31 13:40:49 CEST  1min 6s ago sysstat-collect.timer        sysstat-collect.service
Tue 2022-05-31 13:53:49 CEST  11min left n/a                           n/a         systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Tue 2022-05-31 14:35:17 CEST  53min left n/a                           n/a         dnf-makecache.timer          dnf-makecache.service
Wed 2022-06-01 00:00:00 CEST  10h left   Tue 2022-05-31 00:00:48 CEST  13h ago     mlocate-updatedb.timer       mlocate-updatedb.service
Wed 2022-06-01 00:00:00 CEST  10h left   Tue 2022-05-31 00:00:48 CEST  13h ago     unbound-anchor.timer         unbound-anchor.service
Wed 2022-06-01 00:07:00 CEST  10h left   n/a                           n/a         sysstat-summary.timer        sysstat-summary.service
Wed 2022-06-01 03:59:14 CEST  14h left   Tue 2022-05-31 00:00:48 CEST  13h ago     insights-client.timer        insights-client.service

and the properties for this service:

Restart=no
NotifyAccess=none
RestartUSec=100ms
TimeoutStartUSec=infinity
TimeoutStopUSec=1min 30s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
UID=[not set]
GID=[not set]
NRestarts=0
ExecMainStartTimestampMonotonic=0
ExecMainExitTimestampMonotonic=0
ExecMainPID=0
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/dnf ; argv[]=/usr/bin/dnf makecache --timer ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/>
Slice=system.slice
MemoryCurrent=[not set]
CPUUsageNSec=[not set]
EffectiveCPUs=
EffectiveMemoryNodes=
TasksCurrent=[not set]
IPIngressBytes=18446744073709551615
IPIngressPackets=18446744073709551615
IPEgressBytes=18446744073709551615
IPEgressPackets=18446744073709551615
Delegate=no
CPUAccounting=no
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
AllowedCPUs=
AllowedMemoryNodes=
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=50691
IPAccounting=no
Environment=ABRT_IGNORE_PYTHON=1
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=262144
LimitNOFILESoft=1024
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=31682
LimitNPROCSoft=31682
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=31682
LimitSIGPENDINGSoft=31682
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinity=
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
NUMAMask=
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_n>
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes