Juniper / contrail-charms

Juju charms for Contrail services.
Apache License 2.0

NRPE check timeout #154

Status: Open. majduk opened this issue 4 years ago.

majduk commented 4 years ago

Because contrail-status takes more than 10 s to run, the Contrail Nagios check times out:

root@UPSR-BRBFD-01-0008:~# time /usr/local/lib/nagios/plugins/check_contrail_status_controller.py 5.1
Contrail status OK

real    0m11.399s
user    0m0.132s
sys     0m0.057s

This results in a false-positive alarm in Nagios.
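
The failure mode can be sketched as a check that runs a command under a fixed time budget. This is a minimal, hypothetical Python sketch, not the charm's actual plugin: `run_with_budget` is an illustrative name, and the 10 s default assumes the limit being hit is `check_nrpe`'s default timeout (its `-t` option). It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_with_budget(cmd, budget_s=10.0):
    """Run cmd and return (nagios_code, elapsed_seconds, message).

    Hypothetical helper: if the command does not finish within budget_s,
    subprocess.run() kills it and the result is reported as UNKNOWN
    instead of CRITICAL, so a slow but healthy command is not mistaken
    for a failed one.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
    except subprocess.TimeoutExpired:
        elapsed = time.monotonic() - start
        return UNKNOWN, elapsed, f"command exceeded {budget_s:g}s budget"
    elapsed = time.monotonic() - start
    # A non-zero exit from the wrapped command is treated as CRITICAL.
    code = OK if proc.returncode == 0 else CRITICAL
    return code, elapsed, proc.stdout.strip()
```

For example, `run_with_budget(["contrail-status"], budget_s=10)` would return UNKNOWN on the host above, since the command takes about 11 s. Mapping a timeout to UNKNOWN is a design choice; by default `check_nrpe` reports its own socket timeouts as CRITICAL unless run with `-u`.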

Andrey-mp commented 4 years ago

Looks like it's not a false positive... why does contrail-status take more than 10 seconds? I've seen it take 10 seconds when DNS isn't working correctly.
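
The DNS hypothesis can be checked by timing name resolution directly. A minimal Python sketch; the thread does not say which hostnames `contrail-status` resolves, so the `host` argument is left to the reader, and `time_lookup` is a hypothetical helper. A lookup waiting on an unresponsive nameserver typically takes whole seconds rather than milliseconds (glibc's default per-attempt resolver timeout in resolv.conf is 5 s).

```python
import socket
import time


def time_lookup(host, port=None):
    """Return (seconds, sorted_addresses) for one getaddrinfo() call on host.

    A failed lookup returns an empty address list; the elapsed time is
    still reported, since a slow failure is exactly what we want to see.
    """
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(host, port)
    except socket.gaierror:
        infos = []
    elapsed = time.perf_counter() - start
    # info[4] is the sockaddr tuple; its first element is the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

Running it for each controller hostname and seeing values in the seconds range would point at the resolver rather than at Contrail itself.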

majduk commented 4 years ago

All units report status OK, and there are no alarms in the Contrail WebUI either. Nothing indicates an issue whatsoever:

root@UPSR-BRBFD-01-0008:~# time contrail-status
Pod              Service         Original Name                          Original Version  State    Id            Status
                 redis           contrail-external-redis                1912-32           running  019aeb6e6daa  Up 4 weeks
analytics        api             contrail-analytics-api                 1912-32           running  4d0a54533805  Up 10 hours
analytics        collector       contrail-analytics-collector           1912-32           running  3241d7610545  Up 2 weeks
analytics        nodemgr         contrail-nodemgr                       1912-32           running  c886e6fd6372  Up 2 weeks
analytics-alarm  alarm-gen       contrail-analytics-alarm-gen           1912-32           running  52720331a81e  Up 10 hours
analytics-alarm  kafka           contrail-external-kafka                1912-32           running  7924a3690608  Up 2 weeks
analytics-alarm  nodemgr         contrail-nodemgr                       1912-32           running  903380a284c9  Up 2 weeks
analytics-snmp   nodemgr         contrail-nodemgr                       1912-32           running  6bd26cbb1519  Up 2 weeks
analytics-snmp   snmp-collector  contrail-analytics-snmp-collector      1912-32           running  365f1eaa069a  Up 10 hours
analytics-snmp   topology        contrail-analytics-snmp-topology       1912-32           running  2f44a7e7a5f5  Up 10 hours
config           api             contrail-controller-config-api         1912-32           running  9939958cd7df  Up 2 weeks
config           device-manager  contrail-controller-config-devicemgr   1912-32           running  425b87ebe9f5  Up 10 hours
config           nodemgr         contrail-nodemgr                       1912-32           running  e98ec19c8ef4  Up 2 weeks
config           schema          contrail-controller-config-schema      1912-32           running  439f3fae4ed2  Up 10 hours
config           svc-monitor     contrail-controller-config-svcmonitor  1912-32           running  1cc003d6f15f  Up 25 hours
config-database  cassandra       contrail-external-cassandra            1912-32           running  2b3178c6790b  Up 2 weeks
config-database  nodemgr         contrail-nodemgr                       1912-32           running  e6841e7f1583  Up 2 weeks
config-database  rabbitmq        contrail-external-rabbitmq             1912-32           running  af5f9f6eca17  Up 2 weeks
config-database  zookeeper       contrail-external-zookeeper            1912-32           running  a796fc653dcd  Up 2 weeks
control          control         contrail-controller-control-control    1912-32           running  4486f22ab836  Up 2 weeks
control          dns             contrail-controller-control-dns        1912-32           running  f8a7fd180f71  Up 2 weeks
control          named           contrail-controller-control-named      1912-32           running  a3e54ff46a44  Up 2 weeks
control          nodemgr         contrail-nodemgr                       1912-32           running  89afad426951  Up 2 weeks
database         cassandra       contrail-external-cassandra            1912-32           running  513c47659981  Up 4 weeks
database         nodemgr         contrail-nodemgr                       1912-32           running  39ef970fdf39  Up 4 weeks
database         query-engine    contrail-analytics-query-engine        1912-32           running  ba1a23b5152b  Up 4 weeks
webui            job             contrail-controller-webui-job          1912-32           running  e90459c6c38c  Up 2 weeks
webui            web             contrail-controller-webui-web          1912-32           running  c1e80570872a  Up 2 weeks

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail analytics-alarm ==
nodemgr: active
kafka: active
alarm-gen: active

== Contrail database ==
nodemgr: active
query-engine: active
cassandra: active

== Contrail analytics ==
nodemgr: active
api: active
collector: active

== Contrail config-database ==
nodemgr: active
zookeeper: active <### --- pauses here for 5 sec --- ###>
rabbitmq: active
cassandra: active

== Contrail webui ==
web: active
job: active

== Contrail analytics-snmp ==
snmp-collector: active
nodemgr: active
topology: active

== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: active
api: active
schema: backup

real    0m10.792s
user    0m0.064s
sys     0m0.046s
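
The check presumably turns output like this into a single OK/CRITICAL result. A hedged Python sketch of that parsing step; it is not the plugin's actual code, `unhealthy_services` is a hypothetical name, and the regular expressions assume the `== Contrail <pod> ==` / `<service>: <state>` layout shown above.

```python
import re

SECTION_RE = re.compile(r"^== Contrail (?P<name>[\w-]+) ==$")
SERVICE_RE = re.compile(r"^(?P<service>[\w-]+): (?P<state>\S+)")

# States treated as healthy. "backup" is included because the schema
# service reports "backup" in the output above while nothing is wrong.
HEALTHY = {"active", "backup"}


def unhealthy_services(text):
    """Return a list of (section, service, state) tuples whose state is not healthy."""
    section = None
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            section = m.group("name")
            continue
        m = SERVICE_RE.match(line)
        # Lines before the first "== Contrail ... ==" header (the pod table)
        # are skipped because section is still None.
        if section and m and m.group("state") not in HEALTHY:
            problems.append((section, m.group("service"), m.group("state")))
    return problems
```

Parsing is cheap; per the timing above, nearly all of the ~11 s is spent inside contrail-status itself (user and sys time are well under a second).
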

Andrey-mp commented 4 years ago

OK, got it. But honestly, I don't know why ZooKeeper is so slow in your environment. And I can't say right now how to increase the NRPE check timeout (and I don't think that's a good approach anyway).

majduk commented 4 years ago

I've already increased the timeout.

contrail-status takes roughly the same amount of time in every environment where we run a 19.XX release.
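
To compare environments like this, a small Python sketch that runs a command several times and reports the min, median, and max wall-clock time. `time_runs` is a hypothetical helper; the median is used so that a single unusually slow or fast run does not skew the comparison.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=5):
    """Run cmd `runs` times and return (min, median, max) wall-clock seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples), max(samples)
```

Calling `time_runs(["contrail-status"])` on a controller in each environment gives numbers that can be compared directly with the NRPE check timeout.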

kashif-nawaz commented 4 years ago

Deleted my old comments because they pointed to a different problem that is not related to this issue. Thanks.