Icinga / icingaweb2-module-idoreports

Icinga IDO Reports provides host and service availability reports for Icinga based on the monitoring database (IDO).
GNU General Public License v2.0
34 stars 13 forks source link

PostgreSQL: Fix availability when availability is 100% #20

Closed bsdlme closed 4 years ago

bsdlme commented 4 years ago

The calculation returned 0 for availability when there was no downtime duration. So instead of 100 it was always 0. This patch checks if the duration time is zero and returns 100.00 else the actual SLA is calculated and returned.

bluthg commented 4 years ago

I'm pretty sure that the code does exactly the right thing, and don't see how it fails on "no downtime" situations (simulated below):

# select 
extract('EPOCH' FROM '00:00:00'::interval) est_downtime,
100.0 - extract('EPOCH' FROM '00:00:00'::interval) / extract('EPOCH' FROM '12:00:00'::interval) * 100.0 est_availability;
 est_downtime | est_availability 
--------------+------------------
            0 |              100
(1 Zeile)

Maybe there's some glitch that turns this "all up" interval into a "all down" one?

bsdlme commented 4 years ago

Maybe there's some glitch that turns this "all up" interval into a "all down" one?

I don't know. I can only say that the original calculation only showed 0. :(

bluthg commented 4 years ago

Which essentially cannot be the case when you do the math. This is really odd. 100.0 - 0.0/x * 100.0 == 100.0 (for any x <> 0)

bluthg commented 4 years ago

Could you provide some data where this occurs?

mbanck commented 4 years ago

This hit me as well yesterday. As far as I can tell, the problem is not if the downtime is 0.0, but if it is NULL, i.e. no downtimes found, then:

ICINGA=# SELECT 100.0 - NULL / EXTRACT('epoch' FROM '2020-02-05 00:00:00'::timestamp - '2020-02-04 00:00:00'::timestamp) * 100.0 AS availability;
 availability 
--------------

(1 row)

ICINGA=# SELECT 100.0 - COALESCE(NULL, 0.0) / EXTRACT('epoch' FROM '2020-02-05 00:00:00'::timestamp - '2020-02-04 00:00:00'::timestamp) * 100.0 AS availability;
 availability 
--------------
          100
(1 row)

I think just wrapping it in COALESCE should be enough.

bluthg commented 4 years ago

Uh, that's a very good hint indeed! I wonder how we get NULL though...

I'll see if I can construct a test case (@Michael, could you outline the data under inspection that led there for you?).

Am 06.02.2020 um 08:47 schrieb Michael Banck:

This hit me as well yesterday. As far as I can tell, the problem is not if the downtime is |0.0|, but if it is |NULL|, i.e. no downtimes found, then:

|ICINGA=# SELECT 100.0 - NULL / EXTRACT('epoch' FROM '2020-02-05 00:00:00'::timestamp - '2020-02-04 00:00:00'::timestamp) * 100.0 AS availability; availability -------------- (1 row) ICINGA=# SELECT 100.0

  • COALESCE(NULL, 0.0) / EXTRACT('epoch' FROM '2020-02-05 00:00:00'::timestamp - '2020-02-04 00:00:00'::timestamp) * 100.0 AS availability; availability -------------- 100 (1 row) |

I think just wrapping it in |COALESCE| should be enough.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Icinga/icingaweb2-module-idoreports/pull/20?email_source=notifications&email_token=ABYBLXIOOL7BPW2BITNNTD3RBO6CVA5CNFSM4KHEQC52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEK6H4CA#issuecomment-582778376, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABYBLXJCKNMIFJPM4Z73PS3RBO6CVANCNFSM4KHEQC5Q.

-- Gunnar "Nick" Bluth Geschäftsführer

Pro Open GmbH Eimermacherweg 106 D-48159 Münster

HRB 18122, AG MS

Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de


"Ceterum censeo SystemD esse delendam" - Cato

mbanck commented 4 years ago

http://share.credativ.com/~mba/.dump.sql.gz is a minimal dump that you can restore into an empty database along with the two schema files from this branch.

Then:

icinga_test=# SELECT h.display_name AS host_display_name, h.display_name AS
 host_display_name, idoreports_get_sla_ok_percent(ho.object_id,
  '2020-02-04 16:11:16', '2020-02-05 17:11:16', NULL) AS sla FROM icinga_objects AS ho
  INNER JOIN icinga_hosts AS h ON h.host_object_id = ho.object_id AND ho.is_active = 1 
    AND ho.objecttype_id = 1 ORDER BY LOWER(h.display_name) ASC;
   host_display_name   |   host_display_name   | sla 
-----------------------+-----------------------+-----
 vagrant-openSUSE-Leap | vagrant-openSUSE-Leap |    
(1 Zeile)
bluthg commented 4 years ago

Hmmmmm.... this:

mbanck# SELECT h.display_name AS host_display_name, h.display_name AS host_display_name, idoreports_get_sla_ok_percent(ho.object_id::int, '2020-02-04 16:11:16', '2020-02-05 17:11:16', NULL) AS sla FROM icinga_objects AS ho
  INNER JOIN icinga_hosts AS h ON h.host_object_id = ho.object_id AND ho.is_active = 1 AND ho.objecttype_id = 1 ORDER BY LOWER(h.display_name) ASC;
   host_display_name   │   host_display_name   │ sla 
═══════════════════════╪═══════════════════════╪═════
 vagrant-openSUSE-Leap │ vagrant-openSUSE-Leap │ 100

made me compare the code I wrote back then with what's in the repo... and :drumroll:, it's outdated :/

Stay tuned...

bluthg commented 4 years ago

See PR #23

bluthg commented 4 years ago

@bsdlme, mind checking your data with the updated version in #23 too?

bsdlme commented 4 years ago

Sure, but how can I download the patch from #23?

bluthg commented 4 years ago

Sure, but how can I download the patch from #23? Easiest is probably to checkout my repo: https://github.com/bluthg/icingaweb2-module-idoreports/tree/feature/postgresql (It's only differing from upstream in the feature/postgresql branch)

bsdlme commented 4 years ago

Thanks, this version works for me as well!

lippserd commented 4 years ago

Hi,

Sorry for reopening but I don't want to lose track of the conversation and the fixes we have to incorporate. I'll close this PR once I've updated the procedure with @bluthg changes.

Thanks to all.

Best, Eric

lippserd commented 4 years ago

Please see PR #30