CastawayLabs / cachet-monitor

Distributed monitoring plugin for CachetHQ
https://castawaylabs.github.io/cachet-monitor/
MIT License

Components or Incidents not Reverting back to Fixed or Operational #56

Open · Mikagami opened this issue 7 years ago

Mikagami commented 7 years ago

At the moment, incidents are being created and components are being set to either "Partial Outage" or "Major Outage" as planned, but they don't revert to "Fixed" or "Operational". Is there something I am overlooking?

Here is a screenshot of cachet-monitor running. Creating the incident works, but resolving it once the downtime ends does not. [screenshot]

Monitor code:

"monitors": [
  {
    "name": "Phone Directory",
    "url": "Site",
    "threshold": 80,
    "component_id": 2,
    "interval": 5,
    "timeout": 5,
    "expected_status_code": 200,
    "strict_tls": true
  }
]

Any ideas?

matejkramny commented 7 years ago

This could be Cachet's API having moved ahead of the code here; the last work on this project was done in June/July 2016.

I have plans to update this project and add more features in the near future.

Code that updates the component: https://github.com/CastawayLabs/cachet-monitor/blob/master/incident.go#L40
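
For reference, setting a component back to Operational is a single Cachet API call. The Go sketch below only builds that request. It assumes the Cachet v1 endpoint PUT /components/{id}, the X-Cachet-Token header, and the documented component status values (1 = Operational, 3 = Partial Outage, 4 = Major Outage). newComponentStatusRequest is a hypothetical helper, not the code in incident.go.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Component status values of the Cachet v1 API (assumed from its docs).
const (
	ComponentOperational       = 1
	ComponentPerformanceIssues = 2
	ComponentPartialOutage     = 3
	ComponentMajorOutage       = 4
)

// newComponentStatusRequest is a hypothetical helper that builds (but does
// not send) a request setting a component's status. baseURL is the API root,
// e.g. "https://status.example.com/api/v1".
func newComponentStatusRequest(baseURL, token string, componentID, status int) (*http.Request, error) {
	body, err := json.Marshal(map[string]int{"status": status})
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/components/%d", baseURL, componentID)
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Cachet-Token", token) // API token from the "api" config block
	return req, nil
}

func main() {
	req, err := newComponentStatusRequest("https://status.example.com/api/v1", "TOKEN", 2, ComponentOperational)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String()) // PUT https://status.example.com/api/v1/components/2
}
```

Sending it with http.DefaultClient.Do(req) and checking the response status before decoding the body would show whether Cachet accepted the change.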

Mikagami commented 7 years ago

Thank You for your time on this great project.

matejkramny commented 7 years ago

@Mikagami can you check with the latest release?

riqueandre commented 7 years ago

Hi @matejkramny, I tried the latest release and got this error, even though I double-checked my JSON:

time="2017-02-13T11:58:44-03:00" level=warning msg="Resolving incident" fields.time="13/02/2017 11:58:44 BRT" monitor="Portal Institucional" time="13/02/2017 11:58:44 BRT"

time="2017-02-13T11:58:44-03:00" level=info msg="Error sending incident: Cannot parse incident body: unexpected end of JSON input, " fields.time="13/02/2017 11:58:44 BRT" monitor="Portal Institucional" time="13/02/2017 11:58:44 BRT"

{
  "api": {
    "url": "https://xxx.xxx.xxx.xx/api/v1",
    "token": "ODhIjEQZMJKk0H2yPS6r",
    "insecure": true
  },
  "date_format": "02/01/2006 15:04:05 MST",
  "monitors": [
    {
      "name": "Portal Institucional",
      "target": "https://xxx.xx",
      "strict": true,
      "method": "GET",
      "component_id": 1,
      "metric_id": 1,
      "template": {
        "investigating": {
          "subject": "{{ .Monitor.Name }} - {{ .SystemName }}",
          "message": "{{ .Monitor.Name }} check failed (server time: {{ .now }})\n\n{{ .FailReason }}"
        },
        "fixed": {
          "subject": "{{ .Monitor.Name }} - {{ .SystemName }}",
          "message": " F I X E D "
        }
      },
      "interval": 10,
      "timeout": 1,
      "threshold": 80,
      "headers": {
        "Authorization": "Basic "
      },
      "expected_status_code": 200
    }
  ]
}

matejkramny commented 7 years ago

Cachet version?

matejkramny commented 7 years ago

By the way, you don't need the templates; it uses default messages if they are empty.
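
The fallback described here can be sketched with Go's text/template: if a configured subject or message is blank, a default template is rendered instead. The default strings, struct names and the render helper below are assumptions for illustration, not cachet-monitor's actual defaults.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical defaults; cachet-monitor's real default wording may differ.
const (
	defaultSubject = "{{ .Monitor.Name }} - {{ .SystemName }}"
	defaultMessage = "{{ .Monitor.Name }} check failed"
)

// messageTemplate mirrors one entry ("investigating" or "fixed") of the config's "template" block.
type messageTemplate struct {
	Subject string
	Message string
}

type monitorInfo struct{ Name string }

// templateData is a hypothetical set of fields exposed to templates.
type templateData struct {
	Monitor    monitorInfo
	SystemName string
}

// render executes tpl, falling back to def when tpl is empty or blank.
func render(tpl, def string, data templateData) (string, error) {
	if strings.TrimSpace(tpl) == "" {
		tpl = def
	}
	t, err := template.New("msg").Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := templateData{Monitor: monitorInfo{Name: "Portal Institucional"}, SystemName: "host01"}
	fixed := messageTemplate{Message: " F I X E D "} // subject left empty
	subject, err := render(fixed.Subject, defaultSubject, data)
	if err != nil {
		panic(err)
	}
	fmt.Println(subject) // Portal Institucional - host01
}
```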

riqueandre commented 7 years ago

@matejkramny I'm building Cachet from the master branch.

Mikagami commented 7 years ago

I was checking whether the incident would update once the issue was resolved, and I get the following error when it tries to update the incident. [screenshot]

I'm using the latest version of Cachet, by the way.

matejkramny commented 7 years ago

OK, let me investigate and get back to you both.

OzWookiee commented 7 years ago

I'm getting the same result:

time="2017-03-21T15:27:29+11:00" level=warning msg="Resolving incident" fields.time="21/03/2017 15:27:29 AEDT" monitor="Document Template PDF Rendering" time="21/03/2017 15:27:29 AEDT" 
time="2017-03-21T15:27:29+11:00" level=info msg="Error sending incident: Cannot parse incident body: unexpected end of JSON input, " fields.time="21/03/2017 15:27:29 AEDT" monitor="Document Template PDF Rendering" time="21/03/2017 15:27:29 AEDT" 

This is the version of cachet I'm using:

{
  "meta": {
    "on_latest": true,
    "latest": {
      "tag_name": "v2.3.10",
      "prelease": false,
      "draft": false
    }
  },
  "data": "2.4.0-dev"
}

This is the relevant error from the Cachet Laravel log:

[2017-03-21 05:37:29] production.ERROR: exception 'ErrorException' with message 'Missing argument 10 for CachetHQ\Cachet\Bus\Commands\Component\UpdateComponentCommand::__construct(), called in /var/www/Cachet/app/Bus/Handlers/Commands/Incident/UpdateIncidentCommandHandler.php on line 101 and defined' in /var/www/Cachet/app/Bus/Commands/Component/UpdateComponentCommand.php:121

karanp24 commented 6 years ago

Hi all, I am facing the same issue as @HenriqueAndre. Is there a solution for this? Can somebody help? I am using Cachet 2.5.0-dev.

MinThaMie commented 5 years ago

With the current 2.4.0-dev version of Cachet, this is no longer an issue.

Pimorez commented 5 years ago

> With the current 2.4.0-dev version of Cachet, this is no longer an issue.

In that case, I wonder if I am doing something wrong. The status never changes from Partial/Major Outage to anything else, and I do not see any attempts to change it either. All I see in the logs is:

WARN[0030] <name> is now saturated

kotarusv commented 3 years ago

I'm having the same issue. Can someone help me? I tested with a very simple monitor but still get the same result:

INFO[0010] monitor down 100.00%/80.00% monitor=google time="28/09/2020 01:14:27 GMT"
WARN[0010] creating incident. Monitor is down: Expected HTTP response status: 400, got: 200 fields.time="28/09/2020 01:14:27 GMT" monitor=google time="28/09/2020 01:14:27 GMT"
INFO[0010] Error sending incident: Cannot parse incident body: unexpected end of JSON input, fields.time="28/09/2020 01:14:27 GMT" monitor=google time="28/09/2020 01:14:27 GMT"
DEBU[0010] Sending lag metric ID:4 RTT 65ms
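
The line "monitor down 100.00%/80.00%" reads as the share of failed checks compared against the configured threshold of 80. A minimal sketch of such a check, assuming a fixed-size window of recent results and a >= comparison (both assumptions; window, decide and the window size are hypothetical, not cachet-monitor's code):

```go
package main

import "fmt"

// window is a hypothetical fixed-size history of check results (true = check failed).
type window struct {
	results []bool
	size    int
}

// add records a result and drops the oldest one once the window is full.
func (w *window) add(failed bool) {
	w.results = append(w.results, failed)
	if len(w.results) > w.size {
		w.results = w.results[1:]
	}
}

// downPercent returns the share of failed checks in the window, 0 to 100.
func (w *window) downPercent() float64 {
	if len(w.results) == 0 {
		return 0
	}
	failed := 0
	for _, f := range w.results {
		if f {
			failed++
		}
	}
	return float64(failed) / float64(len(w.results)) * 100
}

// decide returns the action for the current window state. Using >= here is an assumption.
func decide(w *window, threshold float64, incidentOpen bool) string {
	down := w.downPercent()
	switch {
	case down >= threshold && !incidentOpen:
		return "create incident"
	case down < threshold && incidentOpen:
		return "resolve incident"
	default:
		return "no change"
	}
}

func main() {
	w := &window{size: 10}
	for i := 0; i < 10; i++ {
		w.add(true) // every check fails, as with expected_status_code: 400 against a site returning 200
	}
	fmt.Printf("monitor down %.2f%%/%.2f%%\n", w.downPercent(), 80.0) // monitor down 100.00%/80.00%
	fmt.Println(decide(w, 80, false))                                 // create incident
}
```

With expected_status_code: 400 against a site that answers 200, every check fails, so in this model the monitor stays at 100% down and never reaches the resolve branch.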

cat sample.yaml


api: ~
date_format: "02/01/2006 15:04:05 MST"
insecure: true
monitors:
  - component_id: 1
    expected_status_code: 400
    interval: 1
    method: GET
    metric_id: 4
    name: google
    strict: true
    target: "https://google.com"
    template:
      fixed:
        subject: "I HAVE BEEN FIXED"
      investigating:
        message: |-
          {{ .Monitor.Name }} check **failed** (server time: {{ .now }})

          {{ .FailReason }}
        subject: "{{ .Monitor.Name }} - {{ .SystemName }}"
    threshold: 80
    timeout: 1