BCDevOps / platform-services

Collection of platform-related tools and configurations
Apache License 2.0

Logging / Alerts enhancements for teams #55

Open: stewartshea opened this issue 5 years ago

stewartshea commented 5 years ago

Look into various team alerting options

stewartshea commented 5 years ago

Quick research notes:

stewartshea commented 5 years ago

Making progress with Grafana querying Elasticsearch via OAuth2. More updates to come.

stewartshea commented 5 years ago

Now getting blocked on alerting in Grafana:

firing:true
state:"pending"
conditionEvals:" = true"
timeMs:"11.949ms"
error:"tsdb.HandleRequest() error invalid character 'A' looking for beginning of value"

Some research indicates it may be due to the authenticated request.
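One plausible reading of that error, offered as an assumption rather than a confirmed diagnosis: Grafana is written in Go, and "invalid character 'A' looking for beginning of value" is the message Go's `encoding/json` produces when a body starts with the character `A` instead of JSON. That would fit the datasource answering the alert query with a plaintext authentication error rather than an Elasticsearch result. The sketch below reproduces the same check in Python; the helper name `diagnose_body` and the sample body `"Access denied"` are hypothetical.

```python
import json


def diagnose_body(body: str) -> str:
    """Report whether a datasource response body is JSON.

    Hypothetical helper: if the body is not JSON, describe the character
    where parsing failed, which is the same situation Go reports as
    "invalid character 'X' looking for beginning of value".
    """
    try:
        json.loads(body)
        return "json"
    except json.JSONDecodeError as err:
        # err.pos is the index where the parser gave up; for a plaintext
        # body such as an auth error message this is the first character.
        offending = body[err.pos] if err.pos < len(body) else "<end of input>"
        return f"not JSON: invalid character {offending!r} at position {err.pos}"


print(diagnose_body('{"hits": {"total": 0}}'))
print(diagnose_body("Access denied"))  # hypothetical plaintext auth error
```

If a proxy or auth layer sits in front of Elasticsearch, capturing the raw response body of the failing alert query and running it through a check like this would show quickly whether it is JSON at all.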

stewartshea commented 5 years ago

Alerts appear to fire with TLS auth (tested using admin certs), but NOT with token auth (which is restricted to certain projects).

stewartshea commented 5 years ago

Just a quick note: the "Slack" notification channel can be used for Rocket.Chat :)

https://github.com/grafana/grafana/issues/9251
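This works because Rocket.Chat incoming webhooks accept Slack-style JSON, so pointing the Grafana "Slack" channel at a Rocket.Chat webhook URL is enough. As a rough sketch of the idea (a simplified payload, not Grafana's exact request body), a minimal Slack-compatible message only needs a top-level `text` field. The function names and the webhook URL are hypothetical.

```python
import json
import urllib.request


def build_alert_payload(rule_name: str, state: str, message: str) -> bytes:
    """Build a minimal Slack-compatible body (hypothetical format).

    Rocket.Chat incoming webhooks accept a top-level "text" field, as Slack does.
    """
    text = f"[{state.upper()}] {rule_name}: {message}"
    return json.dumps({"text": text}).encode("utf-8")


def post_alert(webhook_url: str, payload: bytes) -> int:
    """POST the payload to a webhook URL and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


payload = build_alert_payload("High error rate", "alerting", "5xx above threshold")
# post_alert("https://rocketchat.example.org/hooks/<token>", payload)  # hypothetical URL
```

Testing a webhook by hand like this can help separate "the channel is misconfigured" from "the alert never fired".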

stewartshea commented 5 years ago

Also posted the issue here: https://github.com/grafana/grafana/issues/15381

stewartshea commented 5 years ago

Storage requirement: logs may need to be retained for up to 2 years; need to factor this in.
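To get a feel for what a 2-year retention requirement means, a back-of-the-envelope estimate is daily ingest × days retained × stored copies × index overhead. The sketch below assumes that simple linear model; the input numbers (5 GB/day, 2 copies, 10% overhead, a 14-day short-term window) are hypothetical placeholders, not measurements from this platform.

```python
def estimate_storage_gb(daily_ingest_gb: float, retention_days: int,
                        copies: int = 1, index_overhead: float = 1.0) -> float:
    """Rough disk estimate under a simple linear model (assumption).

    copies: total stored copies, e.g. 2 for one primary plus one replica.
    index_overhead: multiplier for indexing/storage overhead on top of raw size.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or copies < 1 or index_overhead <= 0:
        raise ValueError("invalid sizing input")
    return daily_ingest_gb * retention_days * copies * index_overhead


# Hypothetical inputs: short-term window vs. 2 years (730 days).
short_term = estimate_storage_gb(5, 14, copies=2, index_overhead=1.1)
long_term = estimate_storage_gb(5, 730, copies=2, index_overhead=1.1)
```

With these placeholder inputs the short-term window needs about 154 GB while 2 years needs roughly 8 TB, a gap of about 50× that illustrates why one stack for both cases is hard to size.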

stewartshea commented 5 years ago

The storage requirement means the original idea doesn't work so well... I'm considering 2 options:

  1. Grafana querying EFK for short-term retention
  2. A dedicated ELK/EFK stack for longer-term retention. I'd prefer Grafana + Loki, but need to see how far Loki is from becoming something maintainable; it is currently in alpha.

stewartshea commented 5 years ago

Just blocked on time / priority at the moment

stewartshea commented 5 years ago

Additional use-case feedback:

More or less, what we need to do is log business data and be able to build some dashboards. The data looks like the following:

{"_type":"ticketDispute","_id":"EA200008161","eventType":"ticketDispute","eventID":"EA200008161","eventStatus":"RECEIVED","eventTime":"2019/05/02 15:28:51.660"}

The issue is that I somehow need to modify the log parser to extract this from the OpenShift log format. It goes in as part of the message structure, since OpenShift wraps every log line in its own JSON document, so ELK stores it as an OpenShift document.
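The core of that parsing problem is a double decode: parse the platform's wrapper record, then parse the application's JSON out of the string field that holds the original log line. A minimal sketch, assuming the original stdout line lands as a string in a field called `message` (an assumption; the actual field name depends on the logging stack's data model), with a hypothetical helper name:

```python
import json
from typing import Any, Dict, Optional


def extract_business_event(wrapped_record: str, field: str = "message") -> Optional[Dict[str, Any]]:
    """Pull the application's own JSON out of a platform-wrapped log record.

    Assumes (hypothetically) the raw container log line is stored as a string
    under `field`. Returns None if the record or that string is not a JSON object,
    e.g. for an ordinary plaintext log line.
    """
    outer = json.loads(wrapped_record)
    if not isinstance(outer, dict):
        return None
    inner = outer.get(field)
    if not isinstance(inner, str):
        return None
    try:
        parsed = json.loads(inner)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


line = ('{"_type":"ticketDispute","_id":"EA200008161","eventType":"ticketDispute",'
        '"eventID":"EA200008161","eventStatus":"RECEIVED","eventTime":"2019/05/02 15:28:51.660"}')
wrapped = json.dumps({"message": line, "hostname": "node-1"})  # hypothetical wrapper
event = extract_business_event(wrapped)
```

In practice the same step would more likely live in the log pipeline (a JSON parse applied to the message field) than in application code. Also note that `_type` and `_id` are reserved metadata field names in Elasticsearch, so lifting these keys to the top level of an indexed document would likely require renaming them.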
stewartshea commented 5 years ago

Local logging alerts with Loki are still not ready: Loki is not yet supported as a dashboard datasource, which Grafana requires in order to create alerts. Awaiting updates if this is to be used. The capability is still interesting, but it would replace the local fluentd logs and raises overhead concerns if we use local file logging redirected to Loki.