canarytrace / documentation

Plug’n'Play stack for testing and monitoring web applications from user perspective.
http://canarytrace.com

Rules, Internal Checks, Scores & Reporters #110

Closed rdpanek closed 1 year ago

rdpanek commented 2 years ago

How it works

After the listener-agent starts, it runs several processes one after another. First come the self-tests, the so-called internal checks. Next, the agent evaluates the data stored in Elasticsearch from Canarytrace (synthetics) and RUM against the so-called rules. The last process produces autonomous reports, the so-called overviews. Every run of the listener-agent follows this lifecycle.

Elasticsearch indices

The listener-agent creates and uses these indices:

Internal checks

The first process runs the internal checks, which test the health of the Canarytrace toolset and Elasticsearch. Bad results and warnings are reported, according to their score, via the different reporters.

Elasticsearch

Canarytrace

Data Manager

Rules

The listener-agent contains three types of rules, stored in files:

Overviews

Overviews are autonomous reports. The listener-agent evaluates the severity of problems on its own and sends either a reminder report or an overview with more information.

Events

The listener-agent saves the results of all processes to the c.listener.events-* index. Each item in this index contains additional information.

Filtering

You can filter events by the label rule (e.g. internal-check, close-old-errors, contains, match and range) or by the label type (e.g. slack-internal or events).
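
As an illustration of such a filter, here is a minimal sketch that builds an Elasticsearch query body for the c.listener.events-* index. It assumes the labels rule and type are indexed as exact-match (keyword) fields; the helper name build_events_filter is hypothetical and not part of the listener-agent.

```python
from typing import Optional


def build_events_filter(rule: Optional[str] = None,
                        type_: Optional[str] = None) -> dict:
    """Build a query body that filters events by the `rule` and/or `type` label.

    Assumption: both labels are keyword fields, so a `term` query matches exactly.
    """
    filters = []
    if rule is not None:
        filters.append({"term": {"rule": rule}})
    if type_ is not None:
        filters.append({"term": {"type": type_}})
    if not filters:
        # No label given: return every event.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"filter": filters}}}


# Events produced by internal checks and reported to slack-internal.
query = build_events_filter(rule="internal-check", type_="slack-internal")
```

The resulting dictionary can be sent as the body of a search request against c.listener.events-* with any Elasticsearch client.
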

Reporters

Important: how reporters work

1. A report is sent immediately only if the score is lower than 31 and this type of exceeded rule isn't already in c.listener.queue.
2. Every exceeded rule is automatically stored in the c.listener.events-* and c.listener.queue-* Elasticsearch indices, regardless of the score value.
3. If a rule is exceeded again, no report is sent, but the repetition label of the exceeded rule stored in c.listener.queue-* is updated.
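
The three points above can be sketched as a small decision function. This is only an illustration of the described behaviour, not the listener-agent's code: a plain list and a dictionary stand in for the c.listener.events-* and c.listener.queue-* indices, and the names handle_exceeded_rule, rule_id and repetitions are hypothetical.

```python
REPORT_THRESHOLD = 31  # a report goes out only when the score is lower than this


def handle_exceeded_rule(rule_id: str, score: int, queue: dict, events: list) -> bool:
    """Record an exceeded rule and return True if a report should be sent now."""
    # Point 2: every exceeded rule is stored in the events, whatever its score.
    events.append({"rule": rule_id, "score": score})

    if rule_id in queue:
        # Point 3: a repeated rule only updates its repetition label.
        queue[rule_id]["repetitions"] += 1
        return False

    # Point 2: the first occurrence is also stored in the queue.
    queue[rule_id] = {"score": score, "repetitions": 0}
    # Point 1: report immediately only for scores below the threshold.
    return score < REPORT_THRESHOLD


queue, events = {}, []
first = handle_exceeded_rule("load-time-is-poor", 20, queue, events)
second = handle_exceeded_rule("load-time-is-poor", 20, queue, events)
mild = handle_exceeded_rule("load-time-needs-improvement", 40, queue, events)
```

Here first is True (low score, not queued yet), second is False (already queued), and mild is False (score of 31 or higher), while all three calls are recorded in events.
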

Types

The results of the listener-agent processes are reported via so-called reporters. These types are currently available:

Reporters syntax

reporters:
    - type: slack
      message: "Page is complete loaded."
    - type: slack-internal
      message: "Page is complete loaded."
    - type: email
      message: "An error occurred while checking."
      recipients:
      - 'rdpanek@canarytrace.com'
      - 'verca@canarytrace.com'

Rule analyzator

The rule analyzator is a process that transforms each rule into an Elasticsearch query, sends the query to Elasticsearch, and analyzes the response according to the rule.
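
The three steps (transform, send, analyze) might be sketched as follows. This is a rough outline under stated assumptions: build_query and count_hits are hypothetical callables injected by the caller, and the rule is treated as exceeded when the number of matching items reaches its min value.

```python
from typing import Callable


def analyze_rule(rule: dict,
                 build_query: Callable[[dict], dict],
                 count_hits: Callable[[str, dict], int]) -> dict:
    """Run one rule through the transform -> send -> analyze steps."""
    query = build_query(rule)                # 1. transform the rule into a query
    hits = count_hits(rule["index"], query)  # 2. send the query, get a hit count
    exceeded = hits >= rule["min"]           # 3. analyze the response by the rule
    return {
        "title": rule["title"],
        "hits": hits,
        "exceeded": exceeded,
        "score": rule["score"] if exceeded else None,
    }


# Stubs instead of a real Elasticsearch connection.
rule = {"title": "Failed check your page!", "index": "c.report", "min": 2, "score": 10}
result = analyze_rule(rule,
                      build_query=lambda r: {"query": {"match_all": {}}},
                      count_hits=lambda index, query: 3)
```

Injecting the two callables keeps the sketch runnable without a cluster; in practice count_hits would wrap a call to an Elasticsearch client.
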

rdpanek commented 1 year ago

Score table

The Canarytrace toolkit uses this score table:

| Description | Score | Color |
| --- | --- | --- |
| needs fix! | 0-30 | red |
| needs improvement! | 31-70 | orange |
| good job! | 71-100 | green |

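
The score table can be expressed as a small lookup. A minimal sketch, assuming scores are integers from 0 to 100; the function name classify_score is hypothetical.

```python
def classify_score(score: int) -> tuple:
    """Map a score (0-100) to its (description, color) pair from the score table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 30:
        return ("needs fix!", "red")
    if score <= 70:
        return ("needs improvement!", "orange")
    return ("good job!", "green")


label, color = classify_score(40)
```

For a score of 40, label is "needs improvement!" and color is "orange".
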
rdpanek commented 1 year ago

Internal Rules

last rev. quay.io/canarytrace/listener-agent:1.41 source: rules/internal.yaml

| Title | Index | Condition | Count / hour | Score |
| --- | --- | --- | --- | --- |
| Load time needs improvement | c.performance-entries | > 3000ms | 5 | 40 |
| Load time is poor | c.performance-entries | > 5000ms | 5 | 20 |
| Higher response time. | c.performance-entries | > 3000ms | 10 | 40 |
| Performance Score is poor | c.audit | 0 - 49 | 3 | 20 |
| Performance Score needs improvement | c.audit | 50 - 89 | 3 | 60 |
| FCP needs improvement | c.audit | 1800 - 3000 | 3 | 40 |
| FCP is poor | c.audit | > 3000 | 3 | 20 |
| LCP needs improvement (CoreWebVitals). | c.audit | > 2500ms | 5 | 40 |
| LCP is poor (CoreWebVitals). | c.audit | > 4000ms | 5 | 40 |
| CLS needs improvement (CoreWebVitals) | c.audit | > 0.1 | 5 | 40 |
| CLS is poor (CoreWebVitals) | c.audit | > 0.25 | 5 | 30 |
| TBT needs improvement | c.audit | 200 - 600 | 3 | 40 |
| TBT is poor | c.audit | > 600 | 3 | 20 |
| TTI needs improvement | c.audit | > 3900ms | 5 | 40 |
| TTI is poor | c.audit | > 7300ms | 5 | 20 |
| Response with Javascript must contains gzip or brotli compression. | c.request-log | gzip or br missing in headers.content-encoding | 10 | 40 |
| Responses with CSS files must contains gzip or brotli compression. | c.request-log | gzip or br missing in headers.content-encoding | 10 | 40 |
| Response code 400 | r.request-log | >= 400 | 5 | 80 |
| Response code 500 | r.request-log | >= 500 | 5 | 20 |
| Failed check your page! | c.report | test step failed | 2 | 10 |

rdpanek commented 1 year ago

RUM Rules

last rev. quay.io/canarytrace/listener-agent:1.41 source: rules/rum.yaml

| Title | Index | Condition | Count / hour | Score |
| --- | --- | --- | --- | --- |
| LCP needs improvement (RUM) | c.rum.metrics | 2500ms - 4000ms | 2 | 40 |
| LCP is poor (RUM) | c.rum.metrics | >= 4000ms | 2 | 20 |
| CLS needs improvement (RUM) | c.rum.metrics | 0.1 - 0.25 | 5 | 40 |
| CLS is poor (RUM) | c.rum.metrics | >= 0.25 | 5 | 30 |
| FID needs improvement (RUM) | c.rum.metrics | 100 - 300 | 2 | 20 |
| FID is poor (RUM) | c.rum.metrics | >= 300 | 5 | 30 |
| FCP needs improvement (RUM) | c.rum.metrics | 1800 - 3000 | 3 | 40 |
| FCP is poor (RUM) | c.rum.metrics | >= 3000 | 3 | 20 |
| TTFB needs improvement (RUM) | c.rum.metrics | 400 - 800 | 3 | 40 |
| TTFB is poor (RUM) | c.rum.metrics | >= 800 | 3 | 20 |

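
To show how the RUM conditions partition a measured value, here is a small sketch that rates a single metric against the thresholds from the table. It assumes the lower bound of each "needs improvement" range is inclusive; the names RUM_THRESHOLDS and rate_rum_metric are hypothetical, and the sketch ignores the count per hour and the score.

```python
# (needs-improvement lower bound, poor lower bound) per metric, from the RUM rules table.
RUM_THRESHOLDS = {
    "LCP": (2500, 4000),
    "CLS": (0.1, 0.25),
    "FID": (100, 300),
    "FCP": (1800, 3000),
    "TTFB": (400, 800),
}


def rate_rum_metric(metric: str, value: float):
    """Return "poor", "needs improvement", or None when no RUM rule applies."""
    lower, upper = RUM_THRESHOLDS[metric]
    if value >= upper:
        return "poor"
    if value >= lower:  # assumption: the lower bound is inclusive
        return "needs improvement"
    return None


lcp_rating = rate_rum_metric("LCP", 3000)
```

With an LCP of 3000 ms, lcp_rating is "needs improvement".
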
rdpanek commented 1 year ago

Custom Rules

last rev. quay.io/canarytrace/listener-agent:1.41 source: rules/custom.yaml

This file contains only an example rule.

rdpanek commented 1 year ago

Internal checks

Reporters: slack-internal

Internal checks run when the listener-agent starts, before the rules are checked.

| Title | Score |
| --- | --- |
| Elasticsearch health check: yellow | 50 |
| Elasticsearch health check: red | 0 |
| Elasticsearch nodes: java heap is higher than 80% | 50 |
| Elasticsearch nodes: java heap is higher than 85% | 0 |
| Elasticsearch nodes: disk used percentage is higher than 85% | 30 |
| Canarytrace health check: missing data | 0 |
| Canarytrace lifecycle errors | 30 |
| DataManager exist: Not found. | 31 |

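
The Elasticsearch rows of this table can be read as a mapping from cluster state to findings. The sketch below is an illustration only: it assumes that for the Java heap only the most severe matching check is reported, and the function name and its parameters are hypothetical.

```python
def elasticsearch_check_scores(status: str, heap_percent: float, disk_percent: float) -> list:
    """Return (title, score) findings for the Elasticsearch internal checks."""
    findings = []
    if status == "red":
        findings.append(("Elasticsearch health check: red", 0))
    elif status == "yellow":
        findings.append(("Elasticsearch health check: yellow", 50))

    # Assumption: only the stricter heap check is reported when both would match.
    if heap_percent > 85:
        findings.append(("Elasticsearch nodes: java heap is higher than 85%", 0))
    elif heap_percent > 80:
        findings.append(("Elasticsearch nodes: java heap is higher than 80%", 50))

    if disk_percent > 85:
        findings.append(("Elasticsearch nodes: disk used percentage is higher than 85%", 30))
    return findings


findings = elasticsearch_check_scores("yellow", heap_percent=83, disk_percent=90)
```

A healthy cluster (green status, low heap and disk usage) yields an empty list, so nothing would be reported.
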
rdpanek commented 1 year ago

Rules syntax

Example of one rule definition

  - type: range
    title: "Load time needs improvement"
    index: c.performance-entries
    filter:
    - field: 'loadEventEnd'
      operator: 'gte'
      value: 3000
    - field: 'loadEventEnd'
      operator: 'lt'
      value: 5000
    min: 5
    score: 40
    reportLabels:
    - 'loadEventEnd'
    - 'timestamp'
    reporters:
    - type: slack
      message: "Page is complete loaded."
rdpanek commented 1 year ago

rule type range

Example rule

This rule checks whether the c.performance-entries index contains at least 5 items whose loadEventEnd field has a value from 3000 ms (inclusive) up to 5000 ms (exclusive).

rules:
  - type: range
    title: "Load time needs improvement"
    index: c.performance-entries
    filter:
    - field: 'loadEventEnd'
      operator: 'gte'
      value: 3000
    - field: 'loadEventEnd'
      operator: 'lt'
      value: 5000
    min: 5
    score: 40
    reportLabels:
    - 'loadEventEnd'
    - 'timestamp'
    reporters:
    - type: slack
      message: "Page is complete loaded."
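
One way such a range rule could be turned into an Elasticsearch query is sketched below. The operators gte and lt are passed straight through to an Elasticsearch range clause; the function name range_rule_to_query is hypothetical, and the query the listener-agent actually generates may differ.

```python
def range_rule_to_query(rule: dict) -> dict:
    """Translate the `filter` list of a range rule into a bool/range query body."""
    bounds = {}
    for item in rule["filter"]:
        # Collect all operators for the same field into one range clause.
        bounds.setdefault(item["field"], {})[item["operator"]] = item["value"]
    clauses = [{"range": {field: ops}} for field, ops in bounds.items()]
    return {"query": {"bool": {"filter": clauses}}}


rule = {
    "type": "range",
    "index": "c.performance-entries",
    "filter": [
        {"field": "loadEventEnd", "operator": "gte", "value": 3000},
        {"field": "loadEventEnd", "operator": "lt", "value": 5000},
    ],
    "min": 5,
}
query = range_rule_to_query(rule)
```

The rule would then count as exceeded when this query matches at least min (here 5) items in the index.
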

Mandatory parts

Optional parts

rdpanek commented 1 year ago

rule type contains

Example rule

This rule checks whether the c.request-log index contains at least 10 items whose response.headers.content-type field contains css and whose response.headers.content-encoding field is neither gzip nor br.

rules:
  - type: contains
    title: "Responses with CSS files must contains gzip or brotli compression."
    index: c.request-log
    field: response.headers.content-type
    value: 'css'
    expression:
      field: 'response.headers.content-encoding'
      operator: must_not
      values:
      - 'gzip'
      - 'br'
    min: 10
    score: 40
    reportLabels:
    - 'url'
    - 'timestamp'
    reporters:
    - type: slack
      message: "Use Brotli for plain text compression."
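
A contains rule could map to an Elasticsearch bool query along these lines. This sketch assumes both header fields are keyword fields, uses a wildcard clause for the "contains" part, and places the expression values under the bool key named by operator; the function name contains_rule_to_query is hypothetical.

```python
def contains_rule_to_query(rule: dict) -> dict:
    """Translate a contains rule into a bool query body."""
    expression = rule["expression"]
    # The main field must contain the given value, e.g. "text/css" contains "css".
    bool_query = {"must": [{"wildcard": {rule["field"]: f"*{rule['value']}*"}}]}
    # The expression values go under `must` or `must_not`, as the rule states.
    value_clauses = [{"term": {expression["field"]: v}} for v in expression["values"]]
    bool_query.setdefault(expression["operator"], []).extend(value_clauses)
    return {"query": {"bool": bool_query}}


rule = {
    "type": "contains",
    "index": "c.request-log",
    "field": "response.headers.content-type",
    "value": "css",
    "expression": {
        "field": "response.headers.content-encoding",
        "operator": "must_not",
        "values": ["gzip", "br"],
    },
    "min": 10,
}
query = contains_rule_to_query(rule)
```

With must_not, an item matches only if its content-encoding equals none of the listed values, which corresponds to an uncompressed CSS response.
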

Mandatory parts

Optional parts

rdpanek commented 1 year ago

rule type match

Example rule

This rule checks whether the c.report-* index contains at least 2 items whose passed field has the value false.

rules:
  - type: match
    title: "Failed check your page!"
    index: c.report
    field: passed
    operator: must
    expected: false
    min: 2
    score: 10
    reportLabels:
    - 'fullTitle'
    - 'timestamp'
    reporters:
    - type: slack
      message: "An error occurred while checking."
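
A match rule is the simplest of the three to translate. The sketch below assumes the field is mapped so that a term clause matches the expected value exactly (for example a boolean field); the function name match_rule_to_query is hypothetical.

```python
import json


def match_rule_to_query(rule: dict) -> dict:
    """Translate a match rule into a bool query with a single term clause."""
    clause = {"term": {rule["field"]: rule["expected"]}}
    # `operator` (e.g. must) becomes the bool key that holds the clause.
    return {"query": {"bool": {rule["operator"]: [clause]}}}


rule = {
    "type": "match",
    "index": "c.report",
    "field": "passed",
    "operator": "must",
    "expected": False,
    "min": 2,
}
query = match_rule_to_query(rule)
body = json.dumps(query)  # Python False is serialized as JSON false
```

The serialized body can be sent to Elasticsearch; the rule would count as exceeded when at least min (here 2) items match.
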

Mandatory parts

Optional parts