elastic / uptime

This project includes resources and general issue tracking for the Elastic Uptime solution

Improve `error.type` in Heartbeat / Uptime UI #183

Open andrewvc opened 5 years ago

andrewvc commented 5 years ago

We'd like to be able to collate errors in Uptime, show discrete errors, and let users see things like which errors are most common. The problem lies with the two fields we have for errors, `error.message` and `error.type`.

`error.message` is too specific: it's indexed as full text and has details such as IP addresses interpolated into it.

`error.type` is mostly too coarse. Today it has only two values, `io` and `validate`: the former covers all network issues, the latter anything related to user validation functions. We should probably refine these types.

We should ask whether changing the values in these fields would be a breaking change. ECS says `error.type` should generally match error classes, so we're much too coarse there today. There's also `error.code`, but that seems intended for something more specific, like an HTTP status code.

Taking it all together, I don't think users would find it problematic if we switched to a prefix-delimited style, for example `io.tcp.could-not-connect` or `validate.string-did-not-match`. Errors could still be binned efficiently using prefix queries in Elasticsearch filters, and we could add a lot more detail.
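Because every value shares a dot-separated prefix, the same field can be aggregated at whatever granularity is needed. As a rough client-side illustration (in Elasticsearch itself this filtering would be a `prefix` query on a keyword field), the Go sketch below counts a list of `error.type` values at a chosen prefix depth. `binByPrefix` and the sample values are hypothetical, not part of Heartbeat.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// binByPrefix counts error types by their first `depth` dot-separated segments.
// Depth 1 groups "io.tcp.could-not-connect" under "io"; depth 2 groups it under "io.tcp".
// A depth <= 0 (or larger than the value) keeps the full type.
func binByPrefix(types []string, depth int) map[string]int {
	counts := make(map[string]int)
	for _, t := range types {
		parts := strings.Split(t, ".")
		if depth > 0 && depth < len(parts) {
			parts = parts[:depth]
		}
		counts[strings.Join(parts, ".")]++
	}
	return counts
}

func main() {
	// Hypothetical observed error types.
	observed := []string{
		"io.tcp.could-not-connect",
		"io.tcp.could-not-connect",
		"io.dns.nxdomain",
		"validate.http-body.string-mismatch",
	}
	for _, depth := range []int{1, 2} {
		counts := binByPrefix(observed, depth)
		// Sort keys so the output order is deterministic.
		keys := make([]string, 0, len(counts))
		for k := range counts {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Printf("depth %d: %s=%d\n", depth, k, counts[k])
		}
	}
}
```

Depth 1 reproduces today's coarse `io`/`validate` split, so existing dashboards built on the top-level value would keep working under this scheme.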

I do worry that this may be complex given how Go error handling works; hopefully we won't have to resort to string parsing to accomplish this. We don't need to do it all at once, though: we can find the most common error types and refine those first.

A good start might be:

  1. io.ip.no-route
  2. io.dns.nxdomain
  3. io.icmp.ttl-expired
  4. io.tcp.could-not-connect
  5. io.http.bad-response-code
  6. io.tls.cert-expired
  7. io.tls.cert-mismatch
  8. validate.http-body.string-mismatch

The general format here is:

  1. io.[network-layer].[specific-error]
  2. validate.[validator-name].[validator-error]
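A small Go sketch of checking values against the general format above. `parseErrorType` is a hypothetical helper; it assumes exactly three dot-separated segments, the two top-level categories `io` and `validate`, and lowercase hyphenated segments. That matches every example in the list above but is not a rule stated in the issue; under this stricter reading the two-segment `validate.string-did-not-match` mentioned earlier would be rejected.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// segment matches one lowercase, hyphen-separated segment such as "could-not-connect".
var segment = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// parseErrorType splits a value into category, scope (network layer or validator
// name) and detail, returning an error if it does not fit the assumed format.
func parseErrorType(s string) (category, scope, detail string, err error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("%q: want 3 dot-separated segments, got %d", s, len(parts))
	}
	if parts[0] != "io" && parts[0] != "validate" {
		return "", "", "", fmt.Errorf("%q: unknown category %q", s, parts[0])
	}
	for _, p := range parts[1:] {
		if !segment.MatchString(p) {
			return "", "", "", fmt.Errorf("%q: malformed segment %q", s, p)
		}
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	for _, s := range []string{"io.tls.cert-expired", "validate.http-body.string-mismatch", "io"} {
		c, sc, d, err := parseErrorType(s)
		if err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("category=%s scope=%s detail=%s\n", c, sc, d)
	}
}
```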

elasticmachine commented 5 years ago

Pinging @elastic/uptime