Yelp / sensu_handlers

Custom Sensu Handlers to support a multi-tenant environment, allowing checks themselves to emit the type of handler behavior they need in the event json
Apache License 2.0
75 stars 31 forks source link

include environment info is present in incident key #98

Closed cabecada closed 8 years ago

cabecada commented 8 years ago

Currently pagerduty incident key only includes region value in the incident key if defined. Including environment data in the incident key would help us parse the corresponding sensu server belonging to the corresponding environment. This will not break any existing setup but will only modify the incident key is environment attr is set in the sensu check definition.

https://v2.developer.pagerduty.com/docs/webhooks-overview

somic commented 8 years ago

Interesting. By environment do you mean things like 'production', 'stage', 'dev'? And what do you use for 'region' at your place? Do you use 'region' to mean cloud vendor region?

At Yelp both of these form what we call a 'region' - for us it's a combination of some notion where this is geographically + some notion of whether it's prod, stage, dev, etc. Our region is just one dimension along which we slice and dice our hosts for various purposes, and that's how it ended up here.

Is it possible for you to change what you pass as 'region' when you define your checks such that when pagerduty handler sees @event['check']['region'], it will include your environment there too?

solarkennedy commented 8 years ago

Yea. Or we can use the more generic term "environment" in this PR and nuke "region", and at yelp we can pass in whatever "environment" we want. (in our case, $::region)

cabecada commented 8 years ago

i was suggested if it makes sense to get env/region info from handler config/settings instead of event's check data. I found this more feasible, as it is directly associated with the server, less data transported over api :)

cabecada commented 8 years ago

@somic yes region was AWS regions and environment was prod/preprod/ci etc. and we had one region which belonged to both prod and preprod, hence the requirement.

fessyfoo commented 8 years ago

the more I think about this, I think the env/region info for the incident key really needs to come from handler config, or at least default to something in the handler config. Otherwise you have to make sure every check result describes that value correctly when it's really just a property of the sensu-server. So I'm reiterating @cabecada's comment above.

somic commented 8 years ago

How would you express it in the config?

cabecada commented 8 years ago

image

in pagerduty.pp
  $handler_config = delete_undef_values({
    teams => $teams,
    region => $::region,
    environment => $::environment,
  })
config => $handler_config,

# when running in vagrant
root@ubuntu1404-4:/etc/sensu# grep vagrant conf.d/handlers/pagerduty.json 
    "environment": "vagrant",
    "region": "vagrant"

and then invoking in handler files like
handler_settings['environment'] + '-' + handler_settings['region'] 
somic commented 8 years ago

Ah, I see. I thought you meant the string in incident_key method was going to be configured in handler settings.

solarkennedy commented 8 years ago

So.. are we in agreement? Use "environment" and allow it to be configured in the base handler configuration? (and at yelp we'll use $::region)

cabecada commented 8 years ago

Cool. I'll update this PR to include environment from default(or PagerDuty?) handler config. That is if environment is defined in handler config use it in incident key or fallback to default use case of region in event's check data

cabecada commented 8 years ago

@somic @solarkennedy I have replaced $region param with $environment param in init.pp and used $environment in pagerduty.pp for incident_key_identifier. Let me know what you think.

after puppet agent -t image

from sensu server logs, incident_key now has environment what we set above image

class definition in manifest image

@fessyfoo ^^

cabecada commented 8 years ago

@somic @solarkennedy any chance this will be merged?

somic commented 8 years ago

lgtm

cabecada commented 8 years ago

@solarkennedy @somic @fessyfoo does env_region in config works fine for you? This will also be beneficial for people using JIT checks or monitoring_check which does not allow region to be set in check definition :)

solarkennedy commented 8 years ago

I prefer "datacenter" to be consistent with the word that Uchiwa uses to display this data.

cabecada commented 8 years ago

So datacenter=environment_region, ex production_usw2 works fine? we need both environment info and region info in the incident key

cabecada commented 8 years ago

Ok so I can set datacenter as I want and use it in the incident key. Sorry I am in India and sometimes a lose a full round trip for a comment :( but if you all can agree on datacenter in the incident key from the handler configure, I'll update the PRtomorrow

cabecada commented 8 years ago

closing this and opening https://github.com/Yelp/sensu_handlers/pull/100