GoogleCloudPlatform / deploymentmanager-samples

Deployment Manager samples and templates.
Apache License 2.0

Stackdriver monitoring notification via deployment manager #506

Closed aniket-cloud closed 4 years ago

aniket-cloud commented 4 years ago

Hi Team,

I want to configure a Stackdriver Monitoring alert on Cloud SQL instance CPU usage. It should alert in a Slack channel when CPU usage stays above 75% for at least 5 minutes. I am able to create that notification manually, but I want to deploy it using only Deployment Manager.

Can you share an example of how to do this with Deployment Manager?

Thanks in advance. Aniket

ocsig commented 4 years ago

We don't have an example ready to go, but I have added it to the Cloud Foundation Toolkit backlog (432) to create one, including proper templates.

Until then, to help you get started, you need the following type: 'type': 'gcp-types/monitoring-v3:projects.notificationChannels'. Creation and deletion will work; update still needs to be tested. I will take a look at this later this week.

aniket-cloud commented 4 years ago

Thank you. I will try that on my end. If I find any issues, I will let you know. :)

aniket-cloud commented 4 years ago

Hi,

I have tried to deploy a Stackdriver Monitoring alert policy with a Slack notification channel via Deployment Manager, but I am facing an error.

Can you share a reference template for sending alerts to a Slack channel from Deployment Manager?

Thanks in advance. Aniket

ocsig commented 4 years ago

I don't have such a template ready, but while we are working on it, would you be able to share your code, config and the error you get?

aniket-cloud commented 4 years ago

Hi,

I am sharing the YAML and Jinja files. Please look into them and suggest where I need to make changes.

alerting.yaml -

imports:
- path: alerting.jinja
resources:
- name: create_alertingpolicies
  type: alerting.jinja
  properties:
    notificationSlack:
    - slackChannel: #slackChannel-name
      displayName: "name"  
    policies:
    - name: "1 - Availability - Cloud SQL Database - Memory usage (filtered) [MAX]"   
      conditions: 
      - filter: "metric.type=\"cloudsql.googleapis.com/database/memory/usage\" resource.type=\"cloudsql_database\" resource.label.database_id=\"sql-instance-id\"" 
        comparison: "COMPARISON_GT"
        duration: "300s" 
        thresholdValue: 2750000000
        trigger:
          count: 1
        aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: "ALIGN_RATE"
          crossSeriesReducer: "REDUCE_COUNT"

alerting.jinja -

{% set PREFIX = env["deployment"] %}
{% set NOTIFICATION_SLACK = properties["notificationSlack"] %}
{% set POLICIES = properties["policies"] %}
{% set PROJECT = env["project"] %}
{% set DEFAULT_MIME_TYPE = "text/markdown" %}

resources:
{% for slack in NOTIFICATION_SLACK %}
- name: {{ PREFIX }}-slack-{{ loop.index }}
  type: gcp-types/monitoring-v3:projects.notificationChannels
  properties:
    name: projects/{{ PROJECT }}
    type: slack
    displayName: {{ slack.displayName }}
    labels:
      slack_channel: {{ slack.slackChannel }}
    enabled: true
{% endfor %}

{% for policy in POLICIES %}
- name: {{ PREFIX }}-alertingpolicy-{{ loop.index }}
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    displayName: {{ PREFIX }}-{{ policy.name }}
    documentation:
      content: {{ policy.documentation.content }}
      mimeType:  {{ DEFAULT_MIME_TYPE }}
    combiner: OR
    conditions:
{% for condition in policy.conditions %}
    - displayName: {{ condition.displayName }}
      conditionThreshold:
        filter: {{ condition.filter }}
        comparison: {{condition.comparison }}
        duration: {{ condition.duration }}
        thresholdValue: {{ condition.thresholdValue }}
        trigger: {{ condition.trigger }}
        aggregations: {{ condition.aggregations }}
{% endfor  %}        
    notificationChannels:
{% for notification in NOTIFICATION_SLACK %}
    - $(ref.{{ PREFIX }}-slack-{{ loop.index }}.name)
{% endfor %}
    enabled: true
{% endfor %}

Thanks, Aniket

ocsig commented 4 years ago

What kind of error are you getting?

aniket-cloud commented 4 years ago

[image: screenshot of the error message]

ocsig commented 4 years ago

Your error message complains about line 26: content: {{ policy.documentation.content }}

If I understand correctly, you are looping through the properties["policies"] list. Your YAML config has 1 policy, and that policy has no documentation item, which the template references. Can you provide this value in your YAML?

Your code is failing on the Alert Policy, not at the notification channel.
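As a sketch of where that value would go: the template reads policy.documentation.content, so the documentation block belongs to the policy item itself, as a sibling of name and conditions, not directly under properties. The content string below is a placeholder assumption, not text from this thread.

```yaml
properties:
  policies:
  - name: "1 - Availability - Cloud SQL Database - Memory usage (filtered) [MAX]"
    # conditions: ... (unchanged, omitted here)
    documentation:
      # Placeholder text (an assumption); replace with your own note.
      content: "Cloud SQL memory usage is above the configured threshold."
```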

aniket-cloud commented 4 years ago

I deployed it as you suggested. Now the error has changed.

alerting.yaml -

imports:
- path: alerting.jinja
resources:
- name: create_alertingpolicies
  type: alerting.jinja
  properties:
    notificationSlack:
    - slackChannel: #slack-channel
      displayName: "name"  
    policies:
    - name: "1 - Availability - Cloud SQL Database - Memory usage (filtered) [MAX]"   
      conditions: 
      - filter: "metric.type=\"cloudsql.googleapis.com/database/memory/usage\" resource.type=\"cloudsql_database\" resource.label.database_id=\"sql_instance_id\"" 
        comparison: "COMPARISON_GT"
        duration: "300s" 
        thresholdValue: 2750000000
        trigger:
          count: 1
        aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: "ALIGN_RATE"
          crossSeriesReducer: "REDUCE_COUNT"
    documentation:
        content: "The janus rule ${condition.display_name} has generated this alert for the ${metric.display_name}."

[image: screenshot of the new error message]

ocsig commented 4 years ago

I managed to make it work with the following code.

Some of the missing pieces were auth_token and displayName. I also had to change some of the aggregation values. This is sample code; please review every value and change it according to your needs.

alerting.yaml

imports:
- path: alerting.jinja
resources:
- name: create_alertingpolicies
  type: alerting.jinja
  properties:
    notificationSlack:
    - slackChannel: "#slack-channel"  # quoted: an unquoted # starts a YAML comment
      auth_token: "token-1234567890"
      displayName: "name"  
    policies:
    - name: "1 - Availability - Cloud SQL Database - Memory usage (filtered) [MAX]"   
      conditions: 
      - displayName: "CloudSQL Memory"
        filter: "metric.type=\"cloudsql.googleapis.com/database/memory/usage\" resource.type=\"cloudsql_database\" resource.label.database_id=\"sql_instance_id\"" 
        comparison: "COMPARISON_GT"
        duration: "300s" 
        thresholdValue: 2750000000
        trigger:
          count: 1
        aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: "ALIGN_MAX"
          crossSeriesReducer: "REDUCE_MEAN"
      documentation:
        content: "The janus rule ${condition.display_name} has generated this alert for the ${metric.display_name}."
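The sample above watches memory usage, while the original request was for CPU usage above 75% for at least 5 minutes. A condition for that case might look like the following sketch. It assumes the Cloud SQL metric cloudsql.googleapis.com/database/cpu/utilization, which reports utilization as a fraction between 0 and 1 (so 75% is 0.75). The display name and the ALIGN_MEAN aligner are illustrative choices, and this variant was not tested in this thread.

```yaml
      conditions:
      - displayName: "CloudSQL CPU above 75%"   # illustrative name
        filter: "metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\" resource.type=\"cloudsql_database\" resource.label.database_id=\"sql_instance_id\""
        comparison: "COMPARISON_GT"
        duration: "300s"          # the condition must hold for 5 minutes
        thresholdValue: 0.75      # utilization is a 0-1 fraction, so 0.75 = 75%
        trigger:
          count: 1
        aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: "ALIGN_MEAN"
```

This block would replace the conditions list of the policy in alerting.yaml; the Jinja template should not need any change for it.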

alerting.jinja

{% set PREFIX = env["deployment"] %}
{% set NOTIFICATION_SLACK = properties["notificationSlack"] %}
{% set POLICIES = properties["policies"] %}
{% set PROJECT = env["project"] %}
{% set DEFAULT_MIME_TYPE = "text/markdown" %}

resources:
{% for slack in NOTIFICATION_SLACK %}
- name: {{ PREFIX }}-slack-{{ loop.index }}
  type: gcp-types/monitoring-v3:projects.notificationChannels
  properties:
    name: projects/{{ PROJECT }}
    type: slack
    displayName: {{ slack.displayName }}
    labels:
      channel_name: "{{ slack.slackChannel }}"
      auth_token: {{ slack.auth_token }}
    enabled: true
{% endfor %}

{% for policy in POLICIES %}
- name: {{ PREFIX }}-alertingpolicy-{{ loop.index }}
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    displayName: {{ PREFIX }}-{{ policy.name }}
    documentation:
      content: {{ policy.documentation.content }}
      mimeType:  {{ DEFAULT_MIME_TYPE }}
    combiner: OR
    conditions:
{% for condition in policy.conditions %}
    - displayName: {{ condition.displayName }}
      conditionThreshold:
        filter: {{ condition.filter }}
        comparison: {{condition.comparison }}
        duration: {{ condition.duration }}
        thresholdValue: {{ condition.thresholdValue }}
        trigger: {{ condition.trigger }}
        aggregations: {{ condition.aggregations }}
{% endfor  %}        
    notificationChannels:
{% for notification in NOTIFICATION_SLACK %}
    - $(ref.{{ PREFIX }}-slack-{{ loop.index }}.name)
{% endfor %}
    enabled: true
{% endfor %}

aniket-cloud commented 4 years ago

Many Thanks.

Sorry for the late reply. I made some changes to the .yaml and .jinja files. Now it is deployed and working as required via Deployment Manager.

aniket-cloud commented 4 years ago

I am stuck on just one thing. If I want to use "group by" in the YAML file, which parameter should I use, and where do I need to put it?

Thanks in advance.

ocsig commented 4 years ago

Based on the API docs, the aggregation object has a groupByFields field for this.

imports:
- path: alerting.jinja
resources:
- name: create_alertingpolicies
  type: alerting.jinja
  properties:
    notificationSlack:
    - slackChannel: "#slack-channel"  # quoted: an unquoted # starts a YAML comment
      auth_token: "token-1234567890"
      displayName: "name"  
    policies:
    - name: "1 - Availability - Cloud SQL Database - Memory usage (filtered) [MAX]"   
      conditions: 
      - displayName: "CloudSQL Memory"
        filter: "metric.type=\"cloudsql.googleapis.com/database/memory/usage\" resource.type=\"cloudsql_database\" resource.label.database_id=\"sql_instance_id\"" 
        comparison: "COMPARISON_GT"
        duration: "300s" 
        thresholdValue: 2750000000
        trigger:
          count: 1
        aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: "ALIGN_MAX"
          crossSeriesReducer: "REDUCE_MEAN"
          groupByFields:
          - Field1
          - Field2
      documentation:
        content: "The janus rule ${condition.display_name} has generated this alert for the ${metric.display_name}."
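Since alerting.jinja passes each condition's aggregations list through unchanged, no template change should be needed for this. Field1 and Field2 above are placeholders. As an assumption, grouping by the Cloud SQL instance would reuse the label path that the filter already uses:

```yaml
        aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: "ALIGN_MAX"
          crossSeriesReducer: "REDUCE_MEAN"
          groupByFields:
          # Assumed label; keeps one reduced time series per database instance.
          - "resource.label.database_id"
```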