dj-wasabi / ansible-telegraf

Installing and configuring Telegraf via Ansible for RedHat/Debian/Ubuntu/Windows/Suse.
MIT License

dj-wasabi.telegraf

This role will install and configure telegraf.

Telegraf is an agent written in Go for collecting metrics from the system it's running on, or from other services, and writing them into InfluxDB.

Design goals are to have a minimal memory footprint with a plugin system so that developers in the community can easily add support for collecting metrics from well known services (like Hadoop, Postgres, or Redis) and third party APIs (like Mailchimp, AWS CloudWatch, or Google Analytics).

(https://github.com/influxdb/telegraf)

Requirements

Supported systems

This role supports the following systems:

So, you'll need one of those systems. :-) Please send pull requests or suggestions if you want to use this role on other systems.

InfluxDB

You'll need an InfluxDB instance running somewhere on your network, or one of the other output types listed at https://github.com/influxdata/telegraf/#output-plugins

Docker

Docker needs to be installed on the target host. I can recommend these roles to install Docker:

Docker is only required when Telegraf is configured to run inside a Docker container (that is, when telegraf_agent_docker: True).

Upgrade

0.7.0

There was an issue:

If I configure a telegraf_plugins_extra, run ansible, delete the plugin and run ansible again, the plugin stays on the machine.

Role Variables

Ansible role specific variables

Specifying the version to be installed:

How Telegraf is installed. There are four methods for installing Telegraf on the target host:

The method is selected by setting telegraf_agent_package_method to one of the following values: repo, online, offline or manual.
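
As a minimal sketch, the installation method could be pinned in a vars file like the one below. The file name is hypothetical; the variable name and its four allowed values are the ones listed above.

```yaml
# group_vars/all.yml (hypothetical location; any Ansible vars file works)
# Allowed values according to this README: repo, online, offline, manual.
telegraf_agent_package_method: repo
```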

Telegraf Package

These properties determine how the package is installed and which package is used.

Telegraf agent process configuration.

Docker specific role variables:

Full agent settings reference: https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#agent-configuration.

Extra information

ansible_fqdn problematic for getting hostname

Extra info regarding: ansible_fqdn problematic for getting hostname #105

Describe the bug

In some nodes I'm getting weird hostnames, mostly localhost.localdomain. Those nodes show proper configuration in hostnamectl. I've seen you're using 'ansible_fqdn' as default.

It seems that ansible_fqdn and ansible_hostname can give different results, and sometimes very odd ones, because resolving them can involve DNS calls (which are not under my control in those cases) to infer the names.

Fix proposal

In my playbook I've added this parameter:

telegraf_agent_hostname: "{{ ansible_nodename }}"

Setting tags

You can set tags for the host running telegraf:

telegraf_global_tags:
  - tag_name: some_name
    tag_value: some_value

Specifying an output. The default points to localhost, so you'll have to specify the correct InfluxDB server:

telegraf_agent_output:
  - type: influxdb
    config:
      - urls = ["http://localhost:8086"]
      - database = "telegraf"
    tagpass:
      - cpu = ["cpu0"]

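Each config entry is written to the configuration file line by line, so you could also add a comment line. Quote it, because an unquoted # starts a YAML comment and the entry would end up empty:

config:
    - "# Print a documentation line"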

and it will be printed in the configuration file.

Docker specifics

Docker image

The official Influxdata Telegraf image is used. telegraf_agent_version will translate to the image tag.

Docker mounts

Please note that the Docker container bind mounts basically your whole system (read-only) so that the Docker Engine host can be monitored from within the container. To be precise:

- /etc/telegraf:/etc/telegraf:ro
- /:/hostfs:ro
- /etc:/hostfs/etc:ro
- /proc:/hostfs/proc:ro
- /sys:/hostfs/sys:ro
- /var/run:/var/run:ro

More information: https://github.com/influxdata/telegraf/blob/master/docs/FAQ.md.

Example Docker configuration

telegraf_agent_docker: True
# Force host networking mode, so Docker Engine Host traffic metrics can be gathered.
telegraf_agent_docker_network_mode: host
# Force a specific image tag.
telegraf_agent_version: 1.10.0-alpine

telegraf_plugins_default:
  - plugin: cpu
    config:
      - percpu = true
  - plugin: disk
    tagpass:
      - fstype = [ "ext4", "xfs" ]
    tagdrop:
      - path = [ "/etc", "/etc/telegraf", "/etc/hostname", "/etc/hosts", "/etc/resolv.conf" ]
  - plugin: io
  - plugin: mem
  - plugin: system
  - plugin: swap
  - plugin: netstat
  - plugin: processes
  - plugin: docker
    config:
      - endpoint = "unix:///var/run/docker.sock"
      - timeout = "5s"

Windows specific Variables

NOTE

Windows support is best effort (I have no way to test or verify changes across the many available Windows versions). Windows-specific PRs will be merged almost immediately, unless someone is able to provide a Windows test mechanism for pull requests via Travis or another service.

openSUSE specific Variables

MacOS specific Variables

NOTE

Like Windows support, MacOS support is best effort and not officially supported.

Extra information

There are two properties which are similar, but are used differently. Those are:

telegraf_plugins_default

The telegraf_plugins_default property holds the default set of Telegraf plugins. You can override it to add more plugins that should be enabled by default.

telegraf_plugins_default:
  - plugin: cpu
    config:
      - percpu = true
  - plugin: disk
  - plugin: io
  - plugin: mem
  - plugin: system
  - plugin: swap
  - plugin: netstat

Every telegraf agent has these as a default configuration.
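
As a sketch of such an override, the list below repeats the defaults shown above and appends the processes plugin, which also appears in the Docker example in this README. The file name is hypothetical. Because Ansible replaces list variables rather than merging them, the defaults have to be repeated in full.

```yaml
# group_vars/all.yml (hypothetical location)
telegraf_plugins_default:
  - plugin: cpu
    config:
      - percpu = true
  - plugin: disk
  - plugin: io
  - plugin: mem
  - plugin: system
  - plugin: swap
  - plugin: netstat
  # Added on top of the role defaults:
  - plugin: processes
```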

telegraf_plugins_extra

The second parameter, telegraf_plugins_extra, can be used to add plugins specific to the server's purpose. It is a hash instead of a list, so that you can merge values from multiple var files together. The following example uses this parameter for MySQL database servers:

cat group_vars/mysql_database
telegraf_plugins_extra:
  mysql:
    config:
      - servers = ["root:{{ mysql_root_password }}@tcp(localhost:3306)/"]

The telegraf_plugins_extra_exclusive variable provides an option to delete extra-plugin files in /etc/telegraf/telegraf.d that were not generated by this role.
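
A minimal sketch of enabling that clean-up is shown below. It assumes telegraf_plugins_extra_exclusive is a boolean switch, which this README does not state explicitly; check the role defaults before relying on it. The mysql entry is copied from the example above.

```yaml
# Assumption: the variable is a boolean; true removes files in
# /etc/telegraf/telegraf.d that this role did not generate.
telegraf_plugins_extra_exclusive: true

telegraf_plugins_extra:
  mysql:
    config:
      - servers = ["root:{{ mysql_root_password }}@tcp(localhost:3306)/"]
```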

Telegraf plugin options:

An example might look like this:

telegraf_plugins_default:
  - plugin: disk
    interval: 12
    tags:
      - diskmetrics = "true"
    tagpass:
      - fstype = [ "ext4", "xfs" ]
      - path = [ "/opt", "/home" ]

If you want to define processors, use the telegraf_processors variable. An example might look like this:

telegraf_processors:
  - processor: rename
  - processor: rename.replace
    config:
        - tag = "level"
        - dest = "LogLevel"

When you want to make use of the grok filter for the logparser:

telegraf_plugins_extra:
  logparser:
    plugin: logparser
    config:
      - files = ["/var/log/messages"]
      - from_beginning = false
    filter:
      name: grok
      config:
        - patterns = ["invoked oom-killer"]

When you want to include sub-inputs with their own configuration:

sqs:
  plugin: cloudwatch
  config:
    - region = "eu-west-1"
    - access_key = "foo"
    - secret_key = "bar"
    - period = "1m"
    - delay  = "2m"
    - interval = "1m"
    - namespace = "AWS/SQS"
    - statistic_include = ["average"]
  sub_inputs:
    metrics:
      - names = [
          "ApproximateAgeOfOldestMessage",
          "ApproximateNumberOfMessagesVisible",
        ]
    metrics.dimensions:
      - name = "QueueName"
      - value = "*"

Dependencies

No dependencies

Example Playbook

- hosts: servers
  roles:
     - { role: dj-wasabi.telegraf }
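
For a slightly fuller sketch, the playbook below passes a few of the variables described earlier to the role. The InfluxDB host name influxdb.example.com is a placeholder, not something this role provides; the variable names are the ones documented above.

```yaml
- hosts: servers
  roles:
    - role: dj-wasabi.telegraf
      vars:
        # Avoid odd ansible_fqdn results (see issue #105 above).
        telegraf_agent_hostname: "{{ ansible_nodename }}"
        telegraf_agent_output:
          - type: influxdb
            config:
              # Placeholder URL: point this at your own InfluxDB server.
              - urls = ["http://influxdb.example.com:8086"]
              - database = "telegraf"
```

Passing the variables at role level keeps them scoped to this role; putting them in group_vars or host_vars would work as well.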

Molecule

This role is configured to be tested with Molecule. You can find more information about Molecule on this page: https://werner-dijkerman.nl/2016/07/10/testing-ansible-roles-with-molecule-testinfra-and-docker/

License

BSD

Author Information

Please let me know if you have issues. Pull requests are also accepted! :-)

mail: ikben [ at ] werner-dijkerman . nl