DataDog / puppet-datadog-agent

Puppet module to install the Datadog agent

revamp of integrations #147

Closed: kitchen closed this issue 3 years ago

kitchen commented 8 years ago

TL;DR: revamp integrations to allow multiple instances and standard parameter validation on all integrations.

Hey folks! I'm looking for some feedback on the current state of integrations vs where you'd like to go with things.

I see in the history there is at least one commit that references "integrations are classes now, not defines", and I can understand why to a degree. However, I don't think such a hard "this is the only way it can be" model makes a lot of sense, and I think there is a better approach that can satisfy everyone's needs.

current state

Because of the way the configuration files for integrations are structured, the 'classes only' model makes a lot of sense. There is one init_config section that is common to the file itself, and then there can be multiple instances associated with it, as requested in #130. This maps decently to a class whose parameters cover the init_config section, plus an array of instances as one of the class parameters. Great.
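For reference, a 'classes only' integration looks roughly like this (a hypothetical sketch; the class name, file path, and template are illustrative):

class datadog_agent::integrations::example (
    $init_config = {},
    $instances   = [],
) {
    # standard stdlib validation of the two sections
    validate_hash($init_config)
    validate_array($instances)

    # render both sections into the single file the agent reads
    file { '/etc/dd-agent/conf.d/example.yaml':
        ensure  => file,
        content => template('datadog_agent/example.yaml.erb'),
    }
}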

However there are some significant downsides to this model:

  1. Since we're using class params, it's hard for a class to say "hey, I need one of these too" the way you can with a defined type. For instance, with the role/profile pattern, if you have 2 postgres databases in different profiles and want to monitor them both, you can't simply declare the class in both, because those two declarations can't exist within the same catalog (see the sketch after this list). You then have to factor the class out into yet another class which can be included in both profiles, which is not always easy. And if you later want to separate those profiles onto different machines, you have to un-factor it all again, and ugh.
  2. Semantically, this is actually not the right way to do it with puppet. These are instances, they should be a type so you can declare multiple of them.
  3. Currently, the module implements this pattern very sporadically. Some have it (nginx, mongo, zk), while most do not. There does appear to be demand for multiple instances from people other than me (see #130, #64, #56), and I know we (Stripe) will be needing it shortly as we start setting up datadog monitoring for our RDS instances (among other things).
  4. Deep validation of the hashes and arrays used to declare the instances is difficult, and as a result, almost completely missing.
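To make downside 1 concrete, here's a hypothetical sketch (the postgres class and its parameters are illustrative) of two profiles that each want postgres monitoring; any node that gets both roles fails catalog compilation:

class profile::app_db {
    class { 'datadog_agent::integrations::postgres':
        host   => 'localhost',
        dbname => 'app',
    }
}

class profile::reporting_db {
    # boom: "Duplicate declaration: Class[Datadog_agent::Integrations::Postgres]"
    # on any node that has both profiles
    class { 'datadog_agent::integrations::postgres':
        host   => 'localhost',
        dbname => 'reporting',
    }
}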

The problem, of course, is that this all modifies one file, and that file can only be declared once; part of the file is "class-like" and part is "type-like". IMO the best way to resolve this would actually be to change the agent's configuration parser to allow multiple files to configure the same plugin, perhaps with each config file supporting an integration parameter which specifies which integration the configs are for. This model works very well with collectd and the puppet/collectd module (disclaimer: I have made many contributions to that module).

However, that's a fairly major change which depends on major changes in the datadog agent itself, which is a rather large dependency for this :)

new hotness

I'd like to propose a different model.

There would be 2 types that define the underlying file: datadog_agent::integration::init_config and datadog_agent::integration::instance.

These two would be used to define the init_config section of an integration config file and the instances section, respectively. They would expose a pretty generic API which would pretty much just take some 'standard' parts (like tags on the instance) and then a hash of other parameters to simply YAML.dump into the config file.
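To sketch the signatures (the parameter lists here are just a first pass, not final):

define datadog_agent::integration::init_config (
    $params = {},
) {
    # would render the init_config section of the integration's config
    # file, where $name is the integration (e.g. 'mongo')
}

define datadog_agent::integration::instance (
    $integration,
    $params = {},
    $tags   = [],
) {
    # would append one entry (with $tags merged in) to the instances
    # section of the $integration config file
}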

With these we could build types and classes to define specific integrations.

As an example, for the mongo integration it would look something like this:

class datadog_agent::integrations::mongo::init_config (
    $params = {},
) {
    # validate params
    datadog_agent::integration::init_config { 'mongo':
        params => $params,
    }
}

define datadog_agent::integrations::mongo (
    $params = {},
    $tags   = [],
) {
    include datadog_agent::integrations::mongo::init_config
    # validate params
    datadog_agent::integration::instance { "mongo-${name}":
        integration => 'mongo',
        params      => $params,
        tags        => $tags,
    }
}

Then using the integration would be as simple as:

datadog_agent::integrations::mongo { 'localhost':
    params => 'here',
    tags   => ['tags', 'go', 'here'],
}

And then you get instances for free. If you need to specify additional parameters for the init_config section, you can declare the init_config class yourself, or set them via hiera, as shown below.
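For example, declaring it yourself would look like this (the init_config option names are made up for illustration):

# note: the resource-style declaration has to be evaluated before the
# define's include, or puppet will complain about a duplicate declaration
class { 'datadog_agent::integrations::mongo::init_config':
    params => { 'some_init_option' => 'value' },
}

datadog_agent::integrations::mongo { 'localhost':
    params => { 'server' => 'mongodb://localhost:27017' },
    tags   => ['db:primary'],
}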

As far as having 'generic' integrations that people can use (maybe their own integrations they've written, maybe an integration provided by the datadog agent itself which doesn't have puppet support yet), they can use those 2 building blocks in their own classes/defines.
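For instance, wiring up a custom check directly would be just (check name and parameters made up for illustration):

datadog_agent::integration::init_config { 'my_custom_check':
    params => { 'min_collection_interval' => 30 },
}

datadog_agent::integration::instance { 'my_custom_check-primary':
    integration => 'my_custom_check',
    params      => { 'url' => 'http://localhost:8080/status' },
    tags        => ['service:frontend'],
}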

Some of the integrations (disk, agent_metrics) don't make sense to have multiple "instances" of, even though they use the instances section. These can just be classes:

class datadog_agent::integrations::disk (
    $params = {},
) {
    datadog_agent::integration::init_config { 'disk':
        params => 'here',
    }
    datadog_agent::integration::instance { 'disk':
        integration => 'disk',
        params      => 'here',
        tags        => ['whatever'],
    }
}

The underlying implementation of the two core types could use puppetlabs/concat or richardc/datacat to actually build the files. The nice thing about datacat is that, since we're generating a yaml file, it maps nicely: build up a hash, YAML.dump it in the template, and call it good. Still, I've always thought datacat, while great, was the wrong approach to the problem. Concat is nice because it's a puppetlabs-supported module, but building a yaml data structure out of file fragments has always been a bad smell to me too. However, Because Puppet™, these are basically the best solutions we have to this problem.
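To sketch the datacat flavor (paths and resource titles illustrative): each init_config/instance declaration contributes a fragment, datacat merges the data hashes (appending arrays), and the template just dumps the merged hash as yaml:

datacat { '/etc/dd-agent/conf.d/mongo.yaml':
    template_body => '<%= YAML.dump(@data) %>',
}

# contributed by the init_config type
datacat_fragment { 'mongo init_config':
    target => '/etc/dd-agent/conf.d/mongo.yaml',
    data   => { 'init_config' => {} },
}

# contributed by each instance declaration; the instances arrays append
datacat_fragment { 'mongo instance localhost':
    target => '/etc/dd-agent/conf.d/mongo.yaml',
    data   => {
        'instances' => [
            { 'server' => 'mongodb://localhost:27017', 'tags' => ['db:primary'] },
        ],
    },
}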

As far as supporting every-feature-under-the-sun of the integrations, I believe if we make the underlying types (the init_config, and instance types) generic enough, then arbitrary integrations are possible. And for the 'official' integrations, assuming they are defined well enough in the python code, we could probably generate the puppet module based on those (instance/init_config parameters, etc). This is something which may require a fair amount of work on the python code (adding metadata to each integration type), but could make supporting the puppet module a much easier prospect for you folks in general.

Anywho, I'd like to get cracking on these changes as soon as possible, and I'd love feedback on any concerns y'all have or changes you'd like to request. I can also put together a proof of concept to show what I'm talking about before I start changing all of the current integrations, if you'd like.

Thanks, and I look forward to your feedback!

kitchen commented 8 years ago

cc @gphat

StephenWeber commented 8 years ago

I agree with the general design, agree that this is a good solution while waiting for a more comprehensive one out of the dd-agent itself, and really appreciate how it will simplify the actual file-creation portion of writing integrations. It also retains more control over the defaults and structure of the YAML file, which helps avoid issues where an arbitrary arrangement of attributes would lead to regular agent restarts. I think the puppetlabs/concat approach has been mentioned by other contributors before (but not acted upon).

I can test with some of my systems, lmk what integrations might be helpful.

kitchen commented 8 years ago

Hey, thanks @StephenWeber for the feedback. I have started work on the changes I proposed above over here https://github.com/kitchen/puppet-datadog-agent/tree/kitchen/datacat-generation

I went with the richardc/datacat module because something about trying to generate a valid yaml file out of fragments of text just makes me cringe ;-)

I'm going to work on porting a couple of the integrations over (one that previously had "instances", one that didn't) as a proof of concept, and I'd like to get some beaker tests written, though I haven't written beaker tests before, so that's going to be a learning experience for me :)

kitchen commented 8 years ago

just as a followup, I'm moving the development of this off of my personal github and over to stripe's github: https://github.com/stripe/puppet-datadog-agent/tree/kitchen/datacat-generation

and I have a couple of PRs open against that branch to demonstrate usage.

irabinovitch commented 8 years ago

@kitchen Apologies for the delay in response. Per our email thread we're going to set up a call and then document feedback here in the ticket as well.

irabinovitch commented 8 years ago

@kitchen Just wanted to follow up on our previous discussions, we love the general direction you have in mind and would be happy to work with you on it. Let us know if/how we can help!

kitchen commented 8 years ago

ohai! Sorry for the delay and thanks for the reply! I've got some time set aside next week to polish this up for a first stab. I was hoping to have some integration testing and such, but I haven't managed to get beaker figured out yet, so in the interest of not having this blocked by that, I'll pass it along for feedback!

irabinovitch commented 8 years ago

Sounds good. Thanks!

irabinovitch commented 8 years ago

@kitchen Friendly ping. Any chance you can send in a PR with your WIP commit?

rabidscorpio commented 8 years ago

@irabinovitch Can we get a more official solution to this? This has been an ongoing issue for more than a year now. We've had to manually edit the forge module, which makes it very painful to upgrade.

The datadog agent has been able to support multiple instances in many different integrations for a long time.

nambrosch commented 7 years ago

I was also hoping to use this puppet module across our hosts, but can't without multiple-instance support for integrations like tomcat, since many instances could be running on a single server. I like the design laid out by @kitchen; it conforms to the class structure I'm used to working with.

irabinovitch commented 7 years ago

We agree that the proposal is the right direction, and hope to incorporate something like this in an upcoming major release. Unfortunately I'm not able to share a timeline at the moment, but PRs are welcome.

cabrinha commented 5 years ago

Any updates available? What's proposed by OP (kitchen) is a pretty good solution.

cdloh commented 3 years ago

Has there been a decision not to do this? It'd simplify things a lot for use cases like database monitoring, where multiple classes create databases and some of them want to add monitoring too.

albertvaka commented 3 years ago

@cdloh This can already be done, using the following syntax (e.g. for the ntp integration):

class { 'datadog_agent':
    api_key      => "<YOUR_DD_API_KEY>",
    integrations => {
        "ntp" => {
            init_config => {},
            instances => [{
                offset_threshold => 30,
            }],
        },
    },
}

Note that (following the ntp example) the dedicated ntp class still exists, but as you mentioned, some of the dedicated integration classes won't let you create multiple instances and/or specify certain parameters, while with this generic syntax you can pass whatever arguments you want. Also note that instances is an array, which should cover your use case.
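For example, two postgres instances through the same generic parameter would look like this (the instance keys follow the postgres check; adjust to your setup):

class { 'datadog_agent':
    api_key      => "<YOUR_DD_API_KEY>",
    integrations => {
        "postgres" => {
            init_config => {},
            instances => [
                { host => 'localhost', port => 5432, dbname => 'app', username => 'datadog' },
                { host => 'localhost', port => 5433, dbname => 'reporting', username => 'datadog' },
            ],
        },
    },
}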

I hope this helps :)

cdloh commented 3 years ago

@albertvaka perhaps I'm not being super clear.

Within our Puppet setup we've got various modules that control and configure the system.

Some of these modules create postgres databases at run time. They are created as resource types in Puppet, so we don't know up front how many there will be or what they'll be called.

Each module can't add an instance to Datadog, because the postgres integration can only be loaded once, and the same goes for the class.

I'm trying to figure out a way for each module to add monitoring for its own database if required. If it were a template that could be concatenated, it'd be easy enough, hence my question here. If there's another way around this that I'm missing, I'm all ears.

albertvaka commented 3 years ago

Ok, I think I got the problem right this time :) In the place in the code where you define the postgres check, you still don't know how many postgres databases you will end up configuring. So you want the code that configures each database to also add an instance to the postgres check from that same place.

I've tried to implement a solution for your use case in this PR: #677

Can you give it a look and tell me if it covers your use case?

cdloh commented 3 years ago

Thanks for the super fast reply! I'll leave some comments on there.