dpeters / puppet-opsview

Puppet types/providers to support Opsview resources

opsview_monitored: hosttemplates change on every puppet run, causing Opsview to always reload #1

Closed: antaflos closed this issue 12 years ago

antaflos commented 12 years ago

I have posted this issue over at cparedes's repo and he directed me here, so here it is again :)

Using Puppet 2.7.10 from apt.puppetlabs.com on Ubuntu 10.04.

The following message appears during every puppet run on any node using the opsview_monitored type:

notice: /Stage[main]/Opsview::Puppet::Config/Opsview_monitored[app01]/hosttemplates: hosttemplates changed ['Application - Opsview Client', 'Network - Base', 'OS - Unix Base'] to 'Application - Opsview Client Network - Base OS - Unix Base'

The relevant part of the manifest:

class opsview::puppet::config {  

  ...

  opsview_monitored { $::hostname:
    ensure         => 'present',
    ip             => $::fqdn,
    require        => [ Class['opsview::puppet::install'], Opsview_hostgroup[$opsview::puppet::hostgroup] ],
    hostgroup      => $opsview::puppet::hostgroup,
    hosttemplates  => ['Application - Opsview Client', 'Network - Base', 'OS - Unix Base',],
    reload_opsview => true,
  }

  ...
}

This causes Opsview to reload on every puppet run of every node, which is obviously not practical. Not only is every puppet-managed host marked as "active", conflicts also arise when two nodes have a simultaneous puppet run: Opsview then responds with "409 Conflict" because a reload is already in progress, which leads to an error on the node that lost the race for the Opsview reload.

I am not sure why this is happening, but it apparently has something to do with the way arrays are handled between Puppet and JSON. I'd really like to get to the bottom of this, but I don't know enough about Ruby or Puppet internals to make much headway. If I can provide any more information, please tell me.
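
To illustrate the symptom (just a sketch, not the module's actual code): the notice above shows an array on one side of the comparison and a single space-joined string on the other, and those two values can never compare equal, so Puppet sees a "change" on every run:

# Illustration only: an array versus the same values joined into one string
should = ['Application - Opsview Client', 'Network - Base', 'OS - Unix Base']
is     = should.join(' ')  # "Application - Opsview Client Network - Base OS - Unix Base"

puts should == is          # => false, so the property is never considered in sync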

Also, thank you (both dpeters and cparedes) for your work on this module. Though it lacks some documentation, we had almost no problem getting the basic integration up and running.

dpeters commented 12 years ago

The behavior you're experiencing is due to a bug in Puppet 2.7.10: http://projects.puppetlabs.com/issues/12197

I ran into this same thing when upgrading to 2.7.10. If you roll back to 2.7.9, or wait for 2.7.11, then this behavior should go away.

Glad to hear you were able to use this, and thanks for the feedback. I'll see about adding some more documentation.

antaflos commented 12 years ago

Devon, thanks for the update and info! I don't know why it didn't occur to me that this could be a bug in Puppet itself. Kinda obvious, now :)

yasn77 commented 11 years ago

I am still having this problem. I have tried Puppet 2.7.9 and Puppet 2.7.19.

On a large estate, it makes this module unusable :(

I am currently running Puppet 2.7.19 on RHEL 5.8 using Ruby 1.8.7.

Any ideas?

dpeters commented 11 years ago

Strange. It worked fine on 2.7.9 for me, and I recall it working on 2.7.11 as well. Unfortunately, I don't have access to an opsview system anymore, so testing this might be somewhat difficult.

It might be the same issue, or it might be due to something in opsview that isn't getting synced properly by the module.

To check, could you send me a puppet resource declaration for something that has this problem, and also grab the JSON for it from opsview? You can get the JSON by running the following command as the nagios user:

opsview_rest --username=<user> --password=<pass> --data-format=json --pretty GET "config/<resource>/?s.name=<resourcename>"

yasn77 commented 11 years ago

The resource declaration:

$opsview_servicechecks = ["Puppet status", "SSH", "Disk", "Unix Memory", "Unix Load Average"]
$opsview_hostgroup = "ART Testing - Linux"

opsview_monitored { $hostname:
  reload_opsview => True,
  ensure         => present,
  ip             => $ip,
  hostgroup      => $opsview_hostgroup,
  servicechecks  => $opsview_servicechecks,
  provider       => opsview,
}

The output of opsview_rest:

{
   "list" : [
      {
         "alias" : "Puppet Managed Host",
         "check_attempts" : "2",
         "check_command" : {
            "name" : "ping",
            "ref" : "/rest/config/hostcheckcommand/15"
         },
         "check_interval" : "0",
         "check_period" : {
            "name" : "24x7",
            "ref" : "/rest/config/timeperiod/1"
         },
         "enable_snmp" : "0",
         "flap_detection_enabled" : "1",
         "hostattributes" : [],
         "hostgroup" : {
            "name" : "ART Testing - Linux",
            "ref" : "/rest/config/hostgroup/11"
         },
         "hosttemplates" : [],
         "icon" : {
            "name" : "LOGO - Opsview",
            "path" : "/images/logos/opsview_small.png"
         },
         "id" : "48",
         "ip" : "arttst-yas002",
         "keywords" : [],
         "monitored_by" : {
            "name" : "Master Monitoring Server",
            "ref" : "/rest/config/monitoringserver/1"
         },
         "name" : "arttst-yas002",
         "nmis_node_type" : "router",
         "notification_interval" : "60",
         "notification_options" : "u,d,r",
         "notification_period" : {
            "name" : "24x7",
            "ref" : "/rest/config/timeperiod/1"
         },
         "other_addresses" : "",
         "parents" : [],
         "rancid_connection_type" : "ssh",
         "rancid_password" : null,
         "rancid_username" : null,
         "rancid_vendor" : null,
         "ref" : "/rest/config/host/48",
         "retry_check_interval" : "1",
         "servicechecks" : [
            {
               "event_handler" : null,
               "exception" : null,
               "name" : "Disk",
               "ref" : "/rest/config/servicecheck/47",
               "remove_servicecheck" : "0",
               "timed_exception" : null
            },
            {
               "event_handler" : null,
               "exception" : null,
               "name" : "Puppet status",
               "ref" : "/rest/config/servicecheck/118",
               "remove_servicecheck" : "0",
               "timed_exception" : null
            },
            {
               "event_handler" : null,
               "exception" : null,
               "name" : "SSH",
               "ref" : "/rest/config/servicecheck/22",
               "remove_servicecheck" : "0",
               "timed_exception" : null
            },
            {
               "event_handler" : null,
               "exception" : null,
               "name" : "Unix Load Average",
               "ref" : "/rest/config/servicecheck/45",
               "remove_servicecheck" : "0",
               "timed_exception" : null
            },
            {
               "event_handler" : null,
               "exception" : null,
               "name" : "Unix Memory",
               "ref" : "/rest/config/servicecheck/44",
               "remove_servicecheck" : "0",
               "timed_exception" : null
            }
         ],
         "snmp_community" : "public",
         "snmp_extended_throughput_data" : "0",
         "snmp_max_msg_size" : "0",
         "snmp_port" : "161",
         "snmp_version" : "2c",
         "snmpv3_authpassword" : "",
         "snmpv3_authprotocol" : null,
         "snmpv3_privpassword" : "",
         "snmpv3_privprotocol" : null,
         "snmpv3_username" : "",
         "tidy_ifdescr_level" : "0",
         "uncommitted" : "0",
         "use_mrtg" : "0",
         "use_nmis" : "0",
         "use_rancid" : "0"
      }
   ],
   "summary" : {
      "allrows" : "34",
      "page" : "1",
      "rows" : "1",
      "totalpages" : "1",
      "totalrows" : "1"
   }
}

Thanks

dpeters commented 11 years ago

Ahh... I see what's happening here. If you list the servicechecks in alphabetical order in the manifest, then it should work.

The providers retrieve arrays from Opsview in alphabetical order, and if the manifest doesn't list them in the same order, Puppet interprets them as being different. I'll file a bug for this, since I'm pretty sure it can be fixed.
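
One way this could be handled in the type itself (a sketch only, not necessarily how the fix will land, and the property names simply follow the examples in this thread): override insync? on the array-valued properties so both sides are sorted before comparison, making manifest order irrelevant.

Puppet::Type.newtype(:opsview_monitored) do
  newparam(:name) do
    isnamevar
  end

  # ... other parameters and properties elided ...

  newproperty(:servicechecks, :array_matching => :all) do
    desc "Service checks applied to this host"

    # Compare the current (is) and desired (@should) values order-insensitively,
    # so ['SSH', 'Disk'] and ['Disk', 'SSH'] count as in sync.
    def insync?(is)
      if is.is_a?(Array) && @should.is_a?(Array)
        is.sort == @should.sort
      else
        is == @should
      end
    end
  end
end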

yasn77 commented 11 years ago

Yep, sending the checks in sorted order fixed the problem.

As a reference for anyone else who stumbles on this issue, this is how I set my checks now:


class opsview {

  package { "opsview-agent":
    ensure => latest,
  }

  # Sort the service check names so they match the alphabetical order the
  # provider reads back from Opsview, then split the result into an array again.
  $res = inline_template("<%= opsview_servicechecks.sort.join(',') %>")
  $opsview_servicechecks = split($res, '[,]')

  opsview_monitored { $hostname:
    reload_opsview => true,
    ensure         => present,
    ip             => $ip,
    hostgroup      => $opsview_hostgroup,
    servicechecks  => $opsview_servicechecks,
    provider       => opsview,
  }
}
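
(As an aside, and untested here: if the puppetlabs-stdlib module is available, its sort() function can sort the array directly and avoid the inline_template/split round-trip.)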

I would be interested in knowing the bug report ID so that I can follow it.

Thanks for all your help :)