Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

get_objects function does not return hosts added via the API when used inside an attribute's function. #8209

Open koppel opened 4 years ago

koppel commented 4 years ago

Summary

I have found that while get_objects (and get_object) work perfectly for many use cases, they fail to return any hosts that were added via the Icinga2 API when used inside an attribute's custom function. See the Prime example in the Not Working section below.

Details:

Our Environment

icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf graphite influxdb livestatus opentsdb perfdata statusdata
Enabled features: api checker command ido-mysql mainlog notification syslog

Expected Behavior

get_object and get_objects should always return all hosts, either defined in the DSL or added via the API, even inside an attribute's custom function.

Current Behavior

Working

get_object for hosts added dynamically via the API, from the Icinga console.

<1> => server = get_object(Host, "test_host_01")
null
<2> => log("Get specific host:  " + server.name + ":  " + server.groups.join(",") + "\n")

and in the log:

[2020-09-01 18:50:07 +0000] information/config: Get specific host:  test_host_01:  all_linux_hosts,aws_test01,AWS

get_objects for hosts added dynamically via the API, from the Icinga console.

<1> => host_group = "aws_test01"
<2> => filter_function = function(node) use(host_group) { host_group in node.groups }
<3> => nodes = get_objects(Host).filter(filter_function)
<4> => for (node in nodes) { log(node.name) }

and in the log:

[2020-09-01 20:30:26 +0000] information/config: test_host_01
[2020-09-01 20:30:26 +0000] information/config: test_host_02
[2020-09-01 20:30:26 +0000] information/config: test_host_03

get_objects for hosts added dynamically via the API, from a normal function.

The function:

$ cat functions_test.conf
globals.get_hostgroup_members = function (host_group) {
    cluster_nodes = []
    all_nodes = get_objects(Host)
    for (node in all_nodes) {
        if (host_group in node.groups) {
            log(
                LogInformation,
                "get_hostgroup_members",
                host_group + ": found node:  " + node.name
            )
            cluster_nodes.add(node)
        }
    }
    return cluster_nodes
}

Call from the Icinga Console:

<1> => nodes = get_hostgroup_members("aws_test01_cluster")
null
<2> => for (node in nodes) { log(node.name) }

and in the log:

[2020-09-01 20:30:26 +0000] information/config: test_host_01
[2020-09-01 20:30:26 +0000] information/config: test_host_02
[2020-09-01 20:30:26 +0000] information/config: test_host_03

Not Working

Using the documentation provided by @dnsmichi, from within an attribute's custom function.

See: Get host objects in hostgroup - failing for hosts added via the API

For hosts added dynamically via the API, get_objects does not return any hosts.

Function (defined in lambda for brevity):

$ cat cluster_check.conf
object Host "Cluster:  aws_test01_cluster" {
    enable_notifications = false
    import "aws_test01_cluster"
    check_command = "dummy"
    vars += {
        dummy_state = get_dummy_test("state", "aws_test01_cluster", "App: Healthcheck URL")
        dummy_text  = {{
            host_group = "aws_test01_cluster"
            cluster_nodes = get_hostgroup_members(host_group)
            output = "Nodes in " + host_group + ":\n"
            for (node in cluster_nodes) {
                output += node.name + "\n"
            }
            return output
        }}
    }
}

In Icingaweb2:

Plugin Output
Nodes in aws_test01_cluster:
...

Prime example: Attempt to add logging, and ensure get_objects returns at least some of the hosts.

The function, moving the logic from the external function into the lambda:

$ cat cluster_test.conf
object Host "Cluster:  aws_test01_cluster - lambda test" {
    enable_notifications = false
    import "aws_test01_cluster"
    check_command = "dummy"
    vars += {
        dummy_state = get_dummy_test("state", "aws_test01_cluster", "App: Healthcheck URL")
        dummy_text  = {{
            cluster_nodes = []
            all_nodes = get_objects(Host)
            host_group = "aws_test01_cluster"

            output = "Nodes in " + host_group + ":\n"
            for (node in all_nodes) {
                if (host_group in node.groups) {
                    log(
                        LogInformation,
                        "get_hostgroup_members",
                        host_group + ": found node:  " + node.name
                    )
                    cluster_nodes.add(node)
                }
            }
            for (node in cluster_nodes) {
                output += node.name + "\n"
            }

            output += "\n\nAll nodes:\n"
            for (node in all_nodes) {
                output += node.name + ":  " + node.groups.join(",") + "\n"
            }

            return output

        }}
    }
}

In Icingaweb2 (truncated for brevity):

Plugin Output
Nodes in aws_test01_cluster:

All nodes:
kafka-01:  kafka, all_linux_hosts, production_hosts
cassandra-01:  cassandra, all_linux_hosts, production_hosts
zookeeper-01:  zookeeper, all_linux_hosts, production_hosts

...truncated

Here, all of our statically-defined hosts are listed, so get_objects is working for them, but not for hosts added via the API.

Extra test: get_object fails to return the same host that it returned in the first test of the Working section above.

So, this worked above from the console, but if you put that same logic in the dummy_text lambda function, it fails:

... snippet - added to the function defined in the previous example.
server = get_object(Host, "test_host_01")
output += "Get specific host:  " + server.name + ":  " + server.groups.join(",") + "\n"
...

and in Icingaweb2:

Plugin Output
Exception occurred while checking 'Cluster: App: Healthcheck URL: aws_test01_cluster - lambda test': Error: Argument is not a callable object.
Location: in /var/lib/icinga2/api/zones/agile-zone/_etc/cluster_checks_aws_test01.conf: 59:70-59:92
/var/lib/icinga2/api/zones/agile-zone/_etc/cluster_checks_aws_test01.conf(57): 
/var/lib/icinga2/api/zones/agile-zone/_etc/cluster_checks_aws_test01.conf(58):             server = get_object(Host, "test_host_01")
/var/lib/icinga2/api/zones/agile-zone/_etc/cluster_checks_aws_test01.conf(59):             output += "Get specific host:  " + server.name + ":  " + server.groups.join(", ") + "
"
                                                                                                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones/agile-zone/_etc/cluster_checks_aws_test01.conf(60): 
/var/lib/icinga2/api/zones/agile-zone/_etc/cluster_checks_aws_test01.conf(61):             output += "

All nodes:
"

    (0) Resolving macros for string '$dummy_text$'
    (1) Executing check for object 'Cluster: App: Healthcheck URL: aws_test01_cluster - lambda test'

Steps to Reproduce

  1. Add a host via the API.
  2. Confirm that you can see the host by using the get_object function via the Icinga Console.
  3. Create a custom function as an attribute as documented here: Use Custom Functions as Attribute
  4. Filter for the new host by hostname, using one of the methods above, also documented here: Get host objects in hostgroup with get_objects()
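
Step 1 can be scripted. The following is a minimal Python sketch, not the setup used in this report: it builds (but does not send) the PUT request that the Icinga 2 REST API expects when creating a host object. The endpoint address, the "generic-host" template, the address value, and the helper name build_create_host_request are assumptions for illustration; the host and group names are taken from the log output above.

```python
import json
import urllib.parse
import urllib.request

# Assumed API endpoint; 5665 is the default Icinga 2 API port.
API_BASE = "https://localhost:5665/v1"


def build_create_host_request(name, groups, address, templates=("generic-host",)):
    """Build (but do not send) a PUT request that creates a host object.

    "generic-host" is a placeholder template; substitute one that exists
    in your own configuration.
    """
    body = {
        "templates": list(templates),
        "attrs": {"address": address, "groups": list(groups)},
    }
    # Object names are part of the URL path, so encode them defensively.
    encoded_name = urllib.parse.quote(name, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/objects/hosts/{encoded_name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # The Icinga 2 API expects an explicit JSON Accept header.
        headers={"Accept": "application/json"},
    )


req = build_create_host_request(
    "test_host_01",
    groups=["all_linux_hosts", "aws_test01", "AWS"],
    address="192.0.2.10",  # documentation-range address, purely illustrative
)
print(req.get_method(), req.full_url)
```

Actually sending the request also requires API user credentials (HTTP basic auth) and trust in the instance's CA certificate, both of which are left out here.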

Update - get_service also does not return data for hosts added via the API when used inside an attribute's function.

In an attempt to work around the above bug, I tried hard-coding the hostnames of the cluster members in the cluster check, but get_service does not return data when used inside an attribute's custom function for hosts added via the API.

Fails - used inside an attribute's custom function for hosts added via the API.

object Host "Cluster: aws_test01_cluster - lambda test" {
    enable_notifications = false
    import "aws_test01_cluster"
    check_command = "dummy"
    vars += {
        dummy_state = get_dummy_test("state", "aws_test01_cluster", "App: Healthcheck URL")
        dummy_text  = {{
            cluster_nodes = [
                "test_host_01",
                "test_host_02",
                "test_host_03"
            ]
            var host_group   = "aws_test01_cluster"
            var service_name = "App: Healthcheck URL"

            output = "State and output:\n"
            for (node in cluster_nodes) {
                var service      = get_service(node, service_name)
                var health_state = service.last_check_result.state
                var health_output = service.last_check_result.output
                output += node + ":  " + health_state + ":  " + health_output + "\n"
            }

            return output

        }}
    }
}

and seen in Icingaweb2's output:

Plugin Output

State and output:
test_host_01:  :  
test_host_02:  :  
test_host_03:  :

Working - The same code run from the Icinga2 Console

<1> => cluster_nodes = [
<2> =>     "test_host_01",
<3> =>     "test_host_02",
<4> =>     "test_host_03"
<5> => ]
null
<6> => var host_group   = "aws_test01_cluster"
null
<7> => var service_name = "App: Healthcheck URL"
null
<8> => for (node in cluster_nodes) {
<9> =>     var service       = get_service(node, service_name)
<10> =>    var health_state  = service.last_check_result.state
<11> =>    var health_output = service.last_check_result.output
<12> =>    log(node + ":  " + health_state + ":  " + health_output + "\n")
<13> => }
null

and in the log:

[2020-09-03 17:59:14 +0000] information/config: test_host_01:  0:  HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.012 second response time
[2020-09-03 17:59:14 +0000] information/config: test_host_02:  0:  HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.011 second response time
[2020-09-03 17:59:14 +0000] information/config: test_host_03:  0:  HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.025 second response time

Working: The identical code from above, but replacing the servers with some that are statically defined in the DSL

...
cluster_nodes = [
    "static_test_host_01",
    "static_test_host_02",
    "static_test_host_03"
]
...

and in Icingaweb2:

Plugin Output

State and output:
static_test_host_01:  0:  HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.027 second response time 
static_test_host_02:  0:  HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.078 second response time 
static_test_host_03:  0:  HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.026 second response time

Conclusion

Functions that work properly for all hosts when called from the Icinga2 Console or from a regular function do not work for hosts that were added dynamically via the API when called inside an attribute's custom function.

✅ All functions work for statically-defined hosts in all circumstances that were tested.
✅ All functions work for dynamically-added hosts (via the API) when used from the Icinga2 Console.
✅ All functions work for dynamically-added hosts (via the API) when used in a standard function.
❌ None of the tested functions work for dynamically-added hosts (via the API) when used in an attribute's custom function.

The impact of this is that cluster checks do not work for hosts that were added dynamically via the API.

nilkonto commented 4 years ago

👍

Al2Klimov commented 4 years ago

Hello @koppel and thank you for reporting!

cluster_nodes = []
all_nodes = get_objects(Host)

Both of these are global variables. Please try again with "var ".

Best, AK

koppel commented 4 years ago

I have tested both with and without var, and the result was the same. I just tried again after renaming the variables and adding var back in. Again, the result was the same:

object Host "Cluster:  aws_test01_cluster - lambda test" {
    enable_notifications = false
    import "aws_test01_cluster"
    check_command = "dummy"
    vars += {
        dummy_state = get_dummy_test("state", "aws_test01_cluster", "App: Healthcheck URL")
        dummy_text  = {{
            var selected_nodes = []
            var every_node = get_objects(Host)
            var host_group = "aws_test01_cluster"

            var output = "Nodes in " + host_group + ":\n"
            for (node in every_node) {
                if (host_group in node.groups) {
                    log(
                        LogInformation,
                        "get_hostgroup_members",
                        host_group + ": found node:  " + node.name
                    )
                    selected_nodes.add(node)
                }
            }
            for (node in selected_nodes) {
                output += node.name + "\n"
            }

            output += "\n\nAll nodes:\n"
            for (node in every_node) {
                output += node.name + ":  " + node.groups.join(",") + "\n"
            }

            return output

        }}
    }
}

and in Icingaweb2:

Plugin Output
Nodes in aws_test01_cluster:

All nodes:
kafka-01:  kafka, all_linux_hosts, production_hosts
cassandra-01:  cassandra, all_linux_hosts, production_hosts
zookeeper-01:  zookeeper, all_linux_hosts, production_hosts

...truncated

Al2Klimov commented 1 year ago

Which zones are your hosts in?
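
One way to collect that information is to ask the API for the zone of every host in the affected host group. The Python sketch below is only an illustration under assumptions: the endpoint address and the helper name build_host_zone_query are hypothetical, and the request is built but not sent. It uses the API's POST-with-method-override form so that the filter can travel in a JSON body, and it also requests the package attribute, which as far as I know distinguishes objects created at runtime via the API from those loaded from configuration files.

```python
import json
import urllib.request

# Assumed API endpoint; 5665 is the default Icinga 2 API port.
API_BASE = "https://localhost:5665/v1"


def build_host_zone_query(host_group):
    """Build (but do not send) a query for name, zone and package of
    every host that is a member of host_group."""
    body = {
        "attrs": ["name", "zone", "package"],
        # filter_vars keeps the group name out of the filter expression.
        "filter": "group in host.groups",
        "filter_vars": {"group": host_group},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/objects/hosts",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            # Lets a query with a JSON body be sent as POST.
            "X-HTTP-Method-Override": "GET",
        },
    )


query = build_host_zone_query("aws_test01_cluster")
print(query.get_method(), query.full_url)
print(query.data.decode("utf-8"))
```

Comparing the zone values of the API-created hosts with the zone of the "Cluster: ..." host object (the error output above shows its file under zones/agile-zone) would answer the question directly.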