F5Networks / f5-telemetry-streaming

F5 BIG-IP Telemetry Streaming
Apache License 2.0

Add Prometheus pull consumer support for custom endpoint /mgmt/tm/sys/ha-status #243

Closed: megamattzilla closed this issue 1 year ago

megamattzilla commented 1 year ago


TMOS provides a useful API endpoint, /mgmt/tm/sys/ha-status, that reports whether any critical service is experiencing an issue. For each monitored service, the endpoint's failure field reads "no" when the service is healthy and "yes" when it is experiencing an issue.

Currently, when /mgmt/tm/sys/ha-status is added as a custom endpoint, the Prometheus pull consumer does not include its metrics in the output.

Describe the solution you'd like

When /mgmt/tm/sys/ha-status is added to a custom endpoints definition, have TS scrape the endpoint and report a failure value of 0 (healthy) or 1 (failed) for each monitored critical service.

Additional context

Example output of ha-status from tmsh and from the REST API:

[root@15-1-demo:/S2-green-P::Active:Standalone] config # tmsh show sys ha-status

----------------------------------------------------------------------------
Sys::HA Status
Slot  Feature               Key           Action                        Fail
----------------------------------------------------------------------------
2     asm-config-fail       asmcsd        restart-all                   no
2     cluster-mbr-disabled  clusterd      go-offline-downlinks          no
2     cluster-time-sync     clusterd      reboot                        no
2     compression-failsafe  tmm0          failover                      no
2     compression-failsafe  tmm1          failover                      no
2     crypto-failsafe       cn-crypto-0   go-offline-downlinks          no
2     crypto-failsafe       cn-crypto-1   go-offline-downlinks          no
2     daemon-heartbeat      %snmpd        restart                       no
2     daemon-heartbeat      autodosd      restart                       no
2     daemon-heartbeat      bd            restart                       no
2     daemon-heartbeat      bdosd         restart                       no
2     daemon-heartbeat      bigd          restart                       no
2     daemon-heartbeat      cbrd          restart                       no
2     daemon-heartbeat      clusterd      go-offline-downlinks-restart  no
2     daemon-heartbeat      datasyncd     restart                       no
2     daemon-heartbeat      dosl7d        restart                       no
2     daemon-heartbeat      flowspecd     restart                       no
2     daemon-heartbeat      guestagentd   restart                       no
2     daemon-heartbeat      keymgmtd      restart                       no
2     daemon-heartbeat      mcpd          restart                       no
2     daemon-heartbeat      mysqlhad      restart-all                   no
2     daemon-heartbeat      scriptd       restart                       no
2     daemon-heartbeat      sod           restart-all                   no
2     daemon-heartbeat      tmm           go-offline-downlinks-restart  no
2     daemon-heartbeat      tmm1          go-offline-downlinks-restart  no
2     daemon-heartbeat      tmrouted      restart                       no
2     daemon-heartbeat      vxland        restart                       no
2     daemon-heartbeat      wccpd         restart                       no
2     dataplane-inoperable  tmm           reboot                        no
2     dataplane-inoperable  tmm1          reboot                        no
2     forced-offline        sod           none                          no
2     hypervisor-offline    chmand        go-offline                    no
2     license-exceeded      mcpd          go-offline-downlinks          no
2     license-invalid       mcpd          go-offline-downlinks          no
2     min-up-cluster-mbr    clusterd      failover                      no
2     mpi-failsafe          tmm           go-offline-downlinks          no
2     mysqld-failure        mysqlhad      restart-all                   no
2     nic-failsafe          tmm           reboot                        no
2     nic-failsafe          tmm1          reboot                        no
2     overdog-ctrl          watchdog      none                          no
2     proc-run              bd            go-offline-downlinks          no
2     proc-run              bigd          go-offline-downlinks          no
2     proc-run              clusterd      go-offline-downlinks          no
2     proc-run              datasyncd     go-offline-downlinks          no
2     proc-run              mcpd          go-offline-downlinks          no
2     proc-run              tmm           go-offline-downlinks          no
2     proc-run              tmrouted      failover                      no
2     provisioning-failed   provisioning  go-offline-downlinks          no
2     ready-for-world       tmm           none                          no
2     ready-for-world       tmm1          none                          no
2     reboot-request        sod           reboot                        no
2     software-update       lind          reboot                        no
2     tmm-detect-fail       tmm           failover                      no
2     wait-primary-sod      sod           none                          no
{
    "kind": "tm:sys:ha-status:ha-statusstats",
    "selfLink": "https://localhost/mgmt/tm/sys/ha-status?ver=15.1.5.1",
    "entries": {
        "https://localhost/mgmt/tm/sys/ha-status/2:asm-config-fail:asmcsd": {
            "nestedStats": {
                "entries": {
                    "action": {
                        "description": "restart-all"
                    },
                    "failure": {
                        "description": "no"
                    },
                    "haFeature": {
                        "description": "asm-config-fail"
                    },
                    "key": {
                        "description": "asmcsd"
                    },
                    "slot": {
                        "value": 2
                    }
                }
            }
        }
    }
}
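The REST response nests each row under entries, then nestedStats, then entries again, with string fields stored in description and the slot number stored in value. A minimal Python sketch that flattens that shape into one record per row, assuming every row carries exactly the five fields shown in the sample above (the helper name flatten_ha_status is hypothetical, and other TMOS versions may return different fields):

```python
import json

# The single-entry sample from this issue, kept as returned by the REST API.
SAMPLE = """
{
  "kind": "tm:sys:ha-status:ha-statusstats",
  "selfLink": "https://localhost/mgmt/tm/sys/ha-status?ver=15.1.5.1",
  "entries": {
    "https://localhost/mgmt/tm/sys/ha-status/2:asm-config-fail:asmcsd": {
      "nestedStats": {
        "entries": {
          "action": {"description": "restart-all"},
          "failure": {"description": "no"},
          "haFeature": {"description": "asm-config-fail"},
          "key": {"description": "asmcsd"},
          "slot": {"value": 2}
        }
      }
    }
  }
}
"""


def flatten_ha_status(payload: dict) -> list:
    """Flatten an ha-status response into one dict per row.

    Assumes each row has action, failure, haFeature, key and slot;
    a row missing any of them raises KeyError.
    """
    records = []
    for entry in payload.get("entries", {}).values():
        fields = entry["nestedStats"]["entries"]
        records.append({
            "slot": fields["slot"]["value"],
            "feature": fields["haFeature"]["description"],
            "key": fields["key"]["description"],
            "action": fields["action"]["description"],
            "failure": fields["failure"]["description"],
        })
    return records


if __name__ == "__main__":
    for record in flatten_ha_status(json.loads(SAMPLE)):
        print(record)
```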
G-gonzalezjimenez commented 1 year ago

Hi, F5 BIG-IP Telemetry Streaming is entering a phase of ongoing maintenance and support. A product in maintenance mode continues to receive support, including regular critical fixes and security updates that keep it stable, which helps maintain its reliability over the long term. Enhancement requests for this product will be evaluated individually, based on their overall impact and alignment with our business objectives; only those with a strong case for improvement will be considered for implementation. There is no plan to deprecate this product. If you have a business case for this request, please let me know and I will tell you how to contact us. Thank you.