F5Networks / f5-telemetry-streaming

F5 BIG-IP Telemetry Streaming
Apache License 2.0

Add virtualServers and clientSslProfiles labels to certain Telemetry Streaming metrics #257

Open barakbd opened 1 year ago

barakbd commented 1 year ago

Is your feature request related to a problem? Please describe.

  1. We want to show the relationship between virtual servers and client SSL profiles.
  2. We want to show the HTTP request rate (2xx, 3xx, 4xx) by client SSL profile.

Describe the solution you'd like

  1. Metrics:

Add label: virtualServers (currently has clientSslProfiles)

  2. Metrics:
    • f5_numberReqs
    • f5_2xxResp
    • f5_3xxResp
    • f5_4xxResp
    • f5_5xxResp

Add labels: clientSslProfiles and virtualServers

Describe alternatives you've considered

Email from Matt Stovall: By using the Telemetry Streaming custom endpoint /mgmt/tm/ltm/virtual/profiles/stats, you can get the equivalent metrics per virtual server. These metrics do not use a Prometheus label; instead, the name of the virtual server is embedded in the metric name.

Example output from the TS Pull Consumer endpoint:

f5_vsProfileStats__Common_{{virtual server name}}_stats__Common_{{virtual server name}}_profiles_stats__Common_{{virtual server name}}_profiles_Common_{{clientSSL profile name}}_stats_common_activeHandshakeRejected
f5_vsProfileStats__Common_{{virtual server name}}_stats__Common_{{virtual server name}}_profiles_stats__Common_{{virtual server name}}_profiles_Common_{{clientSSL profile name}}_stats_common_curNativeConns
f5_vsProfileStats__Common_{{virtual server name}}_stats__Common_{{virtual server name}}_profiles_stats__Common_{{virtual server name}}_profiles_Common_{{clientSSL profile name}}_stats_common_currentActiveHandshakes

Example for virtual server name https_multi_cert and clientSSL profile default_sni:

f5_vsProfileStats__Common_https_multi_cert_stats__Common_https_multi_cert_profiles_stats__Common_https_multi_cert_profiles_Common_default_sni_stats_common_activeHandshakeRejected 0

To make these metrics appear in the TS output, you only need to define another custom endpoint in your Telemetry Streaming declaration. You already have a few custom endpoints defined, so the new item looks like this:

        "Custom_Endpoints": {
            "class": "Telemetry_Endpoints",
            "items": {
                "vsProfileStats": {
                    "name": "vsProfileStats",
                    "path": "/mgmt/tm/ltm/virtual/profiles/stats"
                }
            }
        }

If you want the VS names to appear as a label instead of in the metric name, that would require a new Telemetry Streaming GitHub request. GitHub requests can be submitted here: https://github.com/F5Networks/f5-telemetry-streaming/issues

Nachtfalkeaw commented 5 months ago

Hello,

I think this is not a feature request; it is a bug. If you use the default declaration, the metrics are formatted correctly. If you collect the same metrics using a custom endpoint, the formatting is garbage.

Here is the garbage metric format from a custom endpoint: [image: f5_customEndpoint_counters_metric]

This is the same value (bitsIn/bitsOut) from the default declaration. If you configure nothing else and just enable the OpenTelemetry API for Prometheus pull, it looks like this: [image: f5_default_counters_metric]

Both use this source: [image: f5_Path_counters_metric]

Versions: v1.33.0 and v1.34.0 of the OpenTelemetryPlugin.

megamattzilla commented 5 months ago

Hi @barakbd and @Nachtfalkeaw,

I'm the F5 solutions engineer working with Barak on their telemetry streaming initiatives.

It appears there are two different requests in this github issue.

1.) First Request

When you define a custom endpoint that provides statistics per virtual server, such as /mgmt/tm/ltm/virtual/profiles/stats, add the virtual server name to those metrics as a label. This seems reasonable to me; all the data for this is already available in the control plane.

For example, that custom endpoint can provide insight into bits per virtual server instead of only global bits in/out (which is very useful for identifying which virtual servers have higher throughput). It produces a metric like this:

# HELP f5_vsProfileStats__Common_asm_demo_http_stats_clientside_bitsOut vsProfileStats_/Common/asm-demo-http/stats_clientside.bitsOut
# TYPE f5_vsProfileStats__Common_asm_demo_http_stats_clientside_bitsOut gauge
f5_vsProfileStats__Common_asm_demo_http_stats_clientside_bitsOut 1028544
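The metric name above looks like the HELP text with an f5_ prefix added and every character that is not allowed in a Prometheus metric name replaced by an underscore. The Python sketch below reproduces that transformation; the rule is an assumption inferred only from the output posted in this thread (not from the TS source), and sanitize_metric_name is a hypothetical helper. It also shows why the virtual server name cannot be reliably recovered from the metric name: "/", "-" and "." all collapse to "_".

```python
import re

# Assumption: TS appears to build the metric name as "f5_" + HELP text, with
# every character outside [a-zA-Z0-9_] replaced by "_". Inferred from the
# examples in this thread only.
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_metric_name(help_text: str, prefix: str = "f5_") -> str:
    """Flatten a TS HELP string into a Prometheus-safe metric name (hypothetical)."""
    return prefix + INVALID_CHARS.sub("_", help_text)


if __name__ == "__main__":
    dashed = "vsProfileStats_/Common/asm-demo-http/stats_clientside.bitsOut"
    underscored = "vsProfileStats_/Common/asm_demo_http/stats_clientside.bitsOut"
    print(sanitize_metric_name(dashed))
    # Two different virtual server names map to the same metric name,
    # so the original name is lost once it is flattened.
    print(sanitize_metric_name(dashed) == sanitize_metric_name(underscored))
```

Under this assumption the same rule also reproduces the detailedCPU metric name reported later in this thread.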

The metric output is formatted in a way that is difficult to parse. The virtual server name /Common/asm-demo-http is there, but it is difficult to extract it and then graph this metric as the bitsOut for virtual server /Common/asm-demo-http. Ideally, the name could be improved and a Prometheus-friendly label added so that the metric looks like this instead:

# HELP f5_vsProfileStats__clientside_bitsOut vsProfileStats_/Common/asm-demo-http/stats_clientside.bitsOut
# TYPE f5_vsProfileStats__clientside_bitsOut gauge
f5_vsProfileStats__clientside_bitsOut{virtualServers="/Common/asm-demo-http"} 1028544

That way the metric could be graphed natively in Prometheus/Grafana as belonging to virtual server /Common/asm-demo-http. You could then graph bits per second by virtual server instead of having only global bitsOut with no way to tell which virtual servers contribute to it.
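Until TS emits such a label itself, one possible stopgap is to rewrite the exposition text before Prometheus ingests it, taking the virtual server name from the HELP string (which still contains the original /Common/asm-demo-http) instead of from the lossy metric name. This Python sketch assumes the HELP layout <endpoint>_/<partition>/<virtual server>/stats_<stat>, inferred from this single example; relabel is a hypothetical function, not part of TS. It only handles unlabeled samples without timestamps, does not escape label values, and assumes HELP and TYPE lines precede their samples. All virtual servers for the same stat are grouped into one metric family so that each family gets a single HELP and TYPE line, as the Prometheus text format requires.

```python
import re
from collections import OrderedDict

HELP_RE = re.compile(r"^# HELP (\S+) (.+)$")
TYPE_RE = re.compile(r"^# TYPE (\S+) (\S+)$")
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*) (\S+)$")
# Assumed HELP layout: <endpoint>_/<partition>/<virtual server>/stats_<stat path>
VS_HELP_RE = re.compile(r"^([A-Za-z0-9]+)_(/[^/]+/[^/]+)/stats_(.+)$")
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def relabel(exposition):
    """Move the virtual server name from the metric name into a virtualServers label."""
    helps, types = {}, {}
    families = OrderedDict()  # new metric name -> {"help", "type", "samples"}
    for line in exposition.splitlines():
        m = HELP_RE.match(line)
        if m:
            helps[m.group(1)] = m.group(2)
            continue
        m = TYPE_RE.match(line)
        if m:
            types[m.group(1)] = m.group(2)
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue  # labeled samples, timestamps, blank lines: out of scope here
        old_name, value = m.groups()
        help_text = helps.get(old_name, "")
        vs = VS_HELP_RE.match(help_text)
        if vs:
            endpoint, vs_name, stat = vs.groups()
            new_name = "f5_{}__{}".format(endpoint, INVALID_CHARS.sub("_", stat))
            labels = '{{virtualServers="{}"}}'.format(vs_name)
        else:
            new_name, labels = old_name, ""  # pass unrecognized metrics through
        # The first HELP text seen is kept for the merged family.
        family = families.setdefault(new_name, {
            "help": help_text,
            "type": types.get(old_name, "untyped"),
            "samples": [],
        })
        family["samples"].append("{}{} {}".format(new_name, labels, value))

    out = []
    for name, family in families.items():
        if family["help"]:
            out.append("# HELP {} {}".format(name, family["help"]))
        out.append("# TYPE {} {}".format(name, family["type"]))
        out.extend(family["samples"])
    return "\n".join(out)
```

Applied to the three-line bitsOut example above, this sketch produces exactly the proposed labeled output. Where such a rewrite would run (for example, in a small proxy between Prometheus and the TS Pull Consumer endpoint) is a deployment choice outside this sketch.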

2.) Second Request

Add labels for clientSslProfiles and virtualServers names to various metrics produced by clientSSL and HTTP profiles. To my knowledge, TMOS exposes no data mapping these together that Telemetry Streaming could query.

I suggest we focus on the first request as that seems within the scope of TS and immediately useful.

Thanks!

Nachtfalkeaw commented 5 months ago

The metric names should be the same when I query the same metrics as in the default configuration. The reason is simple: if I query all metrics every 5 seconds, the CPUs are overloaded. However, for a very limited set of values, such as CPU and memory, a 5 s polling interval is useful.

For other values, such as overall throughput, polling every 15 s is sufficient, and for other things every 60 s.

For software and hardware versions, polling, e.g., every 6 hours is enough.

So the idea of different Pull_Consumers is very good. However, to use them, the "Custom_Endpoints" must generate the same metric output as the default poll, so that the metrics from different intervals can be matched correctly. More than matched: they should be the same metric. If every poller generates different metrics, the result is duplicate metrics in Prometheus: the CPU metrics from the default poller and the CPU metrics from the custom endpoint.

However, if it is not possible to generate the same metric names, then the different metrics should share the same label sets, so that they can be merged based on those labels, ideally with the labels uniquely identifying that they are the same metric.

B0go commented 3 months ago

I can confirm this problem is also affecting me! Once I enable custom endpoints to filter out the results of the scrape (so I can avoid the CPU overload), the metrics get reported in a different pattern:

# HELP f5_detailedCPU_sys_host_info_0_sys_hostInfo_0_cpuInfo_sys_hostInfo_0_cpuInfo_1_oneMinAvgUser detailedCPU_sys/host-info/0_sys/hostInfo/0/cpuInfo_sys/hostInfo/0/cpuInfo/1_oneMinAvgUser
# TYPE f5_detailedCPU_sys_host_info_0_sys_hostInfo_0_cpuInfo_sys_hostInfo_0_cpuInfo_1_oneMinAvgUser gauge
f5_detailedCPU_sys_host_info_0_sys_hostInfo_0_cpuInfo_sys_hostInfo_0_cpuInfo_1_oneMinAvgUser 14

This also makes it pretty hard to find which endpoints have the metrics I need.
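In the detailedCPU example, the HELP string still carries the CPU index (cpuInfo/1), so one possible workaround is to derive a label from the HELP text rather than from the flattened metric name. This Python sketch assumes the layout seen in this single example; cpu_series and the cpu label name are hypothetical and not defined by TS.

```python
import re

# Assumed detailedCPU HELP layout, inferred from the example above:
#   <endpoint>_..._cpuInfo/<cpu index>_<stat>
CPU_HELP_RE = re.compile(r"^([A-Za-z0-9]+)_.*/cpuInfo/(\d+)_([A-Za-z0-9]+)$")


def cpu_series(help_text, value):
    """Build a labeled sample line from a detailedCPU HELP string.

    Returns None when the HELP text does not match the assumed layout.
    """
    m = CPU_HELP_RE.match(help_text)
    if m is None:
        return None
    endpoint, cpu, stat = m.groups()
    # The greedy ".*" makes the regex pick the last "/cpuInfo/<n>_" segment.
    return 'f5_{}_{}{{cpu="{}"}} {}'.format(endpoint, stat, cpu, value)


if __name__ == "__main__":
    print(cpu_series(
        "detailedCPU_sys/host-info/0_sys/hostInfo/0/cpuInfo_sys/hostInfo/0/cpuInfo/1_oneMinAvgUser",
        14,
    ))
```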

B0go commented 3 months ago

@megamattzilla This has become a blocker for using TS to observe the BigIPs using Prometheus as the metrics engine, mainly because, on the one hand, we can't enable the collection of all metrics without seeing a significant impact on CPU usage. On the other hand, we can't use the custom endpoint approach as the current output doesn't allow for proper label matching, filtering, etc.

If we can't find a solution, we will be forced to use the snmp_exporter. I would gladly avoid that if possible, as it requires more configuration complexity.

Do you have any status updates that can be shared?

pgouband commented 3 months ago

Hi @B0go,

Please contact your F5 account team so they can contact us (the product management team).

barakbd commented 3 months ago

It would be really helpful to simply allow TS metrics to have customized labels.