aerogear / keycloak-metrics-spi

Adds a Metrics Endpoint to Keycloak
Apache License 2.0

MGDSTRM-1664 Resource support for latency metrics #97

Closed slaskawi closed 3 years ago

slaskawi commented 3 years ago

Motivation

https://issues.redhat.com/browse/MGDSTRM-1664

To improve observability, all the performance metrics need to carry the resource they measure. This way, we know exactly which endpoints are fine and which are causing problems.

What

This Pull Request introduces a new "resource" field into the latency metrics. Here's an example:

keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="50.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="100.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="250.0",} 2.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="500.0",} 2.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="1000.0",} 2.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="2000.0",} 2.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="10000.0",} 2.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="30000.0",} 2.0
keycloak_request_duration_bucket{method="POST",resource="/protocol/openid-connect/token",le="+Inf",} 2.0
keycloak_request_duration_count{method="POST",resource="/protocol/openid-connect/token",} 2.0
keycloak_request_duration_sum{method="POST",resource="/protocol/openid-connect/token",} 122.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="50.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="100.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="250.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="500.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="1000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="2000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="10000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="30000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin",le="+Inf",} 14.0
keycloak_request_duration_count{method="GET",resource="/admin",} 14.0
keycloak_request_duration_sum{method="GET",resource="/admin",} 39.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="50.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="100.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="250.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="500.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="1000.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="2000.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="10000.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="30000.0",} 1.0
keycloak_request_duration_bucket{method="POST",resource="/login-actions/authenticate",le="+Inf",} 1.0
keycloak_request_duration_count{method="POST",resource="/login-actions/authenticate",} 1.0
keycloak_request_duration_sum{method="POST",resource="/login-actions/authenticate",} 30.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="50.0",} 13.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="100.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="250.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="500.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="1000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="2000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="10000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="30000.0",} 14.0
keycloak_request_duration_bucket{method="GET",resource="/admin/clients",le="+Inf",} 14.0
keycloak_request_duration_count{method="GET",resource="/admin/clients",} 14.0
keycloak_request_duration_sum{method="GET",resource="/admin/clients",} 136.0
keycloak_request_duration_bucket{method="GET",resource="",le="50.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="100.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="250.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="500.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="1000.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="2000.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="10000.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="30000.0",} 15.0
keycloak_request_duration_bucket{method="GET",resource="",le="+Inf",} 15.0
keycloak_request_duration_count{method="GET",resource="",} 15.0
keycloak_request_duration_sum{method="GET",resource="",} 89.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="50.0",} 0.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="100.0",} 0.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="250.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="500.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="1000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="2000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="10000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="30000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/protocol/openid-connect/auth",le="+Inf",} 2.0
keycloak_request_duration_count{method="GET",resource="/protocol/openid-connect/auth",} 2.0
keycloak_request_duration_sum{method="GET",resource="/protocol/openid-connect/auth",} 407.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="50.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="100.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="250.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="500.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="1000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="2000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="10000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="30000.0",} 2.0
keycloak_request_duration_bucket{method="GET",resource="/admin/authentication",le="+Inf",} 2.0
keycloak_request_duration_count{method="GET",resource="/admin/authentication",} 2.0
keycloak_request_duration_sum{method="GET",resource="/admin/authentication",} 5.0
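
For readers who want to turn the output above into a per-resource figure, here is a minimal sketch that divides each `_sum` by its `_count`. It is not part of this Pull Request; the function name `mean_latency` and the regular expressions are illustrative assumptions, and the parser only handles the simple line shape shown above.

```python
import re

# Matches lines such as: name{method="GET",resource="/admin",} 39.0
SAMPLE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def mean_latency(text):
    """Return {(method, resource): sum / count} for keycloak_request_duration.

    Hypothetical helper for illustration; it ignores the bucket lines.
    """
    sums, counts = {}, {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("method"), labels.get("resource"))
        value = float(match.group("value"))
        if match.group("name") == "keycloak_request_duration_sum":
            sums[key] = value
        elif match.group("name") == "keycloak_request_duration_count":
            counts[key] = value
    # Skip series whose count is missing or zero to avoid dividing by zero.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}
```

For the token endpoint above this gives 122.0 / 2.0 = 61.0. The bucket boundaries suggest the values are milliseconds, but that is an assumption. Note also that the buckets are cumulative: for `/admin/clients`, 13 requests fall at or below `le="50.0"` and the 14th falls between 50 and 100.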

Why

This feature is required to better understand performance problems.

How

An additional "resource" field is added to the metrics.

Verification Steps

  1. Build the code
  2. Copy the artifact into $KC/standalone/deployments
  3. Make sure the metrics logger is turned on
  4. Play with Keycloak a bit
  5. Open the following URL: "$KC/auth/realms/master/metrics"
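
The last step can also be scripted. The sketch below is an illustration only: `fetch_metrics` and `latency_lines` are hypothetical helper names, and the base URL (for example `http://localhost:8080`) is an assumption standing in for `$KC`.

```python
import urllib.request


def latency_lines(metrics_text, prefix="keycloak_request_duration"):
    """Keep only the latency samples that carry a resource label."""
    return [
        line
        for line in metrics_text.splitlines()
        if line.startswith(prefix) and 'resource="' in line
    ]


def fetch_metrics(base_url, realm="master", timeout=10):
    """Download the metrics page of a realm.

    base_url is an assumed value such as http://localhost:8080.
    """
    url = f"{base_url.rstrip('/')}/auth/realms/{realm}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (requires a running Keycloak with the artifact deployed):
#   for line in latency_lines(fetch_metrics("http://localhost:8080")):
#       print(line)
```

If the returned list is empty after some traffic, the new field is not being emitted.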


maleck13 commented 3 years ago

Not familiar with the code base, but the metric examples look good to me.

slaskawi commented 3 years ago

@maleck13 @pb82 I changed the algorithm to scrape the last two matched URIs. From my testing it looks good but I also added an "emergency turn off button".

pb82 commented 3 years ago

@slaskawi thanks, i'll take another look

slaskawi commented 3 years ago

@pb82 @maleck13 Can I ask you to review and merge this one? Once it's in, Peter, could I ask you for a release?

pb82 commented 3 years ago

:eyes:

slaskawi commented 3 years ago

@pb82 @maleck13 @abstractj Last chance to get this merged and onboarded into production. After Wednesday, I'd say it's too late.