LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

[Bug]: Bad Prometheus metric performance due to `endpoint` label explosion for images and feeds #4431

Open SorteKanin opened 9 months ago

SorteKanin commented 9 months ago

Summary

Queries over Lemmy's Prometheus metrics are slow because the `endpoint` label has exploding cardinality. Some endpoints already avoid this (for example, `endpoint="/comment/{comment_id}"` is used instead of the actual ID), but others do not (for example, `endpoint="/feeds/u/user@instance.xml"` and `endpoint="/pictrs/image/image_uuid.webp"`), so every distinct feed or image path creates a new set of time series.

Steps to Reproduce

  1. Run pretty much any query involving `lemmy_api_http_requests_duration_seconds_bucket`. For instance: `histogram_quantile(0.99, sum by(le) (rate(lemmy_api_http_requests_duration_seconds_bucket[$__rate_interval])))`
  2. Observe the very slow performance. The query times out for me if I set the time range to the last 24 hours.

Technical Details

Logs won't help here. This is a performance bug, not a correctness issue: the system is working as designed, just with poor performance.

Suggested solutions

Apply the same strategy as `endpoint="/comment/{comment_id}"` to the image and feed endpoints, so the label carries the route pattern rather than the concrete path.
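As a rough illustration of what that could look like, here is a minimal sketch that collapses concrete paths into a fixed template before they are used as a label value. The function name `normalize_endpoint`, the `TEMPLATES` table, and the placeholder names `{filename}` and `{user_name}` are all hypothetical; this is not how Lemmy or the metrics library does it today, and a real fix would more likely come from the router's own route patterns.

```rust
// Hypothetical (prefix, template) pairs based on the two examples in this
// issue. These are assumptions, not Lemmy's actual route table.
const TEMPLATES: &[(&str, &str)] = &[
    ("/pictrs/image/", "/pictrs/image/{filename}"),
    ("/feeds/u/", "/feeds/u/{user_name}.xml"),
];

/// Map a concrete request path to a low-cardinality label value.
/// Paths that match no known prefix are returned unchanged.
fn normalize_endpoint(path: &str) -> &str {
    for (prefix, template) in TEMPLATES {
        if let Some(rest) = path.strip_prefix(*prefix) {
            // Only collapse when exactly one non-empty segment follows the
            // prefix, so nested or empty paths are left untouched.
            if !rest.is_empty() && !rest.contains('/') {
                return *template;
            }
        }
    }
    path
}
```

With this, `normalize_endpoint("/feeds/u/user@instance.xml")` yields `/feeds/u/{user_name}.xml`, so all user feeds share one label value instead of one per user.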

Version

BE 0.19.3

Lemmy Instance URL

Feddit.dk

Nutomic commented 9 months ago

I don't see any difference between these endpoint definitions in Lemmy. It seems to be an issue with the rust-prometheus library.

Edit: The relevant code is here, and it uses `HttpRequest.match_pattern()` from actix_web. That method is wrongly returning `None` for the feeds endpoint, for reasons I don't understand.
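Until the root cause is found, one possible stopgap is to bound the label when no pattern is available. The sketch below assumes the metrics middleware currently falls back to the raw request path when `match_pattern()` returns `None`, which is what the labels in the report suggest but is not confirmed here. The function `endpoint_label` and the `UNMATCHED` placeholder are hypothetical names; `matched_pattern` stands in for the `Option<String>` that `match_pattern()` returns.

```rust
// Hypothetical fixed label value for requests without a matched route pattern.
const UNMATCHED: &str = "unmatched";

/// Choose the `endpoint` label value from the router's matched pattern.
/// Falls back to a single constant instead of the raw path, so unmatched
/// requests cannot create an unbounded number of time series.
fn endpoint_label(matched_pattern: Option<String>) -> String {
    matched_pattern.unwrap_or_else(|| UNMATCHED.to_string())
}
```

The trade-off is that all unmatched requests are lumped together under one label value, so per-path latency for feeds would not be visible until `match_pattern()` returns the expected pattern.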