Ptrskay3 / axum-prometheus

🔎 Prometheus metrics middleware for Axum
MIT License
62 stars 9 forks source link

Crashes if port 9000 blocked #66

Open bryanlarsen opened 3 months ago

bryanlarsen commented 3 months ago

How to reproduce:

Result:

thread 'main' panicked at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/axum-prometheus-0.7.0/src/lib.rs:850:14:
Failed to build metrics recorder: FailedToCreateHTTPListener("Address already in use (os error 98)")
stack backtrace:
   0: rust_begin_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1654:5
   3: core::result::Result<T,E>::expect
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1034:23
   4: <axum_prometheus::Handle as core::default::Default>::default
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/axum-prometheus-0.7.0/src/lib.rs:837:29
   5: axum_prometheus::GenericMetricLayer<T,M>::pair
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/axum-prometheus-0.7.0/src/lib.rs:766:46
   6: prom_test::main::{{closure}}
             at ./main.rs:7:45
   7: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/park.rs:281:63
   8: tokio::runtime::coop::with_budget
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/coop.rs:107:5
   9: tokio::runtime::coop::budget
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/coop.rs:73:5
  10: tokio::runtime::park::CachedParkThread::block_on
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/park.rs:281:31
  11: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/context/blocking.rs:66:9
  12: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/multi_thread/mod.rs:87:13
  13: tokio::runtime::context::runtime::enter_runtime
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/context/runtime.rs:65:16
  14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/multi_thread/mod.rs:86:9
  15: tokio::runtime::runtime::Runtime::block_on_inner
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/runtime.rs:363:45
  16: tokio::runtime::runtime::Runtime::block_on
             at /home/blarsen/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/runtime.rs:335:13
  17: prom_test::main
             at ./main.rs:22:5
  18: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5

main.rs for reference, copied from https://docs.rs/axum-prometheus/latest/axum_prometheus/index.html:

use std::{net::SocketAddr, time::Duration};
use axum::{routing::get, Router};
use axum_prometheus::PrometheusMetricLayer;

#[tokio::main]
async fn main() {
    let (prometheus_layer, metric_handle) = PrometheusMetricLayer::pair();
    let app = Router::new()
        .route("/fast", get(|| async {}))
        .route(
            "/slow",
            get(|| async {
                tokio::time::sleep(Duration::from_secs(1)).await;
            }),
        )
        .route("/metrics", get(|| async move { metric_handle.render() }))
        .layer(prometheus_layer);

    let listener = tokio::net::TcpListener::bind(SocketAddr::from(([127, 0, 0, 1], 3000)))
        .await
        .unwrap();
    axum::serve(listener, app).await.unwrap()
}

Cargo.toml:

[package]
name = "prom-test"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.7.5"
axum-prometheus = "0.7.0"
tokio = "1.39.2"
bryanlarsen commented 3 months ago

axum-prometheus 0.6.1 does not have this problem.

Ptrskay3 commented 3 months ago

Don't think this is an issue with axum-prometheus. This error originates from metrics-exporter-prometheus.

0.6.1 does not have this issue, because that code I linked is a very recent change. I'm pretty sure you should file an issue for them.

bryanlarsen commented 3 months ago

The documentation for metrics-exporter-prometheus' PrometheusBuilder indicates that it opens port 9000 by default, so it's behaving as documented.

Ptrskay3 commented 3 months ago

Well, axum-prometheus is clear about using metrics-rs and its ecosystem, so what could we possibly do in the default implementation? There're plenty of ways to work around this via lower level APIs that give you finer control.. basically just don't use pair/pair_from and with_default_metrics functions from this crate. BaseMetricLayer is completely decoupled from the metric exporter initialization code, and you may perform it as you like.

I'd rather not duplicate the documentation of metrics-exporter-prometheus in this crate either.

tobz commented 3 months ago

Ah yeah, it looks like this is a consequence of using PrometheusBuilder::build to ensure the upkeep thread is started, but which also builds an exporter future... which will end up triggered the binding of the port. 😓

I'll have to think a little bit about what a better builder API could look like... because it seems like like this keeps coming up.

oxalica commented 3 weeks ago

I'd rather not duplicate the documentation of metrics-exporter-prometheus in this crate either.

It's not documented that http-listener feature (which overlaps the scope of axum to me). The documentation gives an example that manually wire the rendering code into a axum route. That makes me assume that there would be no other listener otherwise.

Could we disable this feature by default, or at least make it possible to be disabled? Downstream users can still enable it if they want (features will merge). But currently, downstream users cannot disable it because it's explicitly enabled via axum-prometheus.

Ptrskay3 commented 3 weeks ago

The reason why this crate changed the defaults is because the upkeep task needs to be started in metrics-exporter-prometheus in order to prevent unbound histogram growth, see #52 . This led to an unfortunate consequence that metrics-exporter-prometheus binds the default (or configured) HTTP port.. but that http listener is not used in any way in axum-prometheus .

It's possible to avoid that if you reach for PrometheusMetricLayerBuilder::with_metrics_from_fn and construct the PrometheusBuilder yourself (you either don't care about the upkeep task, or bind a different random port, etc).

I do not intend to disable the http-listener feature, because the crate has essentially no point if you use it with metrics-exporter-prometheus, but you have neither http-listener nor push-gateway enabled. You have an option to disable the prometheus feature, which is exactly what people should do if they don't want to work with Prometheus.

As a workaround, I could probably start the http listener explicitly on 127.0.0.1:0, so that the operating system will choose an available random port, but I don't think that's a viable solution.. I'd rather like to have the root cause fixed.

EDIT: I'll reopen this issue for visibility.

oxalica commented 3 weeks ago

Thanks for the explanation and with_metrics_from_fn does work for me.

52

In my own setup, I use .build_recorder() and manually spawn the upkeeping thread (because I want tokio tasks rather than system threads). Also, leaking threads (there is no way to stop it even though all Handles are dropped) in Default::default sounds so dirty. I'd say I don't like the API design for this. If we need to spawn threads, we should use build_recorder to avoid different behaviors depending on upstream features, and spawn our own upkeeping thread when building PrometheusMetricLayer (if we want to).

I do not intend to disable the http-listener feature, because the crate has essentially no point if you use it with metrics-exporter-prometheus, but you have neither http-listener nor push-gateway enabled.

Why saying "essentially no point"? I do use metrics-exporter-prometheus with default features disabled. As mentioned above, I use .build_recorder(), spawning my own upkeeping thread and run axum for serving the scraping server. It runs perfectly fine and never tries to bind :9000. Thus I don't think http-listener is technically required when having axum.

Ptrskay3 commented 2 weeks ago

Why saying "essentially no point"? I do use metrics-exporter-prometheus with default features disabled. As mentioned above, I use .build_recorder(), spawning my own upkeeping thread and run axum for serving the scraping server. It runs perfectly fine and never tries to bind :9000. Thus I don't think http-listener is technically required when having axum.

Oh, maybe I'm just completely off with my mental model. Currently I don't actively use this library, and forgot some stuff. I think it's worth a try then to just disable the default features completely for metrics-exporter-prometheus. I'm not sure when I'll have the time to really work on this, and figure out whether it is a breaking change.

Would you mind sharing your setup code you're using with axum?