cloudflare / pingora

A library for building fast, reliable and evolvable network services.
Apache License 2.0

Questions about performance testing #372

Open SsuiyueL opened 2 months ago

SsuiyueL commented 2 months ago

Hello, I encountered some issues while conducting performance testing. I reviewed previous issues, but they did not resolve my problem. Could you please help me with a detailed explanation? I would greatly appreciate it.

I have implemented a simple HTTP proxy with Nginx (OpenResty and Nginx-Rust) and with Pingora. Below is my Pingora implementation, based on the [modify_response] example:

```rust
use async_trait::async_trait;
use clap::Parser;
use pingora_core::server::configuration::Opt;
use pingora_core::server::Server;
use pingora_core::upstreams::peer::HttpPeer;
use pingora_core::Result;
use pingora_proxy::{ProxyHttp, Session};
use std::net::ToSocketAddrs;

pub struct MyCtx {
    buffer: Vec<u8>,
}

pub struct Json2Yaml {
    addr: std::net::SocketAddr,
}

#[async_trait]
impl ProxyHttp for Json2Yaml {
    type CTX = MyCtx;

    fn new_ctx(&self) -> Self::CTX {
        MyCtx { buffer: vec![] }
    }

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        let peer = Box::new(HttpPeer::new(self.addr, false, "".to_string()));
        Ok(peer)
    }
}

fn main() {
    env_logger::init();

    let opt = Opt::parse();
    let mut my_server = Server::new(Some(opt)).unwrap();
    my_server.bootstrap();

    let mut my_proxy = pingora_proxy::http_proxy_service(
        &my_server.configuration,
        Json2Yaml {
            // hardcode the upstream IP for now
            addr: ("172.24.1.1", 80)
                .to_socket_addrs()
                .unwrap()
                .next()
                .unwrap(),
        },
    );

    my_proxy.add_tcp("0.0.0.0:6191");
    my_server.add_service(my_proxy);
    my_server.run_forever();
}
```

config:

```yaml
---
version: 1
threads: 8
```

My testing was conducted on an Ubuntu system with 8 cores and 16 GB of memory. Nginx was started with 8 worker processes.

1. Using wrk for testing:

```
wrk -t10 -c1000 -d30s http://172.24.1.2:6191
```

The result of Nginx:

```
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   206.88ms  307.06ms   1.88s    81.37%
    Req/Sec     3.02k     1.07k    9.78k    74.21%
  903397 requests in 30.10s, 4.27GB read
  Socket errors: connect 0, read 0, write 0, timeout 748
Requests/sec:  30014.11
Transfer/sec:    145.21MB
```

The total CPU usage is around 50%, and the memory usage of each worker is negligible.

The result of Pingora:

```
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   180.33ms  288.71ms   1.81s    83.00%
    Req/Sec     2.99k     0.87k    5.78k    67.27%
  893573 requests in 30.02s, 4.22GB read
  Socket errors: connect 0, read 0, write 0, timeout 795
Requests/sec:  29766.67
Transfer/sec:    144.01MB
```

The total CPU usage is around 70%, and memory usage keeps growing across test runs (0% → 0.9% → 1.2%) without being released.

2. Using ab for testing:

```
ab -n 10000 -c 100 http://172.24.1.2:6191/
```

When I perform testing with ab, Pingora times out:

```
Benchmarking 172.24.19.185 (be patient)
apr_pollset_poll: The timeout specified has expired (70007)
```

The packet capture analysis is as follows:

(Screenshot 2024-09-02 17:56:02: packet capture of the Pingora test)

It can be seen that a GET request was sent at the beginning, but Pingora did not return a response.

Nginx can be tested normally using the same command, and the packet capture shows that it responded properly.

(Screenshot 2024-09-02 18:01:18: packet capture of the Nginx test)

ab uses HTTP/1.0, but I verified that this is not the cause of the problem.

I also tested with Siege, and the results were similar to those from wrk.

3. Summary

Pingora is a remarkable project, and I'm very interested in the improvements it can offer over Nginx. However, I would like to understand why, in my tests, Pingora's throughput is roughly the same as Nginx's while its CPU usage is higher, why its memory usage keeps growing after each test, and why requests time out when testing with ab.

I really appreciate your support.

github2023spring commented 2 months ago

Maybe try increasing upstream_keepalive_pool_size to 1000 and setting tcp_keepalive in the peer options?
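For example, the pool-size part could look like this in the YAML server configuration (a minimal sketch; `upstream_keepalive_pool_size` is the field documented in the pooling guide, and 1000 is just the value suggested above):

```yaml
---
version: 1
threads: 8
# Keep up to 1000 idle upstream connections around for reuse
# instead of the default 128.
upstream_keepalive_pool_size: 1000
```

The tcp_keepalive part would be set on the HttpPeer's options inside upstream_peer() rather than in this file.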

SsuiyueL commented 2 months ago

> Maybe try increasing upstream_keepalive_pool_size to 1000 and setting tcp_keepalive in the peer options?

Thank you for your response! I seem to have discovered some issues:

Initially, my server was configured for short connections (with keepalive_timeout set to 0), and under those conditions, Pingora did not perform well. Later, I tested the server with long connections, and Pingora demonstrated its advantages. I also tested the configuration changes as you suggested. The detailed results are as follows:

Nginx test results are as follows:

```
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   260.76ms  434.25ms   7.20s    84.93%
    Req/Sec     3.07k     1.20k    7.16k    73.84%
  909551 requests in 30.02s, 4.30GB read
Requests/sec:  30296.15
Transfer/sec:    146.75MB
```

CPU usage: 49%.

The Pingora test results before applying the suggested changes are as follows:

```
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    98.75ms  190.03ms   3.43s    90.47%
    Req/Sec     4.95k     1.34k   11.83k    74.45%
  1475976 requests in 30.03s, 6.97GB read
Requests/sec:  49156.43
Transfer/sec:    237.83MB
```

CPU usage: 80%. The memory still grows after each test and is not released.

The Pingora test results after applying the suggested changes are as follows:

```
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.02ms  126.64ms   3.20s    88.49%
    Req/Sec     5.15k     1.39k   11.51k    73.82%
  1534099 requests in 30.10s, 7.25GB read
Requests/sec:  50968.27
Transfer/sec:    246.61MB
```

In summary, thanks for the response; it has resolved some of my issues. However, the memory increase and other problems still persist. I will continue to monitor this.

oddgrd commented 1 week ago

Hey! I've been trying to debug ever-increasing memory utilization in our Pingora proxy service (an HTTP proxy with TLS and h2). The problem matches what is described in this issue and in https://github.com/cloudflare/pingora/issues/447, which suggests other Pingora users are hitting it too.

I can easily reproduce the issue with k6 load tests, and we can see that at the start of the test memory utilization increases quickly. Then, hours after the test, the memory utilization remains high. It keeps growing indefinitely until the service goes OOM, or until we restart it. In the image below you can see the load test run for 5 minutes at ~20:00. This is on an AWS ECS Fargate service with 0.5 vCPU and 1 GB of memory.

(Image: memory utilization before, during, and after the 5-minute load test)

First I checked whether we had introduced a memory leak in our own code, but if we did, I haven't been able to find it. I've tried valgrind memcheck with leak detection, as well as valgrind massif for heap profiling.

Then I tried to figure out whether some connection pool in Pingora was growing without bound. The service is behind an AWS network load balancer, and its metrics show that downstream connections are not held open, so I don't believe that is the cause. I also tried disabling the upstream connection pool as described here: https://github.com/cloudflare/pingora/blob/main/docs/user_guide/pooling.md, but the default size of that pool is only 128, so it shouldn't be able to grow without bound or drive the service OOM on its own. And indeed, disabling it and re-running the test did not resolve the ever-growing memory.
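For reference, the change I tried was along these lines in the YAML server configuration (a sketch; whether a value of 0 fully disables reuse of idle upstream connections is an assumption on my part):

```yaml
---
version: 1
# Shrink the upstream keepalive pool (default 128) so that idle
# upstream connections are effectively not reused.
upstream_keepalive_pool_size: 0
```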

To summarize, I realize this is most likely an error on our end, since I know you run Pingora in production yourselves, and I assume you don't have this problem. However, perhaps you have seen this behavior before? Do you have any recommendations for what config I might tweak to resolve it? Any advice is highly appreciated, but I fully understand if you don't have time to help me with this. I'll tag you for visibility @drcaramelsyrup, apologies in advance!

If you have time to take a look, here is our setup code:

```rust
pub fn start() {
    std::thread::spawn(|| {
        // don't drop the rt
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(async move {
            setup_tracing(tracing_subscriber::registry());
            info!("started tracing subscriber with otel exporter in tokio rt");
            // keep this runtime running, so that the otel exporter keeps running
            std::future::pending::<()>().await;
        });
    });

    let args = Args::parse();
    info!(args = ?args, "Starting proxy");

    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    // Attach the Prometheus service
    let mut prometheus_service = Service::prometheus_http_service();
    let prometheus_address = format!("0.0.0.0:{}", args.metrics_port);
    info!("Serving prometheus metrics on address {prometheus_address}");
    prometheus_service.add_tcp(&prometheus_address);
    server.add_service(prometheus_service);

    // XXX: is it fine to just have a runtime like that?
    // It might mess up with the autoreload feature of Pingora, but I don't think we're going
    // to use that.
    let rt = tokio::runtime::Runtime::new().unwrap();
    let aws_config = rt.block_on(
        aws_config::defaults(BehaviorVersion::latest())
            .timeout_config(
                TimeoutConfig::builder()
                    // Increase the connection timeout
                    // See https://github.com/awslabs/aws-sdk-rust/issues/871#issuecomment-1690842996
                    .connect_timeout(Duration::from_secs(10))
                    .build(),
            )
            .load(),
    );

    let conn_opts = PgConnectOptions::new()
        .host(&args.db.host)
        .port(args.db.port)
        .username(&args.db.user)
        .password(&args.db.password)
        .database(&args.db.name);
    let pool = rt
        .block_on(sqlx::Pool::connect_with(conn_opts))
        .expect("connect to postgres");
    let db = Database::new(pool);

    // HTTPS server
    let tls_resolver = Box::new(TlsResolver::new(
        db.clone(),
        args.wildcard_fqdn,
        Arc::new(ChainAndPrivateKey::new(
            args.cert.wildcard_cert_full_chain,
            args.cert.wildcard_cert_private_key,
        )),
        Duration::from_secs(args.certificates_ttl_seconds),
    ));
    let host_resolver = CachingHostResolver::new(
        CloudmapHostResolver::new(ServiceDiscoveryClient::new(&aws_config), db),
        Duration::from_secs(args.resolver_ttl_seconds),
    );

    let mut proxy = pingora::prelude::http_proxy_service(
        &server.configuration,
        EcsProxy::new(args.user_app_port, host_resolver, args.max_rps),
    );

    let proxy_address = format!("0.0.0.0:{}", args.proxy_port);
    info!("Running proxy with TLS on address {proxy_address}");

    let mut tls_settings = TlsSettings::with_callbacks(tls_resolver).unwrap();
    tls_settings.enable_h2();
    proxy.add_tls_with_settings(&proxy_address, None, tls_settings);
    server.add_service(proxy);

    server.run_forever();
}
```

ermakov-oleg commented 6 days ago

Try using tikv-jemallocator. It helped me reduce memory usage growth in cases involving a large number of new upstream connections. I think this improvement is related to reduced memory fragmentation.
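For anyone else trying this, switching the global allocator is a small change; a minimal sketch (the crate version is an assumption):

```rust
// Cargo.toml: tikv-jemallocator = "0.6"
use tikv_jemallocator::Jemalloc;

// Route all heap allocations in this binary through jemalloc instead of
// the system allocator; jemalloc tends to keep fragmentation lower under
// heavy connection churn.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // ...build and run the Pingora server as usual...
}
```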