cloudflare / pingora

A library for building fast, reliable and evolvable network services.
Apache License 2.0
22.2k stars 1.23k forks source link

Bug: deadloop in proxy_handle_downstream #475

Open taikulawo opened 1 day ago

taikulawo commented 1 day ago

Describe the bug

We receive many waiting for permit log, CPU go to 100%

2024-11-18T18:59:19.240967 DEBUG tcp{port=443}:http{trace_id="20241118185918x173192755855585895a" inbound="http"}: waiting for permit Err(Error { error_type: InternalError, error_source: Unset, inner: Some("Closed(..)"), help: Some("try_reserve() body pipe for upstream"), retry: Decided(false) })    
2024-11-18T18:59:19.240975 DEBUG tcp{port=443}:http{trace_id="20241118185918x173192755855585895a" inbound="http"}: waiting for permit Err(Error { error_type: InternalError, error_source: Unset, inner: Some("Closed(..)"), help: Some("try_reserve() body pipe for upstream"), retry: Decided(false) })    

Pingora info

We forked pingora code into our own codebase

Pingora version: release number of commit hash Rust version: i.e. cargo 1.82.0 (8f40fc59f 2024-08-21) Operating system version: Centos 8

Steps to reproduce

first, pingora pipe upstream/downstream, it's OK.

let ret = tokio::try_join!(
            self.proxy_handle_downstream(session, tx_downstream, rx_upstream, ctx),
            self.proxy_handle_upstream(client_session, tx_upstream, rx_downstream),
        );

But if proxy_handle_upstream exit early than proxy_handle_downstream, rx will be droped.

  1. https://github.com/cloudflare/pingora/blob/bdb13a7e40b19280d703ad1f64abc29ee5baf480/pingora-proxy/src/proxy_h1.rs#L203

In proxy_handle_downstream, we get permit first.

  let send_permit = tx
      .try_reserve()
      .or_err(InternalError, "try_reserve() body pipe for upstream");

But since rx already been droped. send_permit will be Closed, not Full

// https://docs.rs/tokio/latest/tokio/sync/mpsc/error/enum.TrySendError.html
pub enum TrySendError<T> {
    Full(T),
    Closed(T),
}

send_permit.is_ok() will already be Error. It will NEVER been polled again!

  body = session.downstream_session.read_body_or_idle(downstream_state.is_done()),
                    if downstream_state.can_poll() && send_permit.is_ok() => {
}

tx.reserve() will always return Error since it be closed,

 _ = tx.reserve(), if downstream_state.is_reading() && send_permit.is_err() => {
      debug!("waiting for permit {send_permit:?}");
      /* No permit, wait on more capacity to avoid starving.
       * Otherwise this select only blocks on rx, which might send no data
       * before the entire body is uploaded.
       * once more capacity arrives we just loop back
       */
  },

finally, forward deadloop

Expected results

works corrently

Observed results

Our server deadloop

Additional context

image
taikulawo commented 1 day ago

pingora send body to proxy_handle_upstream even if body is empty, which send upstream_end_of_body == true, causing proxy_handle_upstream exit early

// retry, send buffer if it exists or body empty
// false || true
if buffer.is_some() || session.as_mut().is_body_empty() {
    let send_permit = tx
        .reserve()
        .await
        .or_err(InternalError, "reserving body pipe")?;
    let end_of_body = self.send_body_to_pipe(
        session,
        buffer,
        downstream_state.is_done(),
        send_permit,
        ctx,
    )
    .await?;
   // ++++ we should finish body if return value is true
    downstream_state.maybe_finished(end_of_body);
}
Inffenex commented 20 hours ago

// retry, send buffer if it exists or body empty // false || true if buffer.is_some() || session.as_mut().is_body_empty() { let send_permit = tx .reserve() .await .or_err(InternalError, "reserving body pipe")?; let end_of_body = self.send_body_to_pipe( session, buffer, downstream_state.is_done(), send_permit, ctx, ) .await?; // ++++ should we finish body if return value is true ? downstream_state.maybe_finished(end_of_body); }