apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0
1.8k stars 797 forks source link

Crash HttpSM::setup_cache_write_transfer while handling redirect #11700

Open bneradt opened 1 month ago

bneradt commented 1 month ago

We're seeing a crash related to an interaction between cache and handling redirects:

#0  0x0000000000587c7d in HttpSM::setup_cache_write_transfer (name=0x8e1ab5 "cache write", skip_bytes=1395, store_info=0x7fa31d298180, source_vc=0x7fadc0aedc90, c_sm=0x7fa31d29a430, this=0x7fa31d298000)
    at /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc:6770
#1  HttpSM::perform_cache_write_action (this=this@entry=0x7fa31d298000) at /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc:6425
#2  0x000000000059d25e in HttpSM::handle_api_return (this=0x7fa31d298000) at /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc:1630
#3  0x000000000059731e in HttpSM::state_api_callout (this=0x7fa31d298000, event=<optimized out>) at /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc:1491
#4  0x000000000059e4a6 in HttpSM::state_api_callback (this=this@entry=0x7fa31d298000, event=event@entry=60000, data=data@entry=0x0)
    at /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc:1295
#5  0x00007fadce570813 in TSHttpTxnReenable (txnp=0x7fa31d298000, event=TS_EVENT_HTTP_CONTINUE) at /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/api/InkAPI.cc:4935
#6  0x00007fac5029cc76 in ?? ()
#7  0x00007fa31d298000 in ?? ()
#8  0x00007fa81fcac1c0 in ?? ()
#9  0x0000000000000000 in ?? ()

Some interesting variables @JosiahWI asked me to look at:

(gdb) p t_state.redirect_info.redirect_in_process
$1 = true

(gdb) f 1
(gdb) p t_state.cache_info.action
$2 = HttpTransact::CACHE_DO_WRITE

(gdb) p t_state.cache_info.write_lock_state
$1 = HttpTransact::CACHE_WL_SUCCESS

This is likely related to: https://github.com/apache/trafficserver/pull/11542

bryancall commented 1 month ago

cache_sm->cache_write_vc == nullptr

Screenshot 2024-08-15 at 5 06 26 PM
JosiahWI commented 1 month ago

I've studied the code and written an AuTest to attempt a reproduction (not successful yet).

Investigation

Reproduction Ideas

It's not particularly straightforward to reproduce given what we know so far. My first attempt was to force a redirection using the escalate plugin. This mirrors how we think the crash we observed starts. The escalate plugin overrides all the redirection configuration, and I wanted to make sure the timing of the state change on the plugin's hook wasn't the cause of the issue. There were no issues with this setup and it works exactly as expected.

My conclusion from all this is that something special needs to happen during the redirection, interfering with it and getting rid of the cache write VC while the redirection request or response is being processed. In the crash we're seeing this is very likely initiated by another plugin on one of the request hooks. I suspect this may also be timing dependent.

The only two places in the state machine where the write connection is directly set to nullptr are:

void
HttpSM::setup_cache_write_transfer(HttpCacheSM *c_sm, VConnection *source_vc, HTTPInfo *store_info, int64_t skip_bytes,
                                   const char *name)
{
  ink_assert(c_sm->cache_write_vc != nullptr);
  ink_assert(t_state.request_sent_time > 0);
  ink_assert(t_state.response_received_time > 0);

  store_info->request_sent_time_set(t_state.request_sent_time);
  store_info->response_received_time_set(t_state.response_received_time);

  c_sm->cache_write_vc->set_http_info(store_info);
  store_info->clear();

  tunnel.add_consumer(c_sm->cache_write_vc, source_vc, &HttpSM::tunnel_handler_cache_write, HT_CACHE_WRITE, name, skip_bytes);

  c_sm->cache_write_vc = nullptr;
}
  // in HttpSM::perform_cache_write_action
  case HttpTransact::CACHE_DO_WRITE:
  case HttpTransact::CACHE_DO_REPLACE:
    // Fix need to set up delete for after cache write has
    //   completed
    if (transform_info.entry == nullptr || t_state.api_info.cache_untransformed == true) {
      cache_sm.close_read();
      t_state.cache_info.write_status = HttpTransact::CACHE_WRITE_IN_PROGRESS;
      setup_cache_write_transfer(&cache_sm, server_entry->vc, &t_state.cache_info.object_store, client_response_hdr_bytes,
                                 "cache write");
    } else {
      // We are not caching the untransformed.  We might want to
      //  use the cache writevc to cache the transformed copy
      ink_assert(transform_cache_sm.cache_write_vc == nullptr);
      transform_cache_sm.cache_write_vc = cache_sm.cache_write_vc;
      cache_sm.cache_write_vc           = nullptr;
    }
    break;
bneradt commented 1 month ago

Attaching the gdb output of the HttpSM history variable:

history.txt