Open bneradt opened 1 month ago
cache_sm->cache_write_vc == nullptr
I've studied the code and written an AuTest to attempt a reproduction (not successful yet).
CACHE_WL_SUCCESS
state unless that happens, and that is the precondition for deciding to do a write instead of open a new connection when a redirected request gets a response.cache_write_vc
to nullptr
because we moved ownership to another object, and one where we initialize it and it could become nullptr
if that's what it was initialized with.It's not particularly straightforward to reproduce given what we know so far. My first attempt was to force a redirection using the escalate plugin. This mirrors how we think the crash we observed starts. The escalate plugin overrides all the redirection configuration, and I wanted to make sure the timing of the state change on the plugin's hook wasn't the cause of the issue. There were no issues with this setup and it works exactly as expected.
My conclusion from all this is that something special needs to happen during the redirection, interfering with it and getting rid of the cache write VC while the redirection request or response is being processed. In the crash we're seeing this is very likely initiated by another plugin on one of the request hooks. I suspect this may also be timing dependent.
The only two places in the state machine where the write connection is directly set to nullptr
are:
void
HttpSM::setup_cache_write_transfer(HttpCacheSM *c_sm, VConnection *source_vc, HTTPInfo *store_info, int64_t skip_bytes,
const char *name)
{
ink_assert(c_sm->cache_write_vc != nullptr);
ink_assert(t_state.request_sent_time > 0);
ink_assert(t_state.response_received_time > 0);
store_info->request_sent_time_set(t_state.request_sent_time);
store_info->response_received_time_set(t_state.response_received_time);
c_sm->cache_write_vc->set_http_info(store_info);
store_info->clear();
tunnel.add_consumer(c_sm->cache_write_vc, source_vc, &HttpSM::tunnel_handler_cache_write, HT_CACHE_WRITE, name, skip_bytes);
c_sm->cache_write_vc = nullptr;
}
// in HttpSM::perform_cache_write_action
case HttpTransact::CACHE_DO_WRITE:
case HttpTransact::CACHE_DO_REPLACE:
// Fix need to set up delete for after cache write has
// completed
if (transform_info.entry == nullptr || t_state.api_info.cache_untransformed == true) {
cache_sm.close_read();
t_state.cache_info.write_status = HttpTransact::CACHE_WRITE_IN_PROGRESS;
setup_cache_write_transfer(&cache_sm, server_entry->vc, &t_state.cache_info.object_store, client_response_hdr_bytes,
"cache write");
} else {
// We are not caching the untransformed. We might want to
// use the cache writevc to cache the transformed copy
ink_assert(transform_cache_sm.cache_write_vc == nullptr);
transform_cache_sm.cache_write_vc = cache_sm.cache_write_vc;
cache_sm.cache_write_vc = nullptr;
}
break;
Attaching the gdb output of the HttpSM
history variable:
We're seeing a crash related to an interaction between cache and handling redirects:
Some interesting variables @JosiahWI asked me to look at:
This is likely related to: https://github.com/apache/trafficserver/pull/11542