apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0
1.82k stars 805 forks source link

ATS 10: uncacheable redirect is cached when number_of_redirections > 0 (most of the time) #11808

Open vuori opened 1 month ago

vuori commented 1 month ago

On ATS 10.0.0 and 10.0.1-rc0, it seems that when a remap rule has number_of_redirections set to a non-zero value, the redirect itself is sometimes cached even though it is not cacheable. The "sometimes" seems to be deterministic for a given set of configuration files, but after changing something unrelated (including comments in config files), the server may flip to a state where things work as expected (redirect is not cached).

redirect_use_orig_cache_key is unset and traffic_ctl shows that it is the default (0).

Steps to reproduce:

  1. Add a remap rule such as map http://localhost/ http://localhost/ @plugin=conf_remap.so @pparam=proxy.config.http.number_of_redirections=2
  2. Have a webserver generate a redirect to somewhere, for example with Apache: Redirect /aaa1 http://example.com/whatever
  3. Request the URL: curl -x http://localhost:8000 http://localhost/aaa1, note in HTTP server log that ATS fetches the redirect.
  4. Repeat the request, note that ATS (probably) doesn't fetch the redirect again. Though sometimes it will work correctly.
  5. (Bonus step: Try to fetch the redirect target manually through the cache, note that it gets a MISS.)

Complete debug log is included from the non-working state, I was not able to capture one from a working state. Some interesting parts:

Initial request (step 3) with the redirect fetched from origin:

[Oct  7 16:58:35.939] [ET_NET 0] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [0] State Transition: SM_ACTION_API_
OS_DNS -> SM_ACTION_CACHE_ISSUE_WRITE
[Oct  7 16:58:35.939] [ET_NET 0] DIAG: <HttpSM.cc:5050 (do_cache_prepare_action)> (http_cache_write) [0] writing to cache with URL htt
p://localhost/aaa1
[Oct  7 16:58:35.939] [ET_NET 0] DIAG: <HttpCacheSM.cc:188 (state_cache_open_write)> (http_cache) [0] [&HttpCacheSM::state_cache_open_
write, CACHE_EVENT_OPEN_WRITE/TS_EVENT_CACHE_OPEN_WRITE]
[Oct  7 16:58:35.939] [ET_NET 0] DIAG: <HttpSM.cc:2608 (main_handler)> (http) [0] CACHE_EVENT_OPEN_WRITE/TS_EVENT_CACHE_OPEN_WRITE, 11
08
[Oct  7 16:58:35.939] [ET_NET 0] DIAG: <HttpSM.cc:2415 (state_cache_open_write)> (http) [0] [&HttpSM : state_cache_open_write, CACHE_E
VENT_OPEN_WRITE/TS_EVENT_CACHE_OPEN_WRITE]
[Oct  7 16:58:35.940] [ET_NET 0] DIAG: <HttpTransact.cc:3238 (handle_cache_write_lock)> (http_trans) Next action next; nullptr
[Oct  7 16:58:35.940] [ET_NET 0] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [0] State Transition: SM_ACTION_CACH
E_ISSUE_WRITE -> SM_ACTION_ORIGIN_SERVER_OPEN
[Oct  7 16:58:35.940] [ET_NET 0] DIAG: <HttpTransact.h:969 (set_connect_fail)> (http) Setting upstream connection failure 0 to 5
[Oct  7 16:58:35.940] [ET_NET 0] DIAG: <HttpSM.cc:5266 (do_http_server_open)> (http_track) [0] entered inside do_http_server_open ][ip
v4]
[Oct  7 16:58:35.940] [ET_NET 0] DIAG: <HttpSM.cc:5293 (do_http_server_open)> (http) [0] open connection to localhost: 127.0.0.1:80
[Oct  7 16:58:35.940] [ET_NET 0] DIAG: <HttpSM.cc:5296 (do_http_server_open)> (http_seq) [0] Sending request to server
…
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:4006 (handle_forward_server_connection_open)> (http_trans) [0] [hfsco] cache a
ction: CACHE_DO_WRITE
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:4135 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] (hc
oofsr)
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:4136 (handle_cache_operation_on_forward_server_response)> (http_seq) [0] Enter
ing handle_cache_operation_on_forward_server_response
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:6216 (is_response_cacheable)> (http_trans) [0] client permits storing
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:6291 (is_response_cacheable)> (http_trans) [0] cache-control or expires header is required for 302
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:4145 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] [hcoofsr] response is not cacheable
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpTransact.cc:4296 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] [hcoofsr] response code: 302

Redirect is followed and the target is written:

Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [0] State Transition: SM_ACTION_ORIG
IN_SERVER_OPEN -> SM_ACTION_SERVER_READ
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpSM.cc:8557 (is_redirect_required)> (http_redirect) [0] redirect_required: 1
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpSM.cc:8209 (do_redirect)> (http_redirect) [0] enable_redirection 1
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpSM.cc:8557 (is_redirect_required)> (http_redirect) [0] redirect_required: 1
[Oct  7 16:58:35.942] [ET_NET 0] DIAG: <HttpSM.cc:8275 (redirect_request)> (http_redirect) [0] redirect url: http://XXX.s3.amazonaws.c
om/aaa-assets/71321fb9c8ddca4edd3f3a7a3ac55c08eac6d0a201efae369d1eb3fa0fe96fe7/assets/XXX_assets/customer_benefits_page.html
…
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [0] State Transition: SM_ACTION_API_
OS_DNS -> SM_ACTION_CACHE_ISSUE_WRITE
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpSM.cc:5050 (do_cache_prepare_action)> (http_cache_write) [0] writing to cache with URL htt
p://XXX.s3.amazonaws.com/aaa-assets/71321fb9c8ddca4edd3f3a7a3ac55c08eac6d0a201efae369d1eb3fa0fe96fe7/assets/XXX_assets/customer_benefi
ts_page.html
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpSM.cc:2608 (main_handler)> (http) [0] CACHE_EVENT_OPEN_WRITE_FAILED/TS_EVENT_CACHE_OPEN_WR
ITE_FAILED, 1109
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpSM.cc:2415 (state_cache_open_write)> (http) [0] [&HttpSM : state_cache_open_write, CACHE_E
VENT_OPEN_WRITE_FAILED/TS_EVENT_CACHE_OPEN_WRITE_FAILED]
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpSM.cc:2445 (state_cache_open_write)> (http_redirect) [0] CACHE_EVENT_OPEN_WRITE_FAILED dur
ing redirect follow
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpTransact.cc:3238 (handle_cache_write_lock)> (http_trans) Next action next; nullptr
[Oct  7 16:58:35.974] [ET_NET 0] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [0] State Transition: SM_ACTION_CACH
E_ISSUE_WRITE -> SM_ACTION_ORIGIN_SERVER_OPEN
…
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpSM.cc:4567 (do_hostdb_update_if_necessary)> (http) [0] server info = 52.95.170.56:80
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:3991 (handle_forward_server_connection_open)> (http_trans) [0] [hfsco] redirect in progress, non-3xx response, setting cache_do_write
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:4006 (handle_forward_server_connection_open)> (http_trans) [0] [hfsco] cache action: CACHE_DO_WRITE
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:4135 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] (hcoofsr)
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:4136 (handle_cache_operation_on_forward_server_response)> (http_seq) [0] Entering handle_cache_operation_on_forward_server_response
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:6216 (is_response_cacheable)> (http_trans) [0] client permits storing
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:6267 (is_response_cacheable)> (http_trans) [0] YES by response cache control
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:4145 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] [hcoofsr] response is cacheable
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:4296 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] [hcoofsr] response code: 200
[Oct  7 16:58:36.013] [ET_NET 0] DIAG: <HttpTransact.cc:4528 (handle_cache_operation_on_forward_server_response)> (http_trans) [0] [hcoofsr] cache write
+++++++++ Cached Request Hdr +++++++++
-- State Machine Id: 0
GET http://XXX.s3.amazonaws.com/aaa-assets/71321fb9c8ddca4edd3f3a7a3ac55c08eac6d0a201efae369d1eb3fa0fe96fe7/assets/XXX_assets/customer_benefits_page.html HTTP/1.1

Second request (step 4):

[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [1] State Transition: SM_ACTION_API_POST_REMAP -> SM_ACTION_CACHE_LOOKUP
[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpSM.cc:4946 (do_cache_lookup_and_read)> (http_seq) [1] Issuing cache lookup for URL http://localhost/aaa1
[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpTransactCache.cc:195 (SelectFromAlternates)> (http_match) [SelectFromAlternates] # alternates = 1
[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpTransactCache.cc:196 (SelectFromAlternates)> (http_seq) [SelectFromAlternates] 1 alternates for this cached doc
[alts] There are 1 alternates for this request header.
…
[alts] ---- alternate #1 (Q = 1) has these request/response hdrs:
GET http://XXX.s3.amazonaws.com/aaa-assets/71321fb9c8ddca4edd3f3a7a3ac55c08eac6d0a201efae369d1eb3fa0fe96fe7/assets/XXX_assets/customer_benefits_page.html HTTP/1.1
…
[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpTransact.cc:2412 (HandleCacheOpenRead)> (http_trans) [1] CacheOpenRead -- hit
[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpTransact.cc:2413 (HandleCacheOpenRead)> (http_trans) Next action SM_ACTION_API_READ_CACHE_HDR; HandleCacheOpenReadHitFreshness
[Oct  7 16:58:38.364] [ET_NET 1] DIAG: <HttpSM.cc:7789 (call_transact_and_set_next_state)> (http) [1] State Transition: SM_ACTION_CACHE_LOOKUP -> SM_ACTION_API_READ_CACHE_HDR

So somehow the redirect target got cached at the location of the redirect. The target has a Cache-Control that allows caching, nothing relevant to this target is set in cache.config. The same thing happens regardless of whether any global plugins are enabled. This 10.0.0 was built with cmake defaults on Ubuntu 22.04, though the same thing happens when 10.0.1-rc0 is built on Debian Bookworm.

Any suggestions? I looked at the relevant code, but I'm not familiar with the internals of the HTTP state machine and couldn't see anything obviously wrong.

log.txt

bryancall commented 1 month ago

We might have a fix for this issue on a PR that hasn't been merged yet. @elsloo is going to take a look at it.

bryancall commented 4 weeks ago

@cmcfarlen Was able to reproduce the issue on ATS 10.0.0. @JosiahWI thinks it might be related to PR #11542. @cmcfarlen is going to test 9.2.x and see if he can reproduce the problem on that branch.

cmcfarlen commented 4 weeks ago

PR #11706 reverts #11542 for the 10.0.x branch, so this PR is likely not the issue