apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0

cache write lock not released when following redirect and no parent #9275

Open bdgranger opened 1 year ago

bdgranger commented 1 year ago

Configuration

The basic gist of what is happening is:

We tried setting proxy.config.http.redirect_use_orig_cache_key to 1, but it appears to make no difference.

We saw this when a customer upgraded from a system that had previously been using ATC 3.x, which never generated empty parent.config files. If we restore the following default line to parent.config (it contains no actual parents), the issue appears to go away:

dest_domain=. parent="" round_robin=consistent_hash go_direct=false qstring=ignore

The difference appears to be that, with an empty parent.config or with proxy routing enable set to 0, the 307 is handled via HandleCacheMiss and a new ISSUE_WRITE is performed, which opens a new CacheVC on the default volume. When the default line is present in parent.config, this code is executed instead:

else if (s->dns_info.lookup_name[0] <= '9' && s->dns_info.lookup_name[0] >= '0' && s->parent_params->parent_table->hostMatch &&
         !s->http_config_param->no_dns_forward_to_parent) {
  // note, broken logic: ACC fudges the OR stmt to always be true,
  // 'AuthHttpAdapter' should do the rev-dns if needed, not here .
  TRANSACT_RETURN(SM_ACTION_DNS_REVERSE_LOOKUP, HttpTransact::StartAccessControl);
}

In this case decideCacheLookup logs "will NOT do lookup", and ATS reuses the CacheVC that was opened for the original lookup. It then writes to the ramdisk, so future requests for the object get an immediate cache hit from the ramdisk, as expected. However, this appears to work only because the redirect was to an IP address rather than to another FQDN.
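To illustrate why the redirect target matters here: the condition quoted above only inspects the first character of lookup_name. The sketch below reproduces that test in isolation and compares it with a stricter IPv4 check. Both helper names are hypothetical and are not ATS functions; the stricter check uses POSIX inet_pton and is shown only for comparison, not as a proposed fix.

```cpp
#include <arpa/inet.h> // inet_pton (POSIX)
#include <string>

// Hypothetical helper, not part of ATS: mirrors the first-character test
// from the quoted branch (lookup_name[0] between '0' and '9').
bool starts_with_digit(const std::string &lookup_name)
{
  return !lookup_name.empty() && lookup_name[0] >= '0' && lookup_name[0] <= '9';
}

// Hypothetical helper for comparison: true only when the whole string
// parses as a dotted-quad IPv4 literal.
bool is_ipv4_literal(const std::string &name)
{
  in_addr addr;
  return inet_pton(AF_INET, name.c_str(), &addr) == 1;
}
```

With these helpers, "192.0.2.10" passes both checks and "cdn.example.com" passes neither, while a hostname such as "3com.example" passes the first-character test without being an IP address. Whether that last case matters for this issue has not been tested.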

It seems either something isn't quite right with the redirect following or something that is supposed to free the first CacheVC got missed, leaving the write lock in place.

ywkaras commented 1 year ago

@bdgranger does the same problem exist in 9.1?

bdgranger commented 1 year ago

> @bdgranger does the same problem exist in 9.1?

@ywkaras we will have to test this and will get back to you. It looks like the OSDNSLookup() method has changed substantially since the 8.1.x branch.

ywkaras commented 1 year ago

@bdgranger any updates?

bdgranger commented 1 year ago

@ywkaras

Sorry it took me so long to notice this question. I have been tied up with other issues and have not been able to test with 9.x. For now we have worked around the issue by making sure that parent.config always contains a default "dest_domain=. ..." line, and with that in place the problem does not occur.
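A minimal sketch of how the workaround could be verified before deploying a generated parent.config. The helper name is hypothetical, and the check assumes the default rule starts with "dest_domain=." as in the line quoted earlier in this issue; a rule that puts other fields first would not be detected.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of ATS: returns true if the given
// parent.config text contains an uncommented rule that begins with
// "dest_domain=." followed by whitespace or end of line.
bool has_default_parent_rule(const std::string &parent_config_text)
{
  const std::string key = "dest_domain=.";
  std::istringstream in(parent_config_text);
  std::string line;
  while (std::getline(in, line)) {
    const auto first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line[first] == '#') {
      continue; // skip blank lines and comments
    }
    if (line.compare(first, key.size(), key) == 0) {
      const auto next = first + key.size();
      // Require a separator so "dest_domain=.example.com" does not match.
      if (next == line.size() || line[next] == ' ' || line[next] == '\t' || line[next] == '\r') {
        return true;
      }
    }
  }
  return false;
}
```

An empty file or one holding only comments returns false, which is the situation that triggered the stuck write lock for us.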

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.