apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0

body_factory performance, configuration and errors #8304

Open c-taylor opened 3 years ago

c-taylor commented 3 years ago

(Tests performed on 9.0.x)

body_factory has several performance and correctness issues when compared to serving similar objects from cache. For example, serving a 301 with a body is 2.6x - 5.5x faster from cache than from body_factory.

This is the inverse of what I expected: I would expect small, fabricated responses to be at least as fast as responses served from cache.

Config example

map https://example.com \
    https://www.example.com \
    @plugin=regex_remap.so @pparam=redirect.config          

redirect.config:
(.*) https://www.example.com$0 @status=301

Observed Issues

I completed several load tests and traces and observed the following:

https://github.com/apache/trafficserver/blob/6d4f919b733d47d4e0afc5243e6d863853c38bb6/proxy/http/HttpBodyFactory.cc#L86

// The body factory can be reconfigured dynamically by a manager    //
// callback, so locking is required.  The callback takes a lock,    //
// and the user entry points take a lock.  These locks may limit    //
// the speed of error page generation.  

When performance testing body_factory in default mode, the first thing you notice in a perf trace is the explicit lock/mutex: threads contend on this lock rather than completing useful work.
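To illustrate the kind of contention described here, the following is a minimal sketch and not ATS code. Every name in it (`TemplateTable`, `LockedBodyTable`, `SharedBodyTable`, `example_template`) is hypothetical. The first class mirrors the pattern the quoted comment describes, where both the reconfiguration callback and every lookup take the same exclusive lock. The second swaps in a reader/writer lock (`std::shared_mutex`, C++17) purely as a point of comparison; it is an assumption on my part, not something measured or proposed in this issue.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a table of response body templates.
using TemplateTable = std::unordered_map<std::string, std::string>;

// Pattern from the quoted comment: one exclusive lock shared by the
// reconfiguration callback and every request, so lookups serialize.
class LockedBodyTable {
public:
  void reconfigure(TemplateTable next) {
    std::lock_guard<std::mutex> guard(mutex_);
    table_ = std::move(next);
  }

  std::string lookup(const std::string &name) const {
    std::lock_guard<std::mutex> guard(mutex_); // every request waits here
    auto it = table_.find(name);
    return it == table_.end() ? std::string() : it->second;
  }

private:
  mutable std::mutex mutex_;
  TemplateTable table_;
};

// Comparison only (assumption): lookups take a shared lock and do not
// exclude each other; only reconfiguration takes the exclusive lock.
class SharedBodyTable {
public:
  void reconfigure(TemplateTable next) {
    std::unique_lock<std::shared_mutex> guard(mutex_);
    table_ = std::move(next);
  }

  std::string lookup(const std::string &name) const {
    std::shared_lock<std::shared_mutex> guard(mutex_);
    auto it = table_.find(name);
    return it == table_.end() ? std::string() : it->second;
  }

private:
  mutable std::shared_mutex mutex_;
  TemplateTable table_;
};
```

Both classes return the same results for the same input; they differ only in whether concurrent lookups block one another. Whether a change like this would help in ATS itself would need to be confirmed with the same perf traces.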

After 'suppressing' responses, the lock contention does disappear, but you then run into the second point above: body_factory appears to call ats_malloc on a per-request basis, so overall performance is still significantly lower than for a cached object. In my case it is even lower than without suppression! This latter rps drop might also be related to client behaviour when receiving an empty response from the server rather than the expected (in my case) 301. Notably, the 301 still appears as expected in the log, but the client never sees it on the wire. See the curl output below.

Default:

> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/7.70.0
> Accept: */*
>
< HTTP/1.1 301 Redirect
< Date: Thu, 02 Sep 2021 10:07:30 GMT
< Connection: keep-alive
< Via: http/1.1 server.example.com (ApacheTrafficServer/9.0.3)
< Server: ATS/9.0.3
< Cache-Control: no-store
< Location: https://www.example.com/
< Content-Type: text/html
< Content-Language: en
< Content-Length: 304
<
<HTML>
<HEAD>
<TITLE>Document Has Moved</TITLE>
</HEAD>

<BODY BGCOLOR="white" FGCOLOR="black">
<H1>Document Has Moved</H1>
<HR>

<FONT FACE="Helvetica,Arial"><B>
Description: The document you requested has moved to a new location.  The new location is "https://www.example.com/".
</B></FONT>
<HR>
</BODY>
* Connection #0 to host example.com left intact

Suppressed: proxy.config.body_factory.response_suppression_mode INT 1

> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/7.70.0
> Accept: */*
>
* TLSv1.3 (IN), TLS alert, close notify (256):
* Empty reply from server

Test details

Cache: 195,000 rps
body_factory: 74,000 rps
body_factory (suppression): 35,000 rps <<-- empty responses
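Assuming these three figures are the ones behind the 2.6x - 5.5x range quoted at the top of the issue, the ratios work out roughly as follows:

```latex
\frac{195{,}000}{74{,}000} \approx 2.64,
\qquad
\frac{195{,}000}{35{,}000} \approx 5.57,
\qquad
\frac{74{,}000}{35{,}000} \approx 2.11
```

The last ratio is the drop from default body_factory to suppression mode, which is the case where the client receives an empty reply.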

Wants

I want:

SolidWallOfCode commented 3 years ago

@zwoop has been complaining about this for years. The internals are terrible, even for ATS. I've done some work on this but hit a blocker I haven't had time to get through. I think I have a plan now, but not sure when I can find the time to do the implementation.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.