crowell / modpagespeed_tmp

Automatically exported from code.google.com/p/modpagespeed
Apache License 2.0
0 stars 0 forks source link

mod_pagespeed strips custom HTML headers #367

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
A user has this in his .conf file:
   Header set ServerID  "XXXX"
he reports that the ServerID is stripped by mod_pagespeed on serving.  Looking 
at the code I think this is correct.

in apache/mod_instaweb.cc, function rewrite_html, we have this block of code:

  if (!context->sent_headers()) {
    ResponseHeaders* headers = context->response_headers();
    apr_table_clear(request->headers_out);
    AddResponseHeadersToRequest(*headers, request);
    headers->Clear();
    context->set_sent_headers(true);
  }

That call to apr_table_clear is the culprit.  I think instead we need to 
rewrite the existing headers, replacing only the ones that we need to override.

Original issue reported on code.google.com by jmara...@google.com on 4 Jan 2012 at 3:57

GoogleCodeExporter commented 9 years ago
I have never used Tomcat.  Could you possibly post the Apache config needed to 
connect to Tomcat and whatever files you need to repro the problem in Tomcat.  
Finally (seems like a long list I know...) some kind of 'wget' or 'curl' based 
query to the server showing the response-headers you are complaining about?

Naturally we do have tests that make us think 'ModifyCachingHeaders off' works, 
but it evidently doesn't prove that it works for *your* configuration.

Thanks!

Original comment by jmara...@google.com on 16 Nov 2012 at 1:23

GoogleCodeExporter commented 9 years ago
I gave 1.1.23.1 a try and MPS seems to preserve all headers!

MPS on and ModifyCachingHeaders on: curl shows that filters work on html and 
headers remain intact
MPS on and ModifyCachingHeaders off: curl shows that filters do NOT work on 
html and 
 headers remain intact
MPS off and ModifyCachingHeaders on: curl shows filters do not work and headers 
remain intact
MPS off and ModifyCachingHeaders on: curl shows filters do not work and headers 
remain intact

Tomorrow I will do some further testing but so far, it seems that the headers 
always remain intact.

Original comment by alexel...@gmail.com on 16 Nov 2012 at 3:51

GoogleCodeExporter commented 9 years ago
matt@favoritemo.com it would be great if you could try 1.1.23.2 a try (a minor 
patch (.2) was just released today, mainly to quiet down some noisy logging).

matterbury@ made a specific change made since the previous functional release 
(http://code.google.com/p/modpagespeed/source/detail?r=1816), that addressed a 
related issue (Issue 385).  1.1.23.1 was the first release incorporating that 
fix.

We were attempting to get to the bottom of this bug before 1.1.23.1 was frozen, 
but could not repro so we're still looking for more detail.  If you can repro 
on that release we'd love to hear more detail about the setup (conf files, 
other modules, etc).

Original comment by jmara...@google.com on 16 Nov 2012 at 4:14

GoogleCodeExporter commented 9 years ago
First some background on our setup: we have apache sitting in front of 
JBoss/Tomcat servers, proxying requests to them. Our intention is to have 
apache both do MPS magic as well as set content security policy (CSP) headers 
on responses. The headers in question I've been losing are the CSP ones, 
defined very simply and statically in the apache configuration to be set 
always...except if MPS gets a hold of a proxied GET request, they didn't appear 
there.

Anyway, I think I've found something interesting. I tried building from source 
and rigging up a test case whereby apache was MPS-processing requests to itself 
(using either ProxyPass or rewrite rules) and having no luck reproducing the 
problem -- all of my post-MPS response headers were coming back intact. I then 
took that built module and installed it on one of our test servers; no joy, 
same behavior wherein apache refuses to set response headers after MPS is 
triggered. There was only one remaining difference: instead of proxying over 
HTTP, we are configured to proxy over AJP.

Once I configured the test server to proxy to Tomcat via HTTP instead of AJP, 
the apache-set response headers started to appear.

I don't know enough about the apache proxying code, the behavior of AJP, or 
whatever other subtle differences exist between AJP responses and HTTP 
responses to know what to do next, exactly, but I did at least figure out the 
cause of what I've been seeing. If anyone has some pointers in the right 
direction, I'd love to hear them.

Original comment by m...@favoritemo.com on 20 Nov 2012 at 11:47

GoogleCodeExporter commented 9 years ago
I also just now tried commenting out the call to 
DisableDownstreamHeaderFilters, just to see if it made a difference -- it did 
not. Whatever's stopping those headers from being set is not there.

Original comment by m...@favoritemo.com on 21 Nov 2012 at 12:20

GoogleCodeExporter commented 9 years ago
Thanks for sticking with this Matt.  Once we can repro we can easily fix.  
There are a few different theories on what might be broken, one of which you 
have just eliminated with your experiment; I even have some code to avoid 
calling DisableDownstreamHeaderFilters in the HTML flow, but it does not break 
or fix any testcase I have so I haven't proposed submitting that.  Commenting 
out DisableDownstreamHeaderFilters of course definitely breaks some of our 
tests that make sure we cache .pagespeed. resources for a year.

Another theory is that the timing of the arrival of the header is causing us 
trouble.  That was the problem with Issue 385 -- we had captured the original 
headers in our own data structure net_instaweb::ResponseHeaders before they 
were set by PHP, so when we wrote the final headers back with our caching 
updates we lost the ones PHP had tried to set.  This was fixed by delaying the 
strobe for the upstream headers.  An alternative fix might be a little more 
robust, which is to avoid copying over the headers, and instead merge them back.

Yet another theory is that some of the upstream headers might be arriving in 
request->headers_out, while others are arriving in request->err_headers_out.  
Our system might be interfering with the Apache logic.

Do you know if the JbBoss/Tomcat interface is open-source?  If so we can look 
at the source & try to figure out how it's transmitting headers, especially via 
AJP (those initials mean nothing to me by the way...)

Original comment by jmara...@google.com on 21 Nov 2012 at 2:58

GoogleCodeExporter commented 9 years ago
The AJP interface is definitely open source. Some lazy googling yields

https://bitbucket.org/mirror/apache-http-server/src/eb7db2fa8607/modules/proxy/m
od_proxy_ajp.c?at=trunk

Which has a possibly interesting comment about the protocol's lack of a 
specific flush message requires some shenanigans with polling. I had a theory 
that the issue might have something to do with the buckets coming back from the 
AJP proxy in a way that was violating some assumption that MPS was making, but 
haven't investigated that myself.

This is close to an academic problem for us, as we can probably just switch to 
proxying via HTTP instead (among many other ways to get a header added to the 
response post-MPS), but I'm willing to help you guys chase it down.

Original comment by m...@favoritemo.com on 21 Nov 2012 at 7:44

GoogleCodeExporter commented 9 years ago
Now, after having looked at it for a while longer, I think I might have not 
been correct about DisableDownstreamHeaderFilters not fixing anything. I've 
added a bunch of logging to try to figure out what's going on. While proxying 
via HTTP and via AJP, I see that DDHF is being called due to Cache-Control 
being set by the application server, but via HTTP the filters it ends up 
iterating through look like this

looking at filter mod_pagespeed_output_filter
looking at filter deflate
looking at filter content_length
looking at filter http_outerror
looking at filter chunk
looking at filter log_input_output
looking at filter core

While using AJP it looks like this:

looking at filter mod_pagespeed_output_filter
looking at filter fixup_headers_out
looking at filter deflate
looking at filter deflate
looking at filter mod_pagespeed_fix_headers_filter
looking at filter byterange
looking at filter content_length
looking at filter http_header
looking at filter http_outerror
looking at filter log_input_output
looking at filter core

Strangely, of course, fixup_headers_out appears to have already happened (?!) 
in the HTTP proxying setup (which is why the custom headers end up appearing in 
those responses) but not yet in the AJP proxying setup (and since DDHF is 
happening they don't appear). Bizarre, not to mention I can't explain why 
"deflate" is in there twice and several others only appear in the AJP proxying 
setup. The only difference between the two is that in HTTP mode I have the line 
"ProxyPass /app-path http://localhost:8080/app-path" and in AJP mode "ProxyPass 
/app-path ajp://localhost:8009/app-path".

Really confusing.

Original comment by m...@favoritemo.com on 23 Nov 2012 at 11:20

GoogleCodeExporter commented 9 years ago

Original comment by jkar...@google.com on 26 Nov 2012 at 2:22

GoogleCodeExporter commented 9 years ago
Bouncing back to Matt

Original comment by jmara...@google.com on 28 Nov 2012 at 7:13

GoogleCodeExporter commented 9 years ago
AJP vs. HTTP weirdness aside, I still stand by my original assertion: if I want 
to explicitly set headers downstream from MPS, I should be allowed to do so. 
Even if they happen to screw up headers that MPS is itself trying to set, but 
especially if they don't. DisableDownstreamHeaderFilters, as currently 
implemented and invoked, is a poor solution to a "doctor, it hurts when I do 
this" problem.

Original comment by m...@favoritemo.com on 29 Nov 2012 at 7:29

GoogleCodeExporter commented 9 years ago
We agree with you 100% with your desire to set your own headers, and we have 
unit tests that -- at least for our setup -- prove that you can.

DisableDownstreamHeaderFilters is probably applied too aggressively, and in 
fact I have a change-list that tries to narrow its application to the case we 
need, which is to avoid having the origin Expires/Cache-Control headers applied 
to resources that we are cache-extending.

However I have been unable to come up with a testcase that this change fixes.  
If you are able to build from source I could attach a patch-file for you to try 
to let me know if it fixes your problem.  Would that be feasible?

Original comment by jmara...@google.com on 29 Nov 2012 at 8:34

GoogleCodeExporter commented 9 years ago
I'm not sure how to help you construct a test case that doesn't also depend on 
the...unusual ordering of the filters as described in comment 58. The key to 
triggering the DisableDownstreamHeaderFilters behavior is having whatever's 
upstream of MPS set Cache-Control to something. It appears that anything doing 
HTTP proxying won't reproduce it because fixup_headers_out is happening before 
MPS, but if the test harness has another way to produce a resource for MPS to 
process that doesn't also lead to fixup_headers_out happening before MPS gets 
its chance then that would be a promising angle.

I can definitely build from source and would love a patch. In fact I have 
patched out the DDHF behavior myself, but I don't want to maintain a fork...

Original comment by m...@favoritemo.com on 29 Nov 2012 at 11:58

GoogleCodeExporter commented 9 years ago
I was going to send you one today but then... 
http://bostinno.com/2012/11/29/cambridge-power-outage-mbta-signal-problems/#ss__
265464_1_0__ss

Just to confirm, you said commenting out DDHF did not help in comment 55 but 
then you found in comment 58 it did.  So if I prevent that function from being 
called in the HTML path (where I think we don't need it) that should solve the 
problem for you while still allowing us to extend cache-lifetime of resources.

Also to confirm you are talking purely about HTML and do not have a problem 
with our handling of resources.  Right?

I'll try to get a patch to you as soon as practical (e.g. somebody starts 
injecting enough voltage into my disk drive to make it spin).

Original comment by jmara...@google.com on 30 Nov 2012 at 12:12

GoogleCodeExporter commented 9 years ago
OK here's a candidate patch, taken from trunk @ r2230

It seems to pass basic unit & system tests.  Full suite isn't done yet. 

Matt please let me know if this resolves the issue.

Original comment by jmara...@google.com on 30 Nov 2012 at 4:05

Attachments:

GoogleCodeExporter commented 9 years ago
We are about to freeze a new release so if you can confirm this fixes the 
problem, I'll put it in.

Original comment by jmara...@google.com on 30 Nov 2012 at 2:52

GoogleCodeExporter commented 9 years ago
I haven't had a chance to build this and run through my tests today, but
looking at the diff reveals promise. It's definitely in the right area of
the code and looks like it has the right spirit. I didn't do any particular
analysis of not-HTML payloads (which is what I assume you are referring to
as "resources" here). I might have time to kick the tires on this tomorrow.

-m

Original comment by m...@favoritemo.com on 1 Dec 2012 at 12:32

GoogleCodeExporter commented 9 years ago
That patch does the trick, assuming ModPagespeedModifyCachingHeaders is set to 
"off". Without it, my AJP-proxied GETs have no mod_headers-set output headers; 
with it, DDHF is not called so the headers are set correctly.

Original comment by m...@favoritemo.com on 3 Dec 2012 at 6:00

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r2254.

Original comment by jmara...@google.com on 4 Dec 2012 at 4:09