Open GoogleCodeExporter opened 9 years ago
mod_rewrite tends to screw up mod_pagespeed unless you're very careful (and
often they just can't work together even if you are careful :-).
That said, could you please check your Apache logs for any mod_pagespeed
messages about the images that start with /prefix?
Original comment by matterb...@google.com
on 11 Nov 2013 at 1:15
hm. i need apache to proxy requests to the backend server, so i better go with
mod_proxy?
however, i compiled from source so i have the latest(?) version now, same issues
for whatever reason mod_pagespeed thinks this stuff is not cacheable.
i played around by setting lastmodified, expired and cache-control headers but
no luck. is there any way to find out why mod_pagespeed thinks this stuff is
not cacheable?
i set DebugLevel to info, this is what shows up in the apache error.log:
[Sun Nov 10 21:12:42 2013] [info] [mod_pagespeed 1.7.0.0-3616 @22674] HTTPCache key=http://musicasacra.lemon42.com/DE/repos/evoscripts/musica_sacra/returnBinaryImage/2/konzert/Bild_teaser.jpg: remembering not-cacheable status for 298 seconds.
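For anyone combing through a long error log for these messages, here is a minimal sketch that extracts the URL, the remembered status, and the duration from a "remembering ..." line. The pattern is inferred only from the log lines quoted in this thread and assumes each message sits on a single line, as it does in the real error log; the names `REMEMBER_RE` and `parse_remember_line` are made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this thread; other
# mod_pagespeed versions may word these messages differently.
REMEMBER_RE = re.compile(
    r"HTTPCache key=(?P<url>\S+): "
    r"remembering (?P<status>[\w-]+) status for (?P<seconds>\d+) seconds"
)


def parse_remember_line(line):
    """Return (url, status, seconds) for a 'remembering ...' line, else None."""
    match = REMEMBER_RE.search(line)
    if match is None:
        return None
    return match["url"], match["status"], int(match["seconds"])
```

Running it over every line of the log and grouping by status should show quickly which URLs are being remembered as not-cacheable and which as not-found.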
Thanks!
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 4:36
ahh.. that's weird. now i'm using mod_proxy instead of mod_rewrite -
and pagespeed says "remembering not-found status" for an image that's actually
there - I can see it :)
[Mon Nov 11 17:10:41 2013] [info] [mod_pagespeed 1.7.0.0-3616 @23956] HTTPCache key=http://musicasacra.lemon42.com/cms/web/DE/mode/work/repos/evoscripts/musica_sacra/returnBinaryImage/31/kuenstler/Foto: remembering not-found status for 259 seconds.
[Mon Nov 11 17:10:59 2013] [info] [mod_pagespeed 1.7.0.0-3616 @23958] HTTPCache key=http://musicasacra.lemon42.com/cms/web/DE/mode/work/repos/evoscripts/musica_sacra/returnBinaryImage/31/kuenstler/Foto: remembering not-found status for 241 seconds.
[Mon Nov 11 17:11:05 2013] [info] [mod_pagespeed 1.7.0.0-3616 @23979] HTTPCache key=http://musicasacra.lemon42.com/cms/web/DE/mode/work/repos/evoscripts/musica_sacra/returnBinaryImage/31/kuenstler/Foto: remembering not-found status for 235 seconds.
but then again it has an expired cache entry?
[Mon Nov 11 17:15:12 2013] [info] [mod_pagespeed 1.7.0.0-3616 @23970] Cache entry is expired: http://musicasacra.lemon42.com/cms/web/DE/mode/work/repos/evoscripts/musica_sacra/returnBinaryImage/31/kuenstler/Foto
The image comes with these headers:
Cache-Control:max-age=600
Connection:close
Content-Length:10280
Content-Type:image/jpeg
Date:Mon, 11 Nov 2013 17:17:46 GMT
Last-Modified:Sun, 10 Nov 2013 17:17:46 GMT
Server:EvoWebBase/2.0
X-Extra-Header:1
I'm confused...
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 5:18
The issue is (probably) that mod_pagespeed isn't using the right URL because it
has been rewritten. The interaction between mod_pagespeed and mod_rewrite is
explained here:
https://code.google.com/p/modpagespeed/issues/detail?id=676
Were there no other messages in the log about mod_pagespeed not being able to
fetch the original resource?
Original comment by matterb...@google.com
on 11 Nov 2013 at 5:52
i don't think so but i can check again - what should i look out for?
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 6:04
hm and btw why is my comment regarding mod_proxy deleted?
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 6:05
Re messages: anything that mentioned mod_pagespeed and the URL (such as
Bild_teaser.jpg).
Re deleted comment, isn't it #3?
We almost never delete comments and there's no evidence of that having been
done here.
Original comment by matterb...@google.com
on 11 Nov 2013 at 6:54
hm strange. i see it as deleted, maybe i somehow managed to delete it myself :)
I've attached a log file i got from using:
more /var/log/httpd/error_log | grep Bild_teaser.jpg > teaser.log
Do you think it's better to use mod_proxy? or are there issues too?
I read the interaction you posted on
https://code.google.com/p/modpagespeed/issues/detail?id=676 but I think I didn't
really get the gist here.
If I use
> more /etc/httpd/conf.d/pagespeed.conf | grep Location
i get the output below. should I include the <IfModule mod_rewrite.c>
RewriteEngine Off </IfModule> in these locations?
<Location /mod_pagespeed_statistics>
</Location>
<Location /pagespeed_console>
</Location>
<Location /mod_pagespeed_message>
</Location>
ModPagespeedDownstreamCachePurgeLocationPrefix "http://localhost:8020"
<Location /mod_pagespeed_log_request_headers.js>
</Location>
<Location ~ "/mod_pagespeed_test/response_headers.html*">
</Location>
<Location /mod_pagespeed_global_statistics>
</Location>
<Location /mod_pagespeed_beacon>
</Location>
<Location /mod_pagespeed_beacon>
</Location>
<Location /mod_pagespeed_temp_statistics_graphs>
</Location>
Thanks again,
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 7:38
I don't think you need to disable mod_rewrite for those Locations since
they're not under /prefix, so they won't be affected anyway.
As for using mod_proxy, I don't know enough about it to say if it can work or
not.
I believe we could configure mod_pagespeed to fetch (and rewrite) files under
/prefix using various directives, but I also think that needs a later version
of mod_pagespeed than what you're using. If you can wait, we're in the process
of building a new stable release; if you can't wait, you can try upgrading to
the latest beta version.
Original comment by matterb...@google.com
on 11 Nov 2013 at 8:01
FWIW, the URL
http://musicasacra.lemon42.com/DE/repos/evoscripts/musica_sacra/returnBinaryImage/1/konzert/Bild_teaser.jpg
is served with Pragma:no-cache when fetched like this:
wget --header ModPagespeed:off --save-headers http://musicasacra.lemon42.com/DE/repos/evoscripts/musica_sacra/returnBinaryImage/1/konzert/Bild_teaser.jpg
That pragma:no-cache prevents mod_pagespeed from optimizing the resource in the
HTML flow, changing the URL. That explains the symptom seen by the user.
However, it appears that in this configuration, in-place resource optimization
is enabled, and it appears to omit the pragma:no-cache header. This is a
little concerning. To help us reproduce, it would be useful to understand
where and why the pragma:no-cache is getting added in the configuration.
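To compare header dumps from `wget --save-headers` taken with and without a session cookie, a small helper along these lines may be useful. It is a simplified illustration of standard HTTP caching hints, not mod_pagespeed's actual decision logic, and the function name `shared_cache_blockers` is hypothetical.

```python
def shared_cache_blockers(headers):
    """List header values that tell a shared cache not to reuse a response.

    Simplified illustration of standard HTTP caching hints; this is not
    mod_pagespeed's actual decision logic.
    """
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    blockers = []
    if "no-cache" in lowered.get("pragma", ""):
        blockers.append("pragma: no-cache")
    directives = [d.strip() for d in lowered.get("cache-control", "").split(",")]
    for directive in ("private", "no-cache", "no-store"):
        if directive in directives:
            blockers.append("cache-control: " + directive)
    return blockers
```

An empty list for one request and a non-empty list for another request to the same URL would point at the kind of inconsistency discussed in this thread.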
Original comment by jmara...@google.com
on 11 Nov 2013 at 8:07
Well as of now I'm using Release 1.7.30.1-beta. Will the next stable release be
a later one?
Is there any way to find out what triggers the "remembering not-cacheable
status" message?
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 8:12
Oh that's really strange!! Using
http://musicasacra.lemon42.com/DE/repos/evoscripts/musica_sacra/returnBinaryImage/1/konzert/Bild_teaser.jpg?ModPagespeed=off
I don't see the Pragma:no-cache header in the developer tools, but I get it
from wget. I will look into that and let you know!
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 8:21
Ok so the pragma no_cache header is triggered because of the session handling.
When you request the image in the browser you will see that a set-cookie header
is present - but this is on the first request only.
Does google pagespeed fetch the images using cookies?
Anyway I will try to get rid of the session on the image and see what happens :)
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 8:43
In the HTML-rewriting flow, the images are fetched without any cookies (or, if
LoadFromFile is specified, they are read directly from the disk, without
cookies).
In the in-place flow, the images are not fetched, but are collected as an
Apache output filter, and so any cookies sent in the request will affect the
response.
What's your intended policy about delivering images? Do you want to see a
valid cookie in a request before responding with any images?
Original comment by jmara...@google.com
on 11 Nov 2013 at 8:48
Can I get more information about those flows somewhere to help me understand it?
And regarding the images yes that was the idea - I will try to find a better
solution, but the most important thing is that I know now what triggered the
behaviour.
And I will use wget for testing :)
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 8:53
OK, long explanation follows....the short version is simply "I don't believe
MPS can currently optimize resources for authorized clients but refuse to serve
them to unauthorized clients".
In the flow where we find an <img> tag in HTML, and want to rewrite the image
URL to point to the optimized version, we do a loopback fetch with no cookies
to get the image content.
In the in-place flow, we let Apache handle the image request normally, but
insert an extra output filter to collect the image bytes, optimize them, and
store the optimized bits in a cache. On subsequent requests, mod_pagespeed
handles the request directly from its cache, bypassing the default handler for
the cached resource.
Given that you only want images served to clients with a valid cookie, I think
mod_pagespeed doesn't currently have a correct solution that optimizes your
resource but avoids sending it to unauthorized clients. In the cookie-less
loopback fetch we'll either consider the response to be fully proxy-cacheable,
or we won't optimize it at all.
The mechanism you currently have of responding with pragma:no-cache is correct,
and makes mod_pagespeed avoid violating your privacy concern in its HTML flow.
So as far as I can tell, mod_pagespeed is working correctly with respect to
your policy. In the future we might consider implementing optimization of
private resources in the HTML flow but we definitely don't have that now.
The in-place optimization mechanism in mod_pagespeed right now appears to
bypass your privacy control. I *think* that rather than returning
pragma:no-cache, you'll get the same effect by responding always with
cache-control:private. mod_pagespeed will respect that.
But by responding sometimes with pragma:no-cache and sometimes not, I think
mod_pagespeed may wind up caching the response without the pragma and serving
it to all clients. In theory you could use Vary:Cookie in your response to
inform proxy caches to include the cookie in the cache key. However,
mod_pagespeed ignores vary headers on resources by default, and if you turn on
the switch that tells us to respect Vary, mod_pagespeed will simply give up on
trying to cache the resource.
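The risk described above can be illustrated with a toy model: a cache keyed by URL only, in front of an origin that sends pragma:no-cache on cookie-less requests but not on requests that carry a session cookie. Everything here (`UrlKeyedCache`, `session_origin`, `private_origin`) is a hypothetical stand-in for illustration and says nothing about how mod_pagespeed is implemented internally.

```python
class UrlKeyedCache:
    """Toy cache keyed by URL only, ignoring cookies (hypothetical model)."""

    def __init__(self):
        self._store = {}

    def is_cached(self, url):
        return url in self._store

    def handle(self, url, has_session_cookie, origin):
        if url in self._store:
            # A hit is served to any client, with or without a cookie.
            return self._store[url]
        headers, body = origin(url, has_session_cookie)
        uncacheable = (
            headers.get("Pragma") == "no-cache"
            or "private" in headers.get("Cache-Control", "")
        )
        if not uncacheable:
            self._store[url] = (headers, body)
        return headers, body


def session_origin(url, has_session_cookie):
    """Hypothetical origin: sends Pragma only when no session cookie arrives."""
    headers = {"Content-Type": "image/jpeg"}
    if not has_session_cookie:
        headers["Pragma"] = "no-cache"
    return headers, b"image-bytes"


def private_origin(url, has_session_cookie):
    """Hypothetical origin: always marks the response as private."""
    return {"Content-Type": "image/jpeg", "Cache-Control": "private"}, b"image-bytes"
```

With `session_origin`, one request carrying a cookie is enough to populate the cache, and later cookie-less requests get the stored response without the pragma. With `private_origin`, nothing is ever stored, which is the behaviour the cache-control:private suggestion aims for.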
Original comment by jmara...@google.com
on 11 Nov 2013 at 9:09
Got it! Really appreciate your help!
Original comment by bernhard...@lemon42.com
on 11 Nov 2013 at 9:12
Bernhard, one action-item for you: I would suggest you add cache-control:private
to resources you don't want proxy-caches (e.g. CDNs and ISPs) to serve to
unauthorized users.
I am going to rename and refocus this issue on the fact that we strip your
pragma:no-cache when serving via the in-place flow.
Original comment by jmara...@google.com
on 12 Nov 2013 at 3:46
Summary was: mod_rewrite and mod_pagespeed image urls not rewritten
Original comment by jmara...@google.com
on 12 Nov 2013 at 3:46
Note: it's possible that the pragma:no-cache stripping happens as a result of
caching the response to an authenticated request, and using it to respond to an
unauthenticated request. In that case IMO it's the responsibility of the site
to add cache-control:private to ensure this doesn't happen.
But I want to verify that we will not strip the pragma when it's delivered
unconditionally.
Original comment by jmara...@google.com
on 12 Nov 2013 at 3:49
thx, already added the cache-control header, I'm aware that using pragma:
no-cache is bad practice anyway. is there anything you still want me to test?
Original comment by bernhard...@lemon42.com
on 12 Nov 2013 at 3:59
No I'm all set. I tested that this sequence works as I think it should.
1. start a local apache on port 8080 with our examples installed.
2. wget --save-headers http://localhost:8080/mod_pagespeed_example/images/Puzzle.jpg
   repeat three times. The first two requests deliver the origin image (241k).
   The third request and thereafter will deliver an optimized image (98k).
3. add "header add pragma no-cache" and restart apache
4. pagespeed will not see this header and will deliver the optimized image from
   its cache, mimicking the broken behavior that you saw.
5. flush cache (for me, touch /usr/local/apache2/pagespeed_cache/cache.flush)
6. Now no matter how many times I wget that image, it will never be optimized,
   and will always pass through "pragma:no-cache".
I will now check out behavior with cc:private.
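The repeated-wget step in the sequence above could also be scripted. This is only a sketch: `fetch_size_and_pragma` and `probe` are hypothetical helper names, the URL would be whatever test image is installed locally, and the `fetch` parameter exists so the loop can be exercised without a running server.

```python
import urllib.request


def fetch_size_and_pragma(url):
    """Fetch url once and return (body_length, Pragma header or None)."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
        return len(body), response.headers.get("Pragma")


def probe(url, attempts=3, fetch=fetch_size_and_pragma):
    """Fetch url repeatedly and collect (size, pragma) for each attempt."""
    return [fetch(url) for _ in range(attempts)]
```

If in-place optimization kicks in, the size in the later entries should drop relative to the first ones; if the pragma is being passed through, it should appear in every entry.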
Original comment by jmara...@google.com
on 13 Nov 2013 at 2:27
Summary was: pragma:no-cache is stripped in in-place flow.
OK, cc:private prevents optimization of this resource, even when we add
ModPagespeedRewriteUncacheableResources on
I am now hijacking this bug to fix this bug. Note that the option is settable
in pagespeed.conf, but is not documented. There is implementation in the
source-code to support other integrations (PageSpeed Service) but they are not
live in mod_pagespeed.
See also: Issue 661
Original comment by jmara...@google.com
on 13 Nov 2013 at 2:34
Thinking about this further, I think this feature requires a change in
mod_pagespeed's current IPRO implementation.
Currently IPRO has two components, a resource-generator and an output filter.
The output-filter is used to collect bytes on a new
resource and initiate optimization. Once the optimized result is stored in
cache, the substitution occurs in the resource generator, which runs very early
and subverts the normal resource handling.
To enable optimization of uncacheable resources, we'd instead do the
substitution in the output filter. It would be a bit wasteful because the
origin resource generator would have to fully run even when the results were
cached, and all we'd do with the bytes is buffer them in our output filter and
make sure their hash matches the optimized result we pulled from our cache.
This wouldn't be hard to implement (IMO). However, it would force full
buffering of the resource in our output filter, because we'd want to make sure
that the origin resource didn't change before we start streaming out
pre-optimized bytes.
This might make it perform poorly for large resources (e.g. images) that ought
to be streamed from the disk. Consider a large PNG that gets optimized to a
tiny WEBP. We'd still have to let Apache generate the PNG fully and collect it
in our output filter, to verify it corresponds to the same PNG we optimized to
get a small WEBP.
And I don't see how to get around the need to run the full apache filter stack
for the resource, considering this testcase on http://musicasacra.lemon42.com
where cookies are used to authenticate the user before sending back the image.
We could avoid the buffering delay, however, if we considered the specification
of ModPagespeedRewriteUncacheableResources as a signal from the site owner to
PageSpeed that these resources don't vary in content by user (or user-agent).
We'd still make Apache generate the bits but we could send out the optimized
content immediately from our output filter without waiting for the full
response from the origin resource generator.
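The buffering variant of this idea (collect the origin bytes in the output filter, check their hash, then substitute the optimized result) can be sketched as follows. This is a toy model of the proposal, not existing mod_pagespeed code: `OptimizedStore` and `output_filter` are hypothetical names, and SHA-256 is an arbitrary choice of hash made for the example.

```python
import hashlib


def content_hash(data):
    """Hash of the origin bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


class OptimizedStore:
    """Toy stand-in for the cache of optimized results, keyed by URL."""

    def __init__(self):
        self._entries = {}

    def put(self, url, origin_bytes, optimized_bytes):
        # Remember which origin content the optimized bytes were derived from.
        self._entries[url] = (content_hash(origin_bytes), optimized_bytes)

    def lookup(self, url):
        return self._entries.get(url)


def output_filter(url, origin_chunks, store):
    """Buffer the full origin response, then decide which bytes to send."""
    origin_bytes = b"".join(origin_chunks)  # full buffering, as noted above
    entry = store.lookup(url)
    if entry is not None:
        origin_hash, optimized_bytes = entry
        if content_hash(origin_bytes) == origin_hash:
            return optimized_bytes  # origin unchanged: substitute
    return origin_bytes  # no entry, or origin changed: pass through
```

The cost discussed above is visible in the first line of `output_filter`: nothing can be sent until every origin chunk has arrived, even when the optimized result is already in the store.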
Original comment by jmara...@google.com
on 13 Nov 2013 at 2:53
Original issue reported on code.google.com by
bernhard...@lemon42.com
on 9 Nov 2013 at 11:07