Closed eljoest closed 8 years ago
@eljoest Could you post the html source you receive when you also turn on the debug filter?:
?ModPagespeed=on&ModPagespeedFilters=extend_cache,rewrite_images,debug
Sorry for the long delay, the relevant section shows:
<a href="/uploads/pics/OBD4HDD.jpg" class="fbox" rel="fb" target="_blank"><img src="/typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg" width="240" height="170" alt="" data-pagespeed-url-hash="1636956066" onload="pagespeed.CriticalImages.checkImageForCriticality(this);"/><!--Image does not appear to need resizing.--><!--The image was not inlined because it has too many bytes.--></a>
I just tried reproducing this naively on a test site, and I do see images getting rewritten to be cache extended:
<img src=optimized.jpg.pagespeed.ce.0aKT3Yvs2R.jpg>
Copying /typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg
locally, I also see it getting cache extended, so it's not a problem with the image:
<img src=csm.jpg.pagespeed.ce.d9AGCf-0p-.jpg><!--Image does not appear to need resizing.--><!--The image was not inlined because device does not support inlinling.-->
The problem isn't that PageSpeed can't fetch the image or something, since it's able to tell that it doesn't need resizing, and that it's too large to inline. Additionally, when fetched from Chrome it can make a webp:
xcsm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ic.y7s9JIrRdq.webp
If I load your page with extend_cache_images,debug
, I still don't see it getting cache-extended, so it doesn't look like a problem with different filters interacting badly:
<img src="/typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg" width="240" height="170" alt=""/>
Copying the full tag also doesn't fix it:
<div class="news-single-img"><a href="uploads/pics/OBD4HDD.jpg" class="fbox" rel="fb" target="_blank"><img src="typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ce.d9AGCf-0p-.jpg" width="240" height="170" alt=""/></a><p class="news-single-imgcaption" style="width:240px;"></p></div>
I'm at a bit of a loss. Is there anything mentioning this image in your apache log file?
How good are the original caching headers? If it's already cacheable for at least a month we won't touch it. Unusual mimetypes may matter, too.
On Wed, Aug 24, 2016 at 3:10 PM, Jeff Kaufman notifications@github.com wrote:
I just tried reproducing this naively on a test site, and I do see images getting rewritten to be cache extended:
Copying /typo3temp/processed/csm_OBD4HDD_4d2d95dc52.jpg locally, I also see it getting cache extended, so it's not a problem with the image:
The problem isn't that PageSpeed can't fetch the image or something, since it's able to tell that it doesn't need resizing, and that it's too large to inline. Additionally, when fetched from Chrome it can make a webp:
xcsm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ic.y7s9JIrRdq.webp
If I load your page with extend_cache_images,debug, I still don't see it getting cache-extended, so it doesn't look like a problem with different filters interacting badly:
Copying the full tag also doesn't fix it:
I'm at a bit of a loss. Is there anything mentioning this image in your apache log file?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pagespeed/mod_pagespeed/issues/1373#issuecomment-242175968, or mute the thread https://github.com/notifications/unsubscribe-auth/ADl1ROl2LdmAGepGLz3BaTPi32O2apypks5qjJc1gaJpZM4Jf5w6 .
The origin response headers, with ?PageSpeed=off
and requested from Firefox, are:
Accept-Ranges: bytes
Cache-Control: public, max-age=43200
Connection: keep-alive
Content-Length: 22962
Content-Type: image/jpeg
Date: Wed, 24 Aug 2016 19:32:28 GMT
Etag: "59b2-5357a6d22953f"
Expires: Fri, 23 Sep 2016 19:32:28 GMT
Last-Modified: Fri, 17 Jun 2016 14:46:56 GMT
Server: nginx
Vary: Origin
X-Backend-Server: w3ext1
Normal mime, max-age isn't that far out.
It is something about the headers, though! If I set up my local server with:
pagespeed Domain https://www.tracetronic.de;
And reference the image directly:
<img src="https://www.tracetronic.de/typo3temp/_processed_/csm_OBD4HDD_4d2d95dc5\
2.jpg">
Then I get xcsm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ic.y7s9JIrRdq.webp
in chrome, but if I turn off convert_webp_to_jpeg
I get csm_OBD4HDD_4d2d95dc52.jpg
.
Very strange!
reproduced locally with netcat; now trying to find out which headers are responsible
It's the Vary: Origin
header that's doing it.
If I have no Vary
header, then it works properly.
If I have Vary: Cookie
or Vary: Accept
then I get "Uncacheable content, preventing rewriting" (and it also won't make a webp)
If I have an unknown Vary
header like Vary: Cabbage
then I get the same strange behavior as Vary: Origin
.
Vary: User-Agent
also has the same strange behavior as Vary: Origin
and Vary: Cabbage
.
I'm guessing this is something weird about how RespectVary
is implemented: https://developers.google.com/speed/pagespeed/module/configuration#respectvary
@jeffkaufman Thanks a lot for the thorough analysis. I have removed Vary: Origin
for the images (mostly used for js/"static" ajax) and after wiping the cache it works as expected with my configuration.
I haven't set a value for ModPagespeedRespectVary, so it seems to be a weird behavior with the default setting.
Reproduced this locally:
$ PUZZLE_ORIG=localhost:8050/mod_pagespeed_example/images/Puzzle.jpg
$ curl $PUZZLE_ORIG > /dev/null ; sleep 1 ; curl $PUZZLE_ORIG > /dev/null ; sleep 1
$ curl $PUZZLE_ORIG > mod_pagespeed_test/Puzzle-optimized.jpg
$ echo '<img src=Puzzle-optimized.jpg>' > mod_pagespeed_test/Puzzle-optimized.html
Add to test/pagespeed_test.conf.template
:
location /mod_pagespeed_test/Puzzle-optimized.jpg {
add_header Vary Origin;
}
Now see that it's not optimized normally:
$ curl localhost:8050/mod_pagespeed_test/Puzzle-optimized.html
<head/><img src=Puzzle-optimized.jpg>
But we'll still turn it into a webp:
$ curl -H 'Accept: image/webp' localhost:8050/mod_pagespeed_test/Puzzle-optimized.html?PageSpeedFilters=+convert_jpeg_to_webp
<head/><img src=xPuzzle-optimized.jpg.pagespeed.ic.jjOWf4PvS4.webp>
And we don't actually have to start with an optimized image; it's enough to ask about cache-extension:
location /mod_pagespeed_example/images/Puzzle.jpg {
add_header Vary Origin;
}
$ curl localhost:8050/mod_pagespeed_test/Puzzle.html
<head/><img src=http://localhost:8050/mod_pagespeed_example/images/xPuzzle.jpg.pagespeed.ic.YsUBdV-i_6.jpg>
$ curl localhost:8050/mod_pagespeed_test/Puzzle.html?PageSpeedFilters=extend_cache
<img src=../mod_pagespeed_example/images/Puzzle.jpg>
So: it looks like it we have a resource with a Vary
header then we won't cache extend it, but if we can optimize it at all then we will do so, which includes cache-extending it.
Here's a unit test that captures this:
TEST_F(CacheExtenderTest, VaryOrigin) {
options()->EnableExtendCacheFilters();
rewrite_driver()->AddFilters();
GoogleString url = AbsolutifyUrl("a.css");
ResponseHeaders response_headers;
DefaultResponseHeaders(kContentTypeCss, 100 /* ttl */, &response_headers);
response_headers.Add("Vary", "Origin");
SetFetchResponse(url, response_headers, "css file");
GoogleString cache_extended_css = Encode("", kFilterId, "0", "a.css", "css");
ValidateExpected("vary origin",
StringPrintf(kCssFormat, "a.css"),
StringPrintf(kCssFormat, cache_extended_css.c_str()));
}
To decide whether it's ok to cache-extend, cache_extender.cc
checks headers->IsProxyCacheable()
, which I see returning false
if I set Vary: Origin
. Our optimization filters probably do something else?
CacheableResourceBase::IsValidAndCacheableImpl
calls headers->IsProxyCacheable(req_properties, respect_vary(), ResponseHeaders::kNoValidator)
while cache_extender.cc
calls headers->IsProxyCacheable()
. And "The zero-arg version of IsProxyCacheable
gives a pessimistic answer, assuming the request has cookies, there is no validator, and we respect Vary."
So the problem is that in cache extending we ignore the RespectVary
configuration setting, and just assume the value is true, which is especially confusing since the default value is false.
All the calls to the zero-arg version of IsProxyCachable
are suspect; I'll go through them.
Cases to fix:
CacheExtender::RewriteLoadedResource
: easy to pass in respect_vary from optionsRewriteDriver::Write()
: this operates on our output headers for rewritten resources. Those shouldn't have Vary
headers that make them uncachable, but I guess if they do we should respect them since we created them.@eljoest
I haven't set a value for ModPagespeedRespectVary, so it seems to be a weird behavior with the default setting.
Yes, it's a bug, and I'm working on a patch.
Thanks for reporting this!
quick question -- should we get rid of the zero-arg version of IsProxyCacheable()?
On Thu, Aug 25, 2016 at 4:04 PM, Jeff Kaufman notifications@github.com wrote:
Closed #1373 https://github.com/pagespeed/mod_pagespeed/issues/1373 via dfda935 https://github.com/pagespeed/mod_pagespeed/commit/dfda93522ded43e33d4eb641586515d647641b86 .
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pagespeed/mod_pagespeed/issues/1373#event-767994931, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2kPbv_jac4PBOy4ixR4Aw6I02VugZRks5qjfVUgaJpZM4Jf5w6 .
@jmarantz The zero-arg version makes sense in the places it's still being used: (1) tests and (2) handling resources pagespeed creates (where respect-vary shouldn't apply).
X-Mod-Pagespeed: 1.11.33.2-0 on apache
I'm seeing partial rewrites of the HTML src attributes.
The issue can be reproduced with the request parameters
?ModPagespeed=on&ModPagespeedFilters=extend_cache,rewrite_images
but also happens with the production filters:Original headers of the various images don't differ regarding cache info. This one fails but can't be compressed further:
This one works but is getting reduced to 6204 bytes:
In mod_pagespeed_message I can see: