apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0

Double-check and better test interaction of base url and redirects #319

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Are we using the correct base url for pages whose original url redirects to 
another url?  This was flagged as not thoroughly tested, and ought to be tested 
for.

Original issue reported on code.google.com by jmaes...@google.com on 11 Jul 2011 at 1:47

GoogleCodeExporter commented 9 years ago
I think this is probably related.

I still see plenty of error log entries along the lines of:
[Thu Dec 15 11:21:18 2011] [error] [mod_pagespeed 0.10.19.5-1253 @16359] x1367_103_0001_100px.JPG:0: Resource based on http://www.sellmyretro.com/user/profile/uploaded/img/1367_103_0001_100px.JPG but cannot access the original

The actual image address is:
http://www.sellmyretro.com/uploaded/img/1367_103_0001_100px.JPG

Referenced in the page as:
<base href="http://www.sellmyretro.com/"/>
..
...
<img src="/uploaded/img/1367_103_0001_100px.JPG" />

The image does get displayed correctly, so I am not sure what is causing the 
log entries.  When viewing the page (http://www.sellmyretro.com/user/profile), 
I can see:

<img src="http://www.sellmyretro.com/uploaded/img/x1367_103_0001_100px.JPG.pagespeed.ic.kB5PMQbd4v.jpg" />
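
To compare a rewritten URL like the one above with the address the log says it was "based on", it can help to strip the rewrite suffix. The sketch below is only inferred from the URLs quoted in this thread (a leading `x`, then the original leaf, then `.pagespeed.ic.<hash>.<ext>`); it is not taken from the mod_pagespeed source, and the helper name `guess_original` is hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed leaf shape, inferred from this thread only:
#   <name>.pagespeed.<filter id>.<hash>.<ext>
REWRITTEN_LEAF = re.compile(
    r"^(?P<name>.+)\.pagespeed\.(?P<filter>[a-z]+)\.(?P<hash>[^.]+)\.(?P<ext>[^.]+)$"
)


def guess_original(url):
    """Return a best-guess original URL, or None if the leaf is not rewritten."""
    parts = urlsplit(url)
    directory, _, leaf = parts.path.rpartition("/")
    match = REWRITTEN_LEAF.match(leaf)
    if match is None:
        return None
    name = match.group("name")
    # Assumption: for the "ic" filter the thread's URLs carry a leading "x"
    # in front of the original file name, so drop it.
    if match.group("filter") == "ic" and name.startswith("x"):
        name = name[1:]
    return urlunsplit(parts._replace(path=directory + "/" + name))


rewritten = (
    "http://www.sellmyretro.com/uploaded/img/"
    "x1367_103_0001_100px.JPG.pagespeed.ic.kB5PMQbd4v.jpg"
)
print(guess_original(rewritten))
```

For the URL quoted above this recovers the "actual image address" given earlier in this comment, which suggests the rewritten resource itself was resolved against the base href even though the logged lookup was not.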

Original comment by sexy.ric...@googlemail.com on 15 Dec 2011 at 12:17

GoogleCodeExporter commented 9 years ago
This is certainly odd: if the image is indeed ending up in the right place,
I'm not sure why it's looking in the wrong place initially.  Do you happen
to have a non-login-required page that exhibits the problem?

Original comment by jmaes...@google.com on 16 Dec 2011 at 12:37

GoogleCodeExporter commented 9 years ago
Yes - another one is:

[Fri Dec 16 08:46:46 2011] [error] [mod_pagespeed 0.10.19.5-1253 @25011] x1367_103_0001_100px.JPG:0: Resource based on http://www.sellmyretro.com/category/All+categories/Retro+Computers/Sinclair/uploaded/img/1367_103_0001_100px.JPG but cannot access the original
[Fri Dec 16 08:46:46 2011] [error] [mod_pagespeed 0.10.19.5-1253 @25011] Fetch failed for http://www.sellmyretro.com/category/All+categories/Retro+Computers/Sinclair/uploaded/img/x1367_103_0001_100px.JPG.pagespeed.ic.K8xqdhI75q.jpg, status=0

This is when browsing 
http://www.sellmyretro.com/category/All+categories/Retro+Computers/Sinclair

The image appears on page 2.

Original comment by sexy.ric...@googlemail.com on 16 Dec 2011 at 9:13

GoogleCodeExporter commented 9 years ago
I also ended up seeing this error. The strange thing is that the same URL 
works after a few minutes. In other words, the first attempt fails, and it 
fails again if we retry within a minute, but it works if we retry after 5 
minutes.

Original comment by jimyjo...@gmail.com on 13 Jan 2012 at 6:25

GoogleCodeExporter commented 9 years ago

Original comment by jmara...@google.com on 24 May 2012 at 7:33

GoogleCodeExporter commented 9 years ago
I'm not sure what this bug was originally concerned about. If there is an HTTP 
redirect (301/302), we will only be rewriting on the final URL. I think the 
follow-up comments are about internal mod_rewrite-style URL changes. We are 
already dealing with those and have tests for them (although if you are still 
having problems like this, you should let us know).

Original comment by sligocki@google.com on 11 Dec 2012 at 6:49

GoogleCodeExporter commented 9 years ago
I still get these errors occasionally.

The issue is that the webpage uses a base href of 'http://www.sellmyretro.com'.
The images are then linked by a simple relative src:
<img src="uploaded/img/1367_103_0012_100px.JPG">

For example I see the following entry:

x1367_103_0012_100px.JPG:0: Resource based on 
http://www.sellmyretro.com/offer/details/uploaded/img/1367_103_0012_100px.JPG 
but cannot access the original

I wonder if it may not actually be a mod_pagespeed error, but more an issue with 
some search spiders ignoring the base href. I think that this is actually more 
likely, as viewing the page:
http://www.sellmyretro.com/offer/details/1367

does work correctly.
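
The two resolutions being compared here can be reproduced with Python's standard `urllib.parse.urljoin`. This is only an illustration of relative-URL resolution using the values quoted in this comment; it does not show how mod_pagespeed or any particular spider resolves URLs.

```python
from urllib.parse import urljoin

# Values quoted in this comment.
PAGE_URL = "http://www.sellmyretro.com/offer/details/1367"
BASE_HREF = "http://www.sellmyretro.com"
IMG_SRC = "uploaded/img/1367_103_0012_100px.JPG"

# A client that honours <base href> resolves the src against the base.
with_base = urljoin(BASE_HREF, IMG_SRC)

# A client that ignores <base href> resolves the src against the page URL.
without_base = urljoin(PAGE_URL, IMG_SRC)

print(with_base)
print(without_base)
```

The second result matches the URL in the log entry above, which is consistent with some client requesting the image without applying the base href.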

Original comment by rwap.services on 12 Dec 2012 at 6:51

GoogleCodeExporter commented 9 years ago
We have seen issues with naive spiders crawling the wrong pages. You could look 
into your Apache access log to see which user agents (UAs) are requesting those 
URLs. If you are still seeing a problem, please open a new bug to distinguish it 
from the original task in this bug.
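
A minimal sketch of that access-log check, assuming the log uses Apache's "combined" format (if your `LogFormat` differs, the pattern needs adjusting). The function name `agents_requesting` and the two sample lines are made up for illustration and are not real log data.

```python
import re
from collections import Counter

# Apache "combined" format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_requesting(lines, path_fragment):
    """Count user agents whose request path contains path_fragment."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and path_fragment in match.group("path"):
            counts[match.group("agent")] += 1
    return counts


# Hypothetical sample lines, not taken from any real log.
SAMPLE = [
    '203.0.113.7 - - [12/Dec/2012:18:40:01 +0000] '
    '"GET /offer/details/uploaded/img/1367_103_0012_100px.JPG HTTP/1.1" '
    '404 312 "-" "ExampleBot/1.0"',
    '198.51.100.4 - - [12/Dec/2012:18:41:10 +0000] '
    '"GET /uploaded/img/1367_103_0012_100px.JPG HTTP/1.1" '
    '200 5120 "http://www.sellmyretro.com/offer/details/1367" "Mozilla/5.0"',
]

for agent, hits in agents_requesting(SAMPLE, "/offer/details/uploaded/").most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `agents_requesting(open(path), "/offer/details/uploaded/")`, where `path` is wherever your server writes its access log.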

Original comment by sligocki@google.com on 13 Dec 2012 at 4:12