apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0

mod_pagespeed respects a meta tag claiming XHTML, but browsers evidently do not. #365

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Upgrade your current modpagespeed-beta to 0.10.19.5-1253.
2. Open any of your HTML pages (possibly only the ones without an .html extension).
3. The pages are rendered as XHTML (application/xhtml+xml) instead of HTML, so
every markup warning completely breaks the page.

What is the expected output? What do you see instead?
Instead of rendering the page as HTML, the browser tries to parse it as XML,
and the page becomes unreadable because an error message appears instead (see
the attached image).

This is the expected output ([text/html]): 
roger@roger-desktop:~$ wget http://remakeup.es/102-maquillaje-de-labios
Length: unspecified [text/html]

This is the obtained output [application/xhtml+xml]: 
roger@roger-desktop:~$ wget http://remakeup.es/102-maquillaje-de-labios
Length: 95999 (94K) [application/xhtml+xml]

What version of the product are you using (please check X-Mod-Pagespeed
header)?
0.10.19.5-1253 according to yum.

On what operating system?
CentOS 5.7

Which version of Apache?
Apache/2.2.3

Which MPM?

Please provide any additional information below, especially a URL or an
HTML file that exhibits the problem.

I uninstalled modpagespeed-beta because of this problem.
The attached image shows the incorrect rendering; if you visit the website now,
you will see the correct one.

Original issue reported on code.google.com by Roger.Si...@gmail.com on 2 Jan 2012 at 8:04


GoogleCodeExporter commented 9 years ago
I believe I've found the problem.  I've certainly reproduced it using mod_proxy 
& mod_pagespeed.

The problem does not show up with any of the core filters I've tried, but I
did see a rendering bug with the "make_google_analytics_async" filter enabled,
even when no other filter was enabled.  When you had mod_pagespeed installed,
did you have that filter enabled?

I did not see the problem you noticed with xhtml+xml.  Can you give more detail
on why you thought it was related to the symptom you saw?

Thanks!
-Josh

Original comment by jmara...@google.com on 26 Jan 2012 at 11:26

GoogleCodeExporter commented 9 years ago
Hi, 

I don't think I changed the configuration of modpagespeed; I simply installed
it without changing anything. I tried again and the problem came back.

It could be that my explanation of the error I'm seeing is not accurate. The
problem is that when I install modpagespeed, my PrestaShop store breaks,
apparently because the browser becomes very strict and interprets the page as
XHTML (application/xhtml+xml). Usually Chrome or any other browser can render
a page even if <br> is not written as <br /> or some tag is not perfectly
well-formed. With modpagespeed, rendering becomes strict, and the browser
simply refuses to render the page when it finds a minor error.

I know I could make my HTML pages perfectly valid XHTML, but that would slow
down other, much more important work. As you know, most of the web is not
perfectly well-formed, and most of it still looks OK.
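The difference in strictness described above can be illustrated with Python's standard library. This is an analogy, not what a browser runs: `html.parser` stands in for lenient HTML parsing, and `xml.etree.ElementTree` stands in for the strict XML parsing a browser applies to a page served as application/xhtml+xml. The snippet and helper names are hypothetical.

```python
# Illustration only: Python's lenient HTML parser stands in for browser HTML
# parsing, and ElementTree stands in for the strict XML parsing a browser
# applies to pages served as application/xhtml+xml.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = "<p>first line<br>second line</p>"  # <br> is not self-closed


class TagCollector(HTMLParser):
    """Records start tags; the HTML parser accepts an unclosed <br>."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Parse markup leniently and return the start tags that were seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True only if markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("HTML parser saw tags:", html_tags(SNIPPET))
    print("Well-formed XML:", is_well_formed_xml(SNIPPET))
    print("Well-formed XML with <br />:",
          is_well_formed_xml(SNIPPET.replace("<br>", "<br />")))
```

Run as a script, the lenient parser reports both tags, while the XML check fails for `<br>` (the `</p>` closes a tag that is not the innermost open one) and succeeds for `<br />`.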

I see that mod_proxy is loaded in my httpd.conf, but that was the default in my 
installation (CentOS+Webmin/Virtualmin): 
# In httpd.conf
LoadModule proxy_module modules/mod_proxy.so

Should I comment out this line?

I can give you more details, but I am not an expert at debugging mod_pagespeed,
just a user.

Original comment by Roger.Si...@gmail.com on 27 Jan 2012 at 8:09

GoogleCodeExporter commented 9 years ago
I'd like to look into this problem more deeply, but I don't think I can
reproduce it by mirroring your site.  Can you install mod_pagespeed on your
server, but turn it off by putting:

   ModPagespeed off

in your pagespeed.conf and then restarting Apache?  It will have no effect on 
your traffic in this case, but I will be able to try to figure out what the 
problem is by enabling it in a query-param:

   http://remakeup.es/102-maquillaje-de-labios?ModPagespeed=on

Then I can hopefully track down the problem.
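Building these per-request debugging URLs by hand is error-prone when the page already has a query string. The helper below is a hypothetical sketch (not part of mod_pagespeed) that appends `ModPagespeed=on` and, optionally, a comma-separated `ModPagespeedFilters` list, the two query parameters used in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pagespeed_debug_url(page_url, filters=None):
    """Hypothetical helper: return page_url with ModPagespeed=on (and,
    optionally, ModPagespeedFilters=<comma-separated list>) appended to
    any existing query string."""
    parts = urlsplit(page_url)
    # Keep whatever query parameters the page URL already had.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("ModPagespeed", "on"))
    if filters:
        query.append(("ModPagespeedFilters", ",".join(filters)))
    # safe="," keeps the comma-separated filter list readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))
```

For example, `pagespeed_debug_url("http://remakeup.es/102-maquillaje-de-labios", ["collapse_whitespace", "extend_cache"])` yields the same URL used later in this thread to test those two filters.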

Original comment by jmara...@google.com on 27 Jan 2012 at 12:24

GoogleCodeExporter commented 9 years ago
Great, thanks for the reply. 

Now you can see the problem using the debugging parameter. 

If you need anything, please let me know. 

Original comment by Roger.Si...@gmail.com on 27 Jan 2012 at 12:44

GoogleCodeExporter commented 9 years ago
I've discovered the problem, or at least one of them.  You have this markup in 
your HTML <head>:

  <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>

As of 0.10.19.*, mod_pagespeed converts that into an HTTP header, overwriting
the original Content-Type, so the browser interprets the document as XHTML.
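A simplified model of the behavior described above: collect every `<meta http-equiv=... content=...>` tag and write it over the response header of the same name. This is a Python sketch for illustration, assuming a case-insensitive header match; it is not mod_pagespeed's actual `convert_meta_tags` code, which may apply additional rules.

```python
from html.parser import HTMLParser


class MetaHttpEquivCollector(HTMLParser):
    """Collects (header-name, value) pairs from <meta http-equiv content>."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        value = attributes.get("content")
        if name and value is not None:
            self.headers.append((name, value))


def apply_meta_tags(response_headers, html):
    """Return a copy of response_headers in which each http-equiv meta tag
    overwrites any existing header with the same (case-insensitive) name."""
    collector = MetaHttpEquivCollector()
    collector.feed(html)
    collector.close()
    merged = {k.lower(): (k, v) for k, v in response_headers.items()}
    for name, value in collector.headers:
        merged[name.lower()] = (name, value)
    return dict(merged.values())
```

With the `<meta>` tag quoted above, a `text/html` Content-Type from the server is replaced by `application/xhtml+xml; charset=utf-8`, which is exactly the header change reported in this issue.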

Can you do one of two things?
   1. Change the <meta> tag in your HTML to reflect what you actually want in your HTTP headers
   2. Disable the convert_meta_tags filter via
          ModPagespeedDisableFilters convert_meta_tags

That will resolve this problem.  However, your site is still not working quite 
optimally with mod_pagespeed.  If I request this URL:
    http://remakeup.es/102-maquillaje-de-labios?ModPagespeed=on&ModPagespeedFilters=collapse_whitespace,extend_cache
Then whitespace-collapsing works fine (excess whitespace is stripped from your 
HTML).  However, the "extend_cache" filter is not working.  I suspect there's 
something in your server setup that's preventing mod_pagespeed from fetching 
(for example) http://remakeup.es/themes/theme058/css/global.css.  Do you see 
anything in your Apache error log that indicates "serf failures"?  Depending on 
your network setup, you may want to add something like this in your 
pagespeed.conf:
     ModPagespeedMapOriginDomain localhost remakeup.es

Please let me know if this helps.
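Conceptually, `ModPagespeedMapOriginDomain localhost remakeup.es` asks mod_pagespeed to fetch resources that appear under remakeup.es from localhost instead, which helps when the server cannot reach its own public hostname. The sketch below models only that URL mapping; the dictionary form and the function name are hypothetical, and it assumes the original domain is still sent as the Host header.

```python
from urllib.parse import urlsplit, urlunsplit


def map_origin(resource_url, origin_map):
    """Return (fetch_url, host_header) for a resource URL.

    origin_map is a hypothetical {public_domain: origin_host} dict, modelled
    on "ModPagespeedMapOriginDomain localhost remakeup.es". Ports and
    user:password@ parts are dropped to keep the sketch short.
    """
    parts = urlsplit(resource_url)
    host = parts.hostname
    origin = origin_map.get(host)
    if origin is None:
        # Not mapped: fetch from the URL as written.
        return resource_url, host
    fetch_url = urlunsplit(parts._replace(netloc=origin))
    # Assumption: the original domain is still sent as the Host header.
    return fetch_url, host
```

For the stylesheet mentioned above, this yields a fetch of `http://localhost/themes/theme058/css/global.css` with `Host: remakeup.es`.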

Original comment by jmara...@google.com on 27 Jan 2012 at 2:02

GoogleCodeExporter commented 9 years ago
Summary was: mod-pagespeed parses as xml instead of html and breaks everything

Marking this invalid: the <meta> tag declares the doc to be XHTML when it's
not.

You could argue that since browsers appear to ignore the meta tag's claim of
XHTML, we should too.  But I haven't heard of this on other sites, so for now
I'll mark this invalid.  We can revisit this if we find it is a common
problem.
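The browser behavior mentioned above matches how browsers generally treat such tags: for a page served as text/html, a `<meta http-equiv="Content-Type">` tag acts at most as a character-encoding declaration, the MIME type it names is ignored, and a charset in the HTTP header takes precedence over the meta tag. A simplified Python model of that precedence, ignoring byte-order marks and the rest of the encoding-sniffing algorithm (function names are hypothetical):

```python
def parse_content_type(value):
    """Split 'type/subtype; key=value; ...' into (mime_type, params)."""
    mime_type, *raw_params = value.split(";")
    params = {}
    for item in raw_params:
        key, sep, val = item.strip().partition("=")
        if sep:
            params[key.strip().lower()] = val.strip().strip('"')
    return mime_type.strip().lower(), params


def effective_type_and_charset(http_content_type, meta_content=None):
    """Return (mime_type, charset) as a browser would roughly see them."""
    # The MIME type always comes from the HTTP header.
    mime_type, params = parse_content_type(http_content_type)
    charset = params.get("charset")
    if charset is None and meta_content is not None:
        # Only the charset is taken from the meta tag; its MIME type is ignored.
        charset = parse_content_type(meta_content)[1].get("charset")
    return mime_type, charset
```

Under this model, the page in this issue served as `text/html` stays `("text/html", "utf-8")` in the browser, whereas rewriting the HTTP header itself (as mod_pagespeed did) changes the MIME type.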

Original comment by jmara...@google.com on 7 Feb 2012 at 2:38

GoogleCodeExporter commented 9 years ago
Hi, I agree: if nobody else complains, it might be a very specific situation.

However, I wanted to tell you that mod_pagespeed has caused some serious
performance issues on my servers, and I don't think it is ready for
high-performance environments.

Several of my servers (the ones that had it installed) were running at a load
average of 10-15, and after turning mod_pagespeed off, it dropped to 1-3. I
have pages with several thousand views per day, and the default performance
cost is too high. I guess this means it is only suitable for testing, not for
production sites.

I just wanted to let you know. 

Original comment by Roger.Si...@gmail.com on 9 Feb 2012 at 8:15

GoogleCodeExporter commented 9 years ago
GoogleCodeExporter commented 9 years ago

Original comment by sligocki@google.com on 1 Nov 2012 at 6:09