apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0

Parsed with an assumption of utf-8 #628

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
mod_pagespeed parses its input as UTF-8 and writes its output as UTF-8, but the 
documents it processes do not necessarily declare UTF-8 explicitly.

CSS is treated as ASCII unless a charset is specified.

Here is an example that causes a problem:
content: " \00A0\00BB " !important;

This is charset-neutral, because the characters are written as escape codes. 
mod_pagespeed parses it as UTF-8, does not keep track of the fact that the 
characters came from escapes, and then writes them out as raw UTF-8 bytes. The 
browser, however, defaults to ASCII (maybe choosing ISO-8859-1 for extended 
characters), so you see corrupt characters.
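The corruption can be reproduced outside the browser. The following is a minimal Python sketch, not mod_pagespeed code; it assumes the browser really does fall back to ISO-8859-1, which is the guess made above.

```python
# The two characters that the escapes \00A0 and \00BB stand for.
text = "\u00a0\u00bb"

# Raw UTF-8 bytes, as written out once the escapes have been decoded.
utf8_bytes = text.encode("utf-8")

# What a client sees if it decodes those bytes as ISO-8859-1 instead
# (the fallback encoding is an assumption).
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)      # b'\xc2\xa0\xc2\xbb'
print(repr(misread))   # 'Â\xa0Â»'
```

Each character becomes two bytes in UTF-8, and each of those bytes is then shown as its own character, which is where the stray "Â" comes from.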

mod_pagespeed CSS URLs do not seem to respect Apache AddType directives, so at 
the server level you can't give a "this is utf-8" hint.

The only workaround I could find is to add this to my CSS file:
@charset "UTF-8";

To fix this, mod_pagespeed needs to either output non-ASCII characters as 
escape codes when no "@charset" is given in the CSS, or automatically send a 
UTF-8 charset in the Content-Type header for mod_pagespeed CSS URLs.
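The first option could look roughly like this. It is a Python sketch of the idea only; `escape_non_ascii` is a hypothetical helper, not a mod_pagespeed function. It rewrites every non-ASCII character as a CSS hex escape followed by a single space, which CSS consumes as the escape terminator, so the output is pure ASCII and decodes the same way under any ASCII-compatible charset. It is meant for characters inside strings and identifiers.

```python
def escape_non_ascii(css: str) -> str:
    """Replace each non-ASCII character with a CSS hex escape.

    Hypothetical helper for illustration. The trailing space after each
    escape is swallowed by the CSS parser as the escape terminator, so
    it never becomes part of the value.
    """
    return "".join(
        ch if ord(ch) < 128 else "\\%X " % ord(ch)
        for ch in css
    )


rule = 'content: "\u00a0\u00bb" !important;'
escaped = escape_non_ascii(rule)
print(escaped)  # content: "\A0 \BB " !important;
```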

If an @charset is defined, the output needs to be written in that same 
charset, of course. I didn't check that, but I wouldn't be surprised if you had 
a bug there too.
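For illustration, respecting a declared charset could be sketched as below. This is Python, not the mod_pagespeed implementation; `declared_charset`, `encode_for_sheet`, and the `fallback` parameter are hypothetical names. It assumes the rule appears in its literal form at the very start of the file, and the fallback used when nothing is declared is exactly the open question in this report.

```python
import re
from typing import Optional

# @charset only counts when it is the very first thing in the stylesheet.
_CHARSET_RE = re.compile(rb'^@charset "([^"]*)";')


def declared_charset(css_bytes: bytes) -> Optional[str]:
    """Return the charset named by a leading @charset rule, or None."""
    match = _CHARSET_RE.match(css_bytes)
    return match.group(1).decode("ascii") if match else None


def encode_for_sheet(text: str, original_bytes: bytes,
                     fallback: str = "utf-8") -> bytes:
    """Encode rewritten text in the charset the original sheet declared.

    The fallback is an assumption, used only when no @charset is present.
    """
    return text.encode(declared_charset(original_bytes) or fallback)


original = b'@charset "ISO-8859-1";\na::after { content: "\xbb"; }'
print(declared_charset(original))  # ISO-8859-1
```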

What version of the product are you using (please check X-Mod-Pagespeed
header)?
Dec 5th 2012 build

On what operating system?
Ubuntu

Which version of Apache?
2.2

Original issue reported on code.google.com by graham.c...@gmail.com on 26 Feb 2013 at 6:57

GoogleCodeExporter commented 9 years ago
Re AddType: Apache applies it after mod_pagespeed runs, so mod_pagespeed 
doesn't see the directive.
This is a source of frustration, but sadly we've not found a way around it.

Original comment by matterb...@google.com on 26 Feb 2013 at 7:12

GoogleCodeExporter commented 9 years ago
This specific problem is actually fixed in the most recent release.

However, we do have a general issue that we assume input is UTF-8 and that 
output can be UTF-8.

Original comment by sligocki@google.com on 27 Feb 2013 at 3:57