crowell / modpagespeed_tmp

Automatically exported from code.google.com/p/modpagespeed
Apache License 2.0

Rewrite resources with utf-8 characters in the URLs. #704

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
http://tx094.technologixstudio.net/ contains an image reference:

<img class="fldi reflist" src="/media/images/fotó_kicsi.jpg" height="50" 
width="50"/>

mod_pagespeed cannot currently decode the non-ASCII character in the URL and 
gives up on it.  mod_pagespeed properly retains the utf-8 text, so it can parse 
and re-serialize the document, but to fetch the resource and optimize it, it must 
be able to decode the URL, which in this case is in utf-8.

This is a reasonable thing to do; we just don't do it yet.
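
To make the decoding step concrete, here is a minimal Python sketch of what "decoding the URL" could mean for the image path above. It is not mod_pagespeed code (which is C++); the helper names `encode_utf8_path` and `decode_utf8_path` are made up for illustration, and only the standard `urllib.parse` functions are real.

```python
from typing import Optional
from urllib.parse import quote, unquote


def encode_utf8_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8 bytes, leaving '/' untouched.

    Hypothetical helper: shows the fetchable form of a path that
    contains non-ASCII characters.
    """
    return quote(path, safe="/", encoding="utf-8")


def decode_utf8_path(encoded: str) -> Optional[str]:
    """Decode a %-encoded path, assuming the escaped bytes are UTF-8.

    Returns None when the bytes are not valid UTF-8, which is the
    case where a rewriter would have to give up on the resource.
    """
    try:
        return unquote(encoded, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    src = "/media/images/fotó_kicsi.jpg"
    encoded = encode_utf8_path(src)
    print(encoded)                    # /media/images/fot%C3%B3_kicsi.jpg
    print(decode_utf8_path(encoded))  # /media/images/fotó_kicsi.jpg
```

The strict error mode matters here: a lone `%F3` (the Latin-1 byte for the same letter) is not valid UTF-8, so `decode_utf8_path` returns `None` instead of silently producing a wrong name.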

Original issue reported on code.google.com by jmara...@google.com on 22 May 2013 at 2:10

GoogleCodeExporter commented 9 years ago
I don't remember off the top of my head what happens when the charset is 
something other than utf-8. A quick test with koi-8 shows that Chrome and 
Firefox seem to transcode it into %-encoded utf-8 (which we currently can't do), 
but it's probably OK to special-case utf-8 too, except for the chronic issue of 
not knowing the exact charset.
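
A rough Python sketch of the transcoding described above, assuming the document charset is known: the raw URL bytes are decoded with that charset and then re-emitted as %-encoded utf-8. The function name `transcode_to_utf8_url` is hypothetical, and returning `None` for an unknown charset is an illustrative choice, not something mod_pagespeed is stated to do.

```python
from typing import Optional
from urllib.parse import quote


def transcode_to_utf8_url(raw_path: bytes, charset: Optional[str]) -> Optional[str]:
    """Convert a URL path from the document charset to %-encoded UTF-8.

    raw_path: the path bytes exactly as they appear in the HTML.
    charset:  the document charset, or None when it is not known.

    Returns None if the charset is unknown, unsupported, or the bytes
    do not decode cleanly (assumed policy: give up rather than guess).
    """
    if charset is None:
        return None
    try:
        text = raw_path.decode(charset, errors="strict")
    except (LookupError, UnicodeDecodeError):
        return None
    return quote(text, safe="/", encoding="utf-8")


if __name__ == "__main__":
    # Simulate a KOI8-R page that references /фото.jpg.
    raw = "/фото.jpg".encode("koi8_r")
    print(transcode_to_utf8_url(raw, "koi8_r"))  # /%D1%84%D0%BE%D1%82%D0%BE.jpg
    print(transcode_to_utf8_url(raw, None))      # None
```

The `charset is None` branch is where the "not knowing the exact charset" problem shows up: without a reliable charset, the same bytes could map to different characters, so the sketch declines to produce a URL.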

Original comment by morlov...@google.com on 22 May 2013 at 2:25