JonasCz / save-for-offline

Android app for saving webpages for offline reading.
GNU General Public License v2.0
139 stars, 45 forks

If http response contains "charset=utf-8", use utf-8 to display the offline page #5

Closed axkr closed 9 years ago

axkr commented 9 years ago

If the Content-Type header in the HTTP response contains "charset=utf-8", use UTF-8 to display the page. See: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Response_fields
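As a rough illustration of the request, here is a minimal plain-Java sketch of pulling the charset out of a Content-Type value. The class name `ContentTypeCharset`, the method `fromContentType`, and the fallback parameter are hypothetical and not taken from the app's source; the parsing is deliberately simple and assumes parameters are separated by semicolons.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of save-for-offline.
final class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or the fallback when
    // the header is missing, has no charset parameter, or names a charset
    // this JVM does not know.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(fromContentType("text/html; charset=utf-8", fallback));
        System.out.println(fromContentType("text/html", fallback));
    }
}
```

The detected charset could then be used when decoding the downloaded bytes, before the page is written to disk.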

JonasCz commented 9 years ago

The HTML file usually specifies the charset, and the WebView should use this, right? AFAIK, there is no way to set the charset the WebView uses. (Correct me if I'm wrong.)

axkr commented 9 years ago

According to this entry: http://stackoverflow.com/questions/3961589/android-webview-and-loaddata

something like this should work

myWebView.loadData(myHtmlString, "text/html; charset=UTF-8", null);

at least for Android > 4.0

JonasCz commented 9 years ago

The problem is that I save the page as HTML and load it into the WebView using webview.loadUrl("file://" + locationOfHtmlFile). The WebView should auto-detect the encoding, right? Also, does it not work the way it is currently done?

axkr commented 9 years ago

UTF-8 files are definitely shown with garbled characters!

Maybe this suggestion helps? http://stackoverflow.com/questions/4933069/android-webview-with-garbled-utf-8-characters

Copied example:

// Pretend this is an html document with those three characters
String scandinavianCharacters = "øæå";

// Won't render correctly
webView.loadData(scandinavianCharacters, "text/html", "UTF-8");

// Will render correctly
webView.loadDataWithBaseURL(null, scandinavianCharacters, "text/html", "UTF-8", null);

JonasCz commented 9 years ago

The problem is that I am saving the page as an HTML file (on the SD card), and the WebView should automatically determine the correct character encoding from the <meta charset="UTF-8"> in the HTML file, just like a browser would.
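Auto-detection from the file only works if the saved HTML actually carries a charset declaration. A minimal sketch, assuming the page is held as a string before being written out in UTF-8, of checking for a declaration and adding one when it is missing. The class `MetaCharset` and its methods are hypothetical, and the regular expressions are a simplification, not a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of save-for-offline.
final class MetaCharset {

    // Matches <meta charset="..."> as well as the older
    // <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
    private static final Pattern CHARSET_DECL =
            Pattern.compile("<meta[^>]*charset\\s*=", Pattern.CASE_INSENSITIVE);

    // Matches <head> or <head ...>, but not <header>.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    static boolean declaresCharset(String html) {
        return CHARSET_DECL.matcher(html).find();
    }

    // Returns the HTML unchanged if it already declares a charset; otherwise
    // inserts a UTF-8 declaration right after the opening head tag, or at the
    // very start if there is no head tag.
    static String ensureUtf8Meta(String html) {
        if (declaresCharset(html)) {
            return html;
        }
        String meta = "<meta charset=\"UTF-8\">";
        Matcher head = HEAD_OPEN.matcher(html);
        if (head.find()) {
            return html.substring(0, head.end()) + meta + html.substring(head.end());
        }
        return meta + html;
    }

    public static void main(String[] args) {
        System.out.println(ensureUtf8Meta("<html><head><title>t</title></head></html>"));
    }
}
```

This only helps if the file really is written as UTF-8. An existing declaration that names a different charset than the bytes on disk is left untouched by this sketch.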

Is what I am doing now actually a problem with any specific websites?

It seems a little odd that I should read the HTML file as a String before loading it into the WebView.

Are you sure that the bad characters are not introduced into the file when downloading, parsing, or saving the HTML file?
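For reference, a small plain-Java sketch of how a charset mismatch during download or save would garble text: the bytes are produced with one charset and decoded with another. The class `MojibakeDemo` and method `reDecode` are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo for illustration only; not part of save-for-offline.
final class MojibakeDemo {

    // Encodes the text with one charset and decodes the bytes with another,
    // mimicking a download or save step that guesses the wrong encoding.
    static String reDecode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // The three Scandinavian characters, written as escapes so the
        // example does not depend on the source file's own encoding.
        String original = "\u00F8\u00E6\u00E5";

        // Each character is two bytes in UTF-8; decoded as ISO-8859-1 each
        // byte becomes its own character, so three characters turn into six.
        String garbled = reDecode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(original.length() + " -> " + garbled.length());

        // Using the same charset on both sides round-trips cleanly.
        String intact = reDecode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original));
    }
}
```

If the saved file shows this doubled-character pattern when opened in a text editor as UTF-8, the damage happened before the WebView was involved.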

axkr commented 9 years ago

Take, for example, the German Spiegel Online (m.spiegel.de), where all German umlaut characters are garbled. Seen on a Samsung Galaxy with Android 4.4.x.

Note: on a PC desktop it switches to another layout but this should have the same effect as the mobile version.

JonasCz commented 9 years ago

Yes, I get this too. Loading the HTML file as a string first seems like a bit of a hack to me, but I will try it. Maybe there is some other solution?
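A minimal sketch of the "load as a string first" approach, assuming the saved file is UTF-8. The class `SavedPageReader` and method `readAsUtf8` are hypothetical. The WebView call appears only as a comment, since it needs an Android runtime, and the base URL shown there is an assumption that would let relative resources next to the file resolve.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of save-for-offline.
final class SavedPageReader {

    // Reads the whole file into memory and decodes it as UTF-8 in one step,
    // so multi-byte sequences are never split across buffer boundaries.
    static String readAsUtf8(File htmlFile) throws IOException {
        try (InputStream in = new FileInputStream(htmlFile)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page previously saved by the app.
        File saved = File.createTempFile("saved-page", ".html");
        try (OutputStream out = new FileOutputStream(saved)) {
            out.write("<p>\u00F8\u00E6\u00E5</p>".getBytes(StandardCharsets.UTF_8));
        }

        String html = readAsUtf8(saved);
        System.out.println(html.length());

        // On Android, the string could then be handed to the WebView with an
        // explicit encoding (assumed usage, not the app's confirmed fix):
        // webView.loadDataWithBaseURL("file://" + saved.getParent() + "/",
        //         html, "text/html", "UTF-8", null);

        saved.delete();
    }
}
```

The trade-off is that the whole page sits in memory as a string, which is usually acceptable for a single article.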

JonasCz commented 9 years ago

Should be fixed now.