Automatically exported from code.google.com/p/fanficdownloader

Encoding problems for certain characters in HTML output #26

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
I am using Calibre version 0.8.49 with FFDL plugin 1.5.16 on Windows 7 64-bit. 
When I download stories to HTML files, certain characters are garbled. This 
does not happen with all stories, only a few. I have listed the affected 
characters below, along with some of the stories and sites where this occurs. 
It seems to be an encoding problem: Notepad++ shows the correct content, but 
when I open the page in Firefox the characters are garbled. I have attached 
screenshots of the output I get. Please let me know if there is anything I 
can do to fix this issue.

–
½
é
…
“
”
¾
'

http://www.fanfiction.net/s/6473889/1/
http://www.fanfiction.net/s/4246300/1/
http://www.fanfiction.net/s/3668356/1/
http://www.fanfiction.net/s/4199033/1/
http://www.ficwad.com/story/105190
http://www.fanfiction.net/s/4852650/1/
http://www.fanfiction.net/s/2227035/1/
http://www.fanfiction.net/s/3123793/1/

Original issue reported on code.google.com by daniel.h...@gmail.com on 6 May 2012 at 5:48

GoogleCodeExporter commented 9 years ago
Thanks for the very detailed bug report, it really helps to have all the 
details up front.

The HTML output is UTF-8.  I've tried several of these and they all looked fine 
when the browser encoding is set to UTF-8.
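
To illustrate the kind of garbling that results when UTF-8 output is read with the wrong encoding, here is a minimal Python sketch. It encodes a few of the reported characters as UTF-8 and then decodes the bytes as Windows-1252. That Firefox fell back to Windows-1252 is an assumption (this thread does not say which encoding it picked), and `misread_as_cp1252` is a hypothetical helper name, not part of FFDL or calibre.

```python
def misread_as_cp1252(text: str) -> str:
    """Show how UTF-8 bytes look when decoded as Windows-1252.

    errors="replace" is needed because Python's cp1252 codec leaves a few
    byte values undefined (for example 0x9D, which appears in the UTF-8
    encoding of the right double quote).
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


# A few of the characters from the report.
for ch in ["\u2013", "\u00bd", "\u00e9", "\u2026", "\u201c", "\u201d", "\u00be"]:
    print(repr(ch), "->", repr(misread_as_cp1252(ch)))

# For example, "é" (UTF-8 bytes C3 A9) is shown as "Ã©".
```

If the garbled text in the screenshots has this two- or three-character shape, the file itself is fine and only the decoding choice in the browser is wrong.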

You can set your default encoding in Firefox in Tools -> Options -> Content -> 
Fonts & Colors -> Advanced -> Default Character Encoding.  I find UTF-8 a better 
choice these days.

The only oddity I see is that calibre injects:
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" 
http-equiv="Content-Type"/>
...into the html.

I would have thought:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> or
<meta content="application/xhtml+xml; charset=utf-8" http-equiv="Content-Type"/>
...would be more correct.
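
For anyone who wants to check what a given output file declares, here is a rough Python sketch using only the standard library. It pulls the first Content-Type meta tag out of an HTML string and splits it into media type and charset. `MetaContentType` and `split_content_type` are hypothetical names made up for this example. The splitting is deliberately simple and is not how a browser sniffs encodings.

```python
from html.parser import HTMLParser


class MetaContentType(HTMLParser):
    """Remember the content attribute of the first Content-Type meta tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attr = dict(attrs)
        is_content_type = (attr.get("http-equiv") or "").lower() == "content-type"
        if tag == "meta" and is_content_type and self.content is None:
            self.content = attr.get("content")


def split_content_type(html):
    """Return (media_type, charset); either may be None if not declared."""
    parser = MetaContentType()
    parser.feed(html)
    if parser.content is None:
        return None, None
    media_type, _, params = parser.content.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip("\"'").lower()
    return media_type.strip(), charset


injected = ('<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" '
            'http-equiv="Content-Type"/>')
print(split_content_type(injected))
```

On the injected tag this reports a charset of utf-8 but a media type of `http://www.w3.org/1999/xhtml`, which is a namespace URI rather than a MIME type like `text/html`.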

Adding a Content-Type line to the HTML before it's given to calibre doesn't 
seem to make any difference--calibre replaces it.

I've opened a bug against calibre regarding the content-type:  
https://bugs.launchpad.net/calibre/+bug/995553

I'll leave this issue open for the time being so I can update it with the 
response from the calibre bug report.

Original comment by retiefj...@gmail.com on 6 May 2012 at 5:19

GoogleCodeExporter commented 9 years ago
Thank you. I had tried a few ways to force the browser to use UTF-8. It seems I 
was doing it the wrong way.

Original comment by daniel.h...@gmail.com on 6 May 2012 at 5:31

GoogleCodeExporter commented 9 years ago
Kovid's quick!

https://bugs.launchpad.net/calibre/+bug/995553
> Fixed in branch lp:calibre. The fix will be in the next release. calibre
> is usually released every Friday.
> 
>  status fixreleased
> 
> ** Changed in: calibre
>        Status: New => Fix Released

Original comment by retiefj...@gmail.com on 6 May 2012 at 6:05