Encoding problem with TOC and Index

JackieXie168 / chmsee

Automatically exported from code.google.com/p/chmsee

GNU General Public License v2.0


Encoding problem with TOC and Index #102

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago

Chmsee doesn't use the correct encoding for the TOC and index, whether I choose
"Auto" or set the correct encoding by hand. The file's encoding is Windows-1251.

You can download CHM file for testing from here:
http://www.lenininc.com/soft/webdes_ru.chm

Original issue reported on code.google.com by dmitriy.trt on 15 Dec 2010 at 9:13


GoogleCodeExporter commented 8 years ago

The TOC and index contents are generated from .hhc and .hhk files.
In this webdes_ru.chm file, they are Contents.hhc and Index.hhk, located
in the extracted ../bookshelf/2e47ef.../ directory.
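For context on the format: .hhc and .hhk files are HTML "sitemap" lists in which each entry is an `<OBJECT type="text/sitemap">` element with `<param>` children such as `Name` (the title shown in the TOC or index) and `Local` (the target page). Below is a minimal sketch of extracting those pairs with Python's standard `html.parser`, assuming that usual layout. `SitemapParser` is a hypothetical name for illustration, not chmsee code.

```python
from html.parser import HTMLParser


class SitemapParser(HTMLParser):
    """Collect (Name, Local) pairs from <OBJECT type="text/sitemap"> entries.

    Hypothetical helper; assumes the usual .hhc/.hhk layout where each entry
    is an OBJECT element containing <param name=... value=...> children.
    """

    def __init__(self):
        super().__init__()
        self.entries = []      # list of (title, local_path) tuples
        self._current = None   # params of the OBJECT being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "object" and (attrs.get("type") or "").lower() == "text/sitemap":
            self._current = {}
        elif tag == "param" and self._current is not None:
            key = (attrs.get("name") or "").lower()
            if key in ("name", "local"):
                # html.parser has already unescaped character references here.
                # A later param with the same key overwrites an earlier one.
                self._current[key] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "object" and self._current is not None:
            self.entries.append(
                (self._current.get("name", ""), self._current.get("local", ""))
            )
            self._current = None


sample = """<UL>
<LI> <OBJECT type="text/sitemap">
     <param name="Name" value="Introduction">
     <param name="Local" value="intro.htm">
     </OBJECT>
</UL>"""

parser = SitemapParser()
parser.feed(sample)
parser.close()
print(parser.entries)  # [('Introduction', 'intro.htm')]
```

Because `html.parser` unescapes attribute values, a reference such as `&#209;` arrives as the Latin-1 character `Ñ` (U+00D1), which is the kind of string reported in this comment.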

I examined the strings in these two files and found that they are
composed of character entities,
e.g. the first string:

"Ñîäåðæàíèå",

after decoding, it becomes

"Ñîäåðæàíèå"

I tried converting the result with the "WINDOWS-1251" encoding again,
but it stayed the same.

Do you have any experience dealing with this kind of encoding?
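The symptom fits a common pattern: if the file stores each Windows-1251 byte as a numeric character reference (for example `&#209;` for byte 0xD1, which is "С" in Windows-1251), a standard HTML decoder maps it to the Unicode code point U+00D1 ("Ñ") instead. The exact entity form used in webdes_ru.chm is an assumption here. A minimal sketch that recovers the original bytes straight from the reference numbers and decodes them as Windows-1251; `decode_byte_refs` is a hypothetical helper, not chmsee code.

```python
import re

# One or more consecutive numeric character references, decimal or hex.
_REF_RUN = re.compile(r"(?:&#(?:[0-9]+|[xX][0-9a-fA-F]+);)+")
# A single reference; the group captures the body, e.g. "209" or "xD1".
_ONE_REF = re.compile(r"&#([0-9]+|[xX][0-9a-fA-F]+);")


def _ref_value(ref: str) -> int:
    """Numeric value of a reference body such as '209' or 'xD1'."""
    return int(ref[1:], 16) if ref[0] in "xX" else int(ref)


def decode_byte_refs(text: str, codepage: str = "cp1251") -> str:
    """Decode runs of byte-valued character references in a legacy code page.

    Assumption: every reference below 256 stands for one raw byte of the
    book's code page (Windows-1251 here), not for a Unicode code point.
    Runs containing a value of 256 or more are treated as genuine Unicode.
    """
    def fix_run(match):
        values = [_ref_value(r) for r in _ONE_REF.findall(match.group(0))]
        if all(v < 256 for v in values):
            # Rebuild the raw bytes, then decode them in the book's code page.
            return bytes(values).decode(codepage, errors="replace")
        return "".join(chr(v) for v in values)

    return _REF_RUN.sub(fix_run, text)


print(decode_byte_refs("&#209;&#238;&#228;&#229;&#240;&#230;&#224;&#237;&#232;&#229;"))
# prints Содержание
```

The order matters: once the references have been turned into Latin-1 characters and re-encoded as UTF-8, the text is no longer Windows-1251 bytes, so a Windows-1251 conversion at that point would not be expected to restore it. The bytes have to be recovered first (here from the reference numbers, before any HTML decoding).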

Original comment by jungl...@gmail.com on 23 Dec 2010 at 8:27

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

I have the same trouble.
In addition, if the CHM file's contents are built from files with non-ASCII names,
they cannot be opened and fail with the message:
Can not find link target file at
"/home/pseudo/.cache/chmsee/bookshelf/995653064118f2c2e6a06ff6e373c31e/ñîäåðæàíèå.htm"
Real file name:
/home/pseudo/.cache/chmsee/bookshelf/995653064118f2c2e6a06ff6e373c31e/содержание.htm
It seems chmsee interprets all file names and titles as ISO-8859-1. It
would be great if it interpreted them in the current locale's encoding.
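The diagnosis above can be checked directly: encoding the real name `содержание.htm` as Windows-1251 and reading those bytes as ISO-8859-1 gives exactly `ñîäåðæàíèå.htm`. A minimal sketch of undoing that mapping when a link target is not found on disk; `resolve_link_target` is a hypothetical helper (not part of chmsee), and Windows-1251 is assumed as the book's code page.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_link_target(book_dir: Path, target: str,
                        codepage: str = "cp1251") -> Optional[Path]:
    """Find the extracted file for a TOC link target.

    Hypothetical helper. If the name does not exist as given, assume it is
    `codepage` bytes that were mis-read as ISO-8859-1 and undo that.
    """
    candidate = book_dir / target
    if candidate.exists():
        return candidate
    try:
        # ISO-8859-1 maps code points 0-255 one-to-one back to bytes.
        repaired = target.encode("iso-8859-1").decode(codepage)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    candidate = book_dir / repaired
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    # How the garbled name arises: Windows-1251 bytes read as ISO-8859-1.
    print("содержание.htm".encode("cp1251").decode("iso-8859-1"))  # ñîäåðæàíèå.htm
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp)
        (book / "содержание.htm").write_text("<html></html>", encoding="utf-8")
        found = resolve_link_target(book, "ñîäåðæàíèå.htm")
        print(found.name)  # содержание.htm
```

Note that decoding with the desktop locale alone would not work in this report: the locale is UTF-8, and Windows-1251 bytes such as these are not valid UTF-8, so the code page has to come from the book rather than from the system.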

chmsee 1.3.0-2ubuntu2

Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

Locale: uk_UA.UTF-8

Original comment by gpse...@gmail.com on 21 Sep 2012 at 9:13

GoogleCodeExporter commented 8 years ago

Hi gpseudo, 

Thank you for reminding me about this issue. I just checked with the latest
chmsee (v1.99.14), and the bug is still there. I will try to fix it later by
converting the strings according to the locale.

Original comment by jungl...@gmail.com on 22 Sep 2012 at 7:13

GoogleCodeExporter commented 8 years ago

I added a conversion based on the locale from the CHM file.
Now the TOC looks better, but the index is still somewhat garbled.
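One way to base the conversion on the CHM file rather than the desktop locale is to map the Windows locale ID (LCID) that CHM files record to its ANSI code page; for example, 0x0419 (Russian) implies Windows-1251. The sketch below assumes the LCID has already been read from the file. `LCID_TO_CODEC`, `codec_for_lcid`, and `decode_chm_string` are hypothetical names, the table covers only a few locales, and the Windows-1252 fallback is an assumption; this is not necessarily what the committed change does.

```python
# Hypothetical table: a few Windows LCIDs and the ANSI code page each implies.
LCID_TO_CODEC = {
    0x0409: "cp1252",  # English (United States)
    0x0407: "cp1252",  # German (Germany)
    0x0405: "cp1250",  # Czech
    0x0415: "cp1250",  # Polish
    0x0402: "cp1251",  # Bulgarian
    0x0419: "cp1251",  # Russian
    0x0422: "cp1251",  # Ukrainian
    0x0804: "cp936",   # Chinese (PRC), GBK
    0x0404: "cp950",   # Chinese (Taiwan), Big5
    0x0411: "cp932",   # Japanese, Shift_JIS
    0x0412: "cp949",   # Korean
}

FALLBACK_CODEC = "cp1252"  # assumption: used when the LCID is not in the table


def codec_for_lcid(lcid: int) -> str:
    """Python codec name for the ANSI code page implied by an LCID."""
    return LCID_TO_CODEC.get(lcid, FALLBACK_CODEC)


def decode_chm_string(raw: bytes, lcid: int) -> str:
    """Decode a TOC/index title or file name using the book's own locale."""
    return raw.decode(codec_for_lcid(lcid), errors="replace")


# Usage: assumes the LCID (0x0419, Russian) was already read from the CHM file.
title = decode_chm_string(b"\xd1\xee\xe4\xe5\xf0\xe6\xe0\xed\xe8\xe5", 0x0419)
print(title)  # Содержание
```

Keying on the full LCID rather than only the primary language matters for cases like Chinese, where 0x0804 and 0x0404 share a primary language but use different code pages.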

The change has already been committed; you can get it from:

git://github.com/jungleji/chmsee.git

Original comment by jungl...@gmail.com on 25 Sep 2012 at 12:39


GoogleCodeExporter commented 8 years ago

Clear all pre-2.0 issues

Original comment by jungl...@gmail.com on 18 Jan 2013 at 6:10

Changed state: Done