Fixed #9 Mixed charsets now supported

Yspadadden commented 3 years ago

The parser now interprets enough of the font table to get the charset for each font. When we encounter a byte string, we interpret its bytes in the charset of the currently active font, or in the document's default charset if no font is active.
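
For readers who want to see the idea in isolation, here is a minimal Python sketch of that decoding rule. It is not the code from this pull request: the function names, the regular expression, the tiny charset table, and the `cp1252` default are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately partial table from \fcharsetN to Python codec names.
FCHARSET_TO_CODEC = {0: "cp1252", 177: "cp1255", 204: "cp1251"}


def parse_font_table(font_table: str) -> dict:
    """Return {font number: codec name} for each \\fN ... \\fcharsetN entry."""
    fonts = {}
    # \fN, then any control words inside the same group, then \fcharsetN.
    for num, charset in re.findall(r"\\f(\d+)[^{}]*?\\fcharset(\d+)", font_table):
        # Assumption: fall back to cp1252 when the charset value is not in the table.
        fonts[int(num)] = FCHARSET_TO_CODEC.get(int(charset), "cp1252")
    return fonts


def decode_run(raw: bytes, active_font, fonts: dict, default_codec: str = "cp1252") -> str:
    """Decode bytes with the active font's codec, else the document default."""
    # active_font is None when no \fN has been seen yet, so .get() falls through.
    codec = fonts.get(active_font, default_codec)
    return raw.decode(codec)


FONT_TABLE = r"{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fnil\fcharset204 Arial;}}"
fonts = parse_font_table(FONT_TABLE)
russian = decode_run("Привет".encode("cp1251"), 1, fonts)
western = decode_run(b"caf\xe9", None, fonts)
```

Here the same helper decodes a Cyrillic run under font 1 and a Western run under the document default, which is the mixed-charset case this change is about.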

bbottema commented 3 years ago

Amazing work! Thank you so much.

Yspadadden commented 3 years ago

@bbottema Thanks. I hope it will be useful for other people as well.

There is one caveat I should mention. The mapping from \fcharsetXXX to an actual charset comes mostly from Wikipedia and another random website. In my testing I found that the Wikipedia entry may be wrong in some cases (that's why I needed an update for Russian and Hebrew). I don't have documents in any other languages to test with, so some of the other mappings may be wrong too. They will be easy to fix once you have an example that is not working. I am sorry that the testing is therefore a bit incomplete, but that is the best I can do with the test cases I have right now.
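
To make the caveat concrete, the sketch below shows one way such a mapping could be written so that an unmapped value fails loudly instead of silently producing garbled text. It is a Python illustration, not this project's table. The values follow the commonly documented `\fcharsetN` to Windows code page pairs, and like the mapping discussed above they are untested against real documents, so treat every row as an assumption. The name `codec_for_fcharset` is hypothetical.

```python
import codecs

# Assumed \fcharsetN -> Python codec names (Windows code pages); unverified.
FCHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI / Western European
    128: "cp932",   # Japanese (Shift JIS)
    129: "cp949",   # Korean
    134: "cp936",   # Simplified Chinese
    136: "cp950",   # Traditional Chinese
    161: "cp1253",  # Greek
    162: "cp1254",  # Turkish
    163: "cp1258",  # Vietnamese
    177: "cp1255",  # Hebrew
    178: "cp1256",  # Arabic
    186: "cp1257",  # Baltic
    204: "cp1251",  # Russian
    222: "cp874",   # Thai
    238: "cp1250",  # Eastern European
}


def codec_for_fcharset(value: int) -> str:
    """Return the codec name for a \\fcharset value, or raise LookupError."""
    try:
        name = FCHARSET_TO_CODEC[value]
    except KeyError:
        raise LookupError(f"no codec mapped for \\fcharset{value}") from None
    codecs.lookup(name)  # raises LookupError if this Python build lacks the codec
    return name
```

Raising on an unknown value makes a wrong or missing row show up as a clear error naming the `\fcharset` number, which is the kind of failing example that makes a mapping easy to correct.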

bbottema / rtf-to-html

Fixed #9 Mixed charsets now supported #10