How to deal with unicode characters?

ShabbyX / libpandoc

C bindings to Pandoc, a markup converter library written in Haskell.

How to deal with unicode characters? #7

Open itechbear opened 8 years ago

itechbear commented 8 years ago

It seems pandoc() doesn't support Unicode characters. It just reports that the code points of Unicode characters are outside the ASCII char range [0,255).

ShabbyX commented 7 years ago

I should look into it. I see in 8ec4ac97 that the original author of libpandoc had changed the interface from wchar_t * to char *, adding a comment that all strings should be encoded as UTF-8. There's no explanation as to why!
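For context on why a char * interface can still carry Unicode text: UTF-8 represents every code point as a sequence of one to four bytes, so no wide-character type is needed. The following is a minimal sketch of that encoding. The function name utf8_encode is hypothetical and is not part of libpandoc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libpandoc.
 * Encodes one Unicode code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a surrogate or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

With this sketch, utf8_encode(0xE9, buf) writes the two bytes 0xC3 0xA9 for "é". Both bytes are above 127, so any code that treats each char as a separate ASCII character will reject or mangle them.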

Phyks commented 7 years ago

:+1: for this issue. It seems that whatever accented character I feed into libpandoc, I get the character with code 65533 (U+FFFD, the Unicode replacement character) as output.

Typically, input "é" gives "�".
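The thread does not identify where the 65533 comes from. One common way to end up with it, offered here only as an assumption, is a UTF-8 decoder that substitutes U+FFFD for any byte sequence it cannot parse, for example when "é" arrives as the single Latin-1 byte 0xE9 rather than the UTF-8 pair 0xC3 0xA9. The sketch below shows such a decoder. The name utf8_decode is hypothetical, and this is not libpandoc's or Pandoc's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu  /* 65533 in decimal */

/* Hypothetical helper, not part of libpandoc.
 * Decodes one code point from the first bytes of s (len bytes available,
 * len > 0) and stores the number of bytes consumed in *used.
 * Malformed input yields U+FFFD and consumes a single byte. */
uint32_t utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    assert(s != NULL && used != NULL && len > 0);

    uint32_t cp, min;
    size_t need;                 /* number of continuation bytes expected */
    unsigned char b = s[0];

    *used = 1;
    if (b < 0x80)
        return b;                /* ASCII */
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; need = 1; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; need = 2; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; need = 3; min = 0x10000; }
    else
        return REPLACEMENT_CHAR; /* stray continuation or invalid lead byte */

    if (len < need + 1)
        return REPLACEMENT_CHAR; /* sequence is truncated */

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return REPLACEMENT_CHAR; /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return REPLACEMENT_CHAR;

    *used = need + 1;
    return cp;
}
```

Under this assumption, decoding the bytes 0xC3 0xA9 returns 0xE9 ("é"), while decoding the lone byte 0xE9 returns 65533. Checking the encoding of the input passed to pandoc() would confirm or rule out this explanation.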

ShabbyX commented 7 years ago

This is doable if I simply duplicate everything and change all CStrings to CWStrings, but that would get quite ugly. I'll see if I can refactor some things to do this more cleanly (having a newborn doesn't help with finding time either!)

Phyks commented 7 years ago

OK, I'll try to have a look at this CWString hack. No problem, and thanks for this lib!