sid-kap closed 8 years ago
I don't think this is a bug in BlogLiterately. Here's my very educated guess as to what is going on. BlogLiterately's output is encoded using UTF-8. There is no need for it to be HTML escaped. The problem is that since it is just an HTML fragment, a browser has no way to know what character encoding it is using. For some reason, certain browsers default to a "Western" (ISO 8859) encoding. Interpreting UTF-8 as if it were ISO 8859 of course leads to strange characters being displayed: each smart quote is stored as three bytes in UTF-8, and the browser shows each byte as a separate character.
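To make the misreading described above concrete, here is a minimal Python sketch. The function name `misread_as_latin1` is made up for illustration, and Python's `latin-1` codec stands in for the browser's "Western" default (real browsers typically treat that label as Windows-1252, so the exact garbage glyphs may differ).

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as ISO 8859-1.

    This mimics a browser that guesses the wrong encoding.
    The name is hypothetical; it is not part of BlogLiterately.
    """
    return text.encode("utf-8").decode("latin-1")


original = "\u201cHello\u201d"          # Hello wrapped in smart quotes
garbled = misread_as_latin1(original)

# Each smart quote is 3 bytes in UTF-8, so it turns into 3 characters.
print(len(original), len(garbled))

# The bytes themselves are intact, so the damage is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

The round trip at the end is the point: nothing in the file is wrong, only the interpretation of its bytes.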
As a temporary fix for viewing the output files, you can explicitly tell your browser to use UTF-8. This option is usually found in the "View" menu, under "Text Encoding". Also, since I assume your content will eventually end up embedded on some website, the enclosing site will probably inform the browser of the correct encoding so it can render your content correctly.
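Another way to preview a fragment locally is to do what the enclosing site would do and wrap it in a page that declares its encoding. A rough Python sketch, assuming the fragment is already a UTF-8 string; `wrap_fragment` and the default title are hypothetical names, not something BlogLiterately provides:

```python
def wrap_fragment(fragment: str, title: str = "Preview") -> str:
    """Wrap an HTML fragment in a minimal page that declares UTF-8.

    Hypothetical helper for local previewing only.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


page = wrap_fragment("<p>\u201cHello\u201d</p>")
print(page)
```

When saving the result, write it with an explicit encoding (for example `open(path, "w", encoding="utf-8")`) so the declared and actual encodings agree.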
It might also be a decent idea to have BlogLiterately automatically turn smart quotes, em dashes, and that sort of thing into numeric character references (or at least to provide an option to do so) in order to avoid all of this. Patches welcome! =)
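For anyone wanting to post-process the output in the meantime, the idea of numeric character references can be sketched in a few lines of Python. This is only an illustration of the technique, not how BlogLiterately or Pandoc would implement it; `to_numeric_refs` is a made-up name. It relies on the standard `xmlcharrefreplace` error handler, which replaces every non-ASCII character with a decimal reference and leaves ASCII (including existing markup) untouched.

```python
def to_numeric_refs(html: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    ASCII characters, including tags and existing entities, pass through
    unchanged. Hypothetical helper, not part of BlogLiterately.
    """
    return html.encode("ascii", "xmlcharrefreplace").decode("ascii")


fragment = "<p>\u201cHello\u201d</p>"
print(to_numeric_refs(fragment))  # prints <p>&#8220;Hello&#8221;</p>
```

The result is pure ASCII, so it displays the same whichever encoding the browser guesses.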
You were right, this was actually a problem with my browser. The encoding was set to something strange, rather than UTF-8.
Closing this for now.
In test.lhs, I have:
Running stack exec BlogLiterately test.lhs yields the text output
which, in a browser, renders as
It looks like it's not HTML-escaping the smart quotes properly? I imagine Pandoc should take care of this, but maybe it's not providing the right options to Pandoc.