byorgey / BlogLiterately

Command-line tool for formatting and publishing blog posts.
GNU General Public License v3.0

Smart quotes don't show up properly #26

Closed by sid-kap 8 years ago

sid-kap commented 8 years ago

In test.lhs, I have:

'This' is "awesome".

Running stack exec BlogLiterately test.lhs produces the following output:

<p>‘This’ is “awesome”.</p>
<div id="references" class="references">

</div>

which, in a browser, renders as

‘This’ is “awesome”.

It looks like the smart quotes aren't being HTML-escaped properly? I imagine Pandoc should take care of this, but maybe BlogLiterately isn't passing the right options to Pandoc.

byorgey commented 8 years ago

I don't think this is a bug in BlogLiterately. Here's my very educated guess as to what is going on. BlogLiterately's output is encoded using UTF-8, so there is no need for it to be HTML escaped. The problem is that since the output is just an HTML fragment, a browser has no way to know what character encoding it uses. For some reason, certain browsers default to a "Western" (ISO 8859) encoding. Interpreting UTF-8 as if it were ISO 8859 of course leads to strange characters being displayed.
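The mismatch described above can be reproduced outside a browser. The following is a minimal Python sketch, not anything BlogLiterately itself does: it encodes the sample sentence as UTF-8 and then decodes the same bytes as ISO 8859-1, which is assumed here as a stand-in for whatever "Western" encoding a given browser picks. The function name `misread_utf8` is made up for illustration.

```python
def misread_utf8(text: str, wrong_encoding: str = "iso-8859-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    utf8_bytes = text.encode("utf-8")
    # ISO 8859-1 maps every byte to exactly one character, so each
    # three-byte smart quote turns into three unrelated characters.
    return utf8_bytes.decode(wrong_encoding)


# The sample sentence from the issue, written with escapes so this
# source file itself stays plain ASCII.
sample = "\u2018This\u2019 is \u201cawesome\u201d."

garbled = misread_utf8(sample)
restored = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))   # the garbled form of the sentence
print(restored)        # the original sentence, recovered intact
```

Because the bytes themselves are never altered, reversing the wrong decoding recovers the original text, which is consistent with the output file being fine and only its interpretation being off.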

As a temporary fix for viewing the output files, you can explicitly tell your browser to use UTF-8. This setting is usually in the "View" menu, under "Text Encoding". Also, since I assume your content will eventually end up embedded on some website, the enclosing site will probably inform the browser of the correct encoding so it can render your content correctly.
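For local previews, another way to give the browser the encoding is to wrap the fragment in a complete page that declares it, much as an enclosing site would. This is a rough Python sketch under that assumption; `wrap_fragment` and `save_preview` are hypothetical helpers and are not part of BlogLiterately.

```python
def wrap_fragment(fragment: str, title: str = "Preview") -> str:
    """Wrap an HTML fragment in a minimal page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


def save_preview(path: str, fragment: str) -> None:
    """Write the wrapped page to disk, encoded as UTF-8 to match the meta tag."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(wrap_fragment(fragment))


fragment = "<p>\u2018This\u2019 is \u201cawesome\u201d.</p>"
page = wrap_fragment(fragment)
```

Calling `save_preview("preview.html", fragment)` and opening the resulting file should show the smart quotes correctly without touching the browser's encoding menu, since the `meta` tag tells the browser how to decode the bytes.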

It might also be a decent idea to have BlogLiterately automatically turn smart quotes, em dashes, and that sort of thing into numeric character references (or at least to provide an option to do so) in order to avoid all of this. Patches welcome! =)
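To make the numeric character reference idea concrete, here is a small Python sketch of the transformation. It is only an illustration of the technique, not a proposed patch: BlogLiterately is written in Haskell, and the helper name `to_numeric_refs` is invented for this example. It relies on Python's built-in `xmlcharrefreplace` error handler, which replaces every character that cannot be encoded as ASCII with a decimal reference.

```python
import html


def to_numeric_refs(fragment: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    # ASCII markup such as <p> is left alone; only characters outside
    # ASCII (smart quotes, em dashes, ...) become &#NNNN; references.
    return fragment.encode("ascii", "xmlcharrefreplace").decode("ascii")


fragment = "<p>\u2018This\u2019 is \u201cawesome\u201d.</p>"
escaped = to_numeric_refs(fragment)

print(escaped)
# <p>&#8216;This&#8217; is &#8220;awesome&#8221;.</p>

# Decoding the references gives back the original characters.
assert html.unescape(escaped) == fragment
```

The escaped output is pure ASCII, so it displays the same whether the browser assumes UTF-8 or an ISO 8859 encoding. The trade-off is that the source becomes harder to read for text with many non-ASCII characters.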

sid-kap commented 8 years ago

You were right, this was actually a problem with my browser. The encoding was set to something strange, rather than UTF-8.

Closing this for now.