davidfstr / rdiscount

Discount (For Ruby) Implementation of John Gruber's Markdown
http://dafoster.net/projects/rdiscount/

Unicode headers produce invalid anchors #131

Closed mitchelltd closed 8 years ago

mitchelltd commented 8 years ago

When rdiscount processes headers to produce anchors (for use in TOC generation), it converts the UTF-8 input to ASCII-8BIT and, in the process, turns non-ASCII characters into question marks. Question marks are reserved characters in URLs, so the resulting anchors are invalid.

Example:

    irb(main):001:0> require 'rdiscount'
    => true
    irb(main):002:0> test = "# Précis"
    => "# Précis"
    irb(main):003:0> rd = RDiscount.new(test, :generate_toc)
    => #<RDiscount:0x007f92d2026630 @text="# Précis", @generate_toc=true>
    irb(main):004:0> puts rd.toc_content
    <ul>
     <li><a href="#Pr?.cis">Précis</a></li>
    </ul>
    => nil
    irb(main):005:0> test.encoding
    => #<Encoding:UTF-8>
    irb(main):006:0> (rd.toc_content).encoding
    => #<Encoding:ASCII-8BIT>
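The encoding mismatch in the session above can be reproduced with the Ruby standard library alone. The sketch below does not call rdiscount; `retag_as_utf8` is a hypothetical helper name introduced only for illustration. It shows that a string tagged ASCII-8BIT can be re-tagged as UTF-8 when its bytes are still intact. This would presumably fix only the encoding label on the returned string, not the anchor itself, where the non-ASCII bytes have already been replaced.

```ruby
# Hypothetical helper (not part of rdiscount): re-tag a binary string as
# UTF-8 without changing its bytes.
def retag_as_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8)
end

header   = "Précis"
binary   = header.b               # same bytes, but tagged ASCII-8BIT
retagged = retag_as_utf8(binary)

puts binary.encoding == Encoding::ASCII_8BIT  # true
puts binary.bytes == header.bytes             # true
puts retagged.valid_encoding?                 # true
puts retagged == header                       # true
```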

It is worth comparing this outcome with that of GitLab Flavored Markdown, which preserves Unicode characters in link IDs.
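For comparison, here is a minimal sketch of what a Unicode-preserving anchor could look like. It is an assumption for illustration, not GitLab's actual algorithm and not rdiscount code: `unicode_slug` and `fragment_href` are hypothetical names. The slug keeps word characters (including non-ASCII letters), and the `href` percent-encodes the slug using `ERB::Util.url_encode` from the standard library.

```ruby
require 'erb'

# Hypothetical slug function: lowercase, drop punctuation, keep Unicode
# word characters, and turn runs of whitespace into hyphens.
def unicode_slug(header_text)
  header_text.strip.downcase.gsub(/[^\p{Word}\s-]/, '').gsub(/\s+/, '-')
end

# Hypothetical helper: percent-encode the slug for use in an href fragment.
def fragment_href(slug)
  '#' + ERB::Util.url_encode(slug)
end

slug = unicode_slug("Précis")
puts slug                 # précis
puts fragment_href(slug)  # #pr%C3%A9cis
```

With this approach the element's `id` would carry the readable slug (`précis`), while the link's `href` carries the percent-encoded form.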

davidfstr commented 8 years ago

Known issue. Tracking in #129