fletcher / peg-multimarkdown

An implementation of MultiMarkdown in C, using a PEG grammar - a fork of jgm's peg-markdown. No longer under active development - see MMD 5.

Email obfuscation mangles UTF-8 #139

Closed sorbits closed 11 years ago

sorbits commented 11 years ago

Summary

If the text of an email link contains non-ASCII characters, each byte of a UTF-8 sequence is treated as its own signed integer and output as a separate numeric entity.

Steps to Reproduce

Run the following in a shell:

multimarkdown <<< '[åøæ](mailto:a@b.c)'

Expected Result

<p><a href="&#x6d;&#x61;&#x69;&#108;&#116;&#111;&#58;&#97;&#x40;&#x62;&#46;&#x63;">&#xE5;&#xF8;&#xE6;</a></p>

Actual Result

<p><a href="&#x6d;&#x61;&#x69;&#108;&#116;&#111;&#58;&#97;&#x40;&#x62;&#46;&#x63;">&#-61;&#-91;&#xffffffc3;&#xffffffb8;&#xffffffc3;&#xffffffa6;</a></p>
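The negative and `ffffff`-prefixed entities above are what one would expect if each byte is held in a signed `char` and promoted to `int` before being formatted. The following is a minimal sketch of that effect, not the actual peg-markdown source: both helper names are hypothetical, and it assumes a platform with a 32-bit `int`. Note that casting to `unsigned char` only removes the sign problem; it still emits one entity per byte rather than one per code point as in the Expected Result.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one byte stored in a signed char.
 * Promotion to int turns 0xC3 into -61, so "%d" prints "-61" and
 * "%x" prints "ffffffc3" (assuming a 32-bit int). */
int entity_from_signed(char *out, size_t n, signed char byte, int use_hex)
{
    if (use_hex)
        return snprintf(out, n, "&#x%x;", (unsigned int)byte);
    return snprintf(out, n, "&#%d;", (int)byte);
}

/* Hypothetical helper: same formatting, but the byte is unsigned,
 * so the value stays in the range 0..255. */
int entity_from_unsigned(char *out, size_t n, unsigned char byte, int use_hex)
{
    if (use_hex)
        return snprintf(out, n, "&#x%x;", (unsigned int)byte);
    return snprintf(out, n, "&#%d;", (int)byte);
}
```

With the first byte of å (0xC3), the signed variant reproduces the fragments seen in the output above, while the unsigned variant yields a well-formed entity for that single byte.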

Notes

An equally satisfactory solution would be for multimarkdown to leave non-ASCII characters alone, since a user may not be using UTF-8.
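As a rough illustration of that alternative, the sketch below obfuscates only ASCII bytes and copies every byte at or above 0x80 through unchanged, so it makes no assumption about the input encoding. The function name `obfuscate_ascii_only` is hypothetical and is not part of peg-multimarkdown; for simplicity it always uses decimal entities rather than mixing decimal and hexadecimal forms as the real output does.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes an obfuscated copy of `in` into `out`.
 * ASCII bytes become decimal numeric entities; bytes >= 0x80 are
 * copied verbatim. Returns 0 on success, -1 if `out` is too small. */
int obfuscate_ascii_only(const char *in, char *out, size_t n)
{
    size_t used = 0;

    /* Read the input as unsigned bytes so values never go negative. */
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            int w = snprintf(out + used, n - used, "&#%d;", (int)*p);
            if (w < 0 || (size_t)w >= n - used)
                return -1;              /* entity would be truncated */
            used += (size_t)w;
        } else {
            if (used + 1 >= n)
                return -1;              /* no room for byte plus NUL */
            out[used++] = (char)*p;     /* pass non-ASCII through */
        }
    }

    if (used >= n)
        return -1;                      /* no room for the terminator */
    out[used] = '\0';
    return 0;
}
```

Because multi-byte sequences are never split into entities, the link text stays readable whether the input is UTF-8, Latin-1, or another 8-bit encoding.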

Version

Using peg-multimarkdown version 3.6 (installed via MacPorts).

fletcher commented 11 years ago

I've submitted a fix to John MacFarlane, since this is actually inherited from peg-markdown.

Thanks for the clearly defined problem, and bonus points for the proposed solution. ;)

F-

PS> Thanks for TextMate as well!

sorbits commented 11 years ago

Since commit a0df7a713a1260a0cc9111affaec014608be9107 this is no longer an issue, thanks for the fix!

I noticed commit 4e4ff91bfdfcfc51039141cbae6fa22d5b35c1bc bumps the version string to 3.7. I assume you’ll add a tag later (this is the ideal way for MacPorts to deal with it), though before tagging a 3.7 release, I think it would be nice to grab pull request #130.

fletcher commented 11 years ago

Done