Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

unexpanded < > & #109

Closed stefanor closed 7 years ago

stefanor commented 8 years ago

From: https://bugs.debian.org/791470

Version: 2015.6.21-1 (and current master):

$ echo '<body>&lt;&gt;&amp;</body>' | html2markdown
&lt;&gt;&amp;

It worked correctly in 2014.9.25-1:

$ echo '<body>&lt;&gt;&amp;</body>' | html2markdown
<>&

stefanor commented 8 years ago

Bisect blames 446a8eb0733835fa2b2e61b8e311a02f9325cf00 (that is #57)
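
The expected behavior in the report comes down to expanding character references in text content. As a rough illustration only (this is not html2text's code, and `TextCollector` and `extract_text` are hypothetical names), a sketch built on the standard library's `html.parser` yields the expanded form for the same input:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text content and ignores tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser expand &lt; &gt; &amp;
        # (and other character references) before calling handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html_source):
    """Return the text content of html_source with entities expanded."""
    parser = TextCollector()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.parts)


print(extract_text("<body>&lt;&gt;&amp;</body>"))  # prints <>&
```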

jwilk commented 8 years ago

Relevant part of Markdown documentation: https://daringfireball.net/projects/markdown/syntax#autoescape

While leaving the entities unexpanded is technically OK, it makes the output unnecessarily illegible for human readers.

theSage21 commented 8 years ago

@jwilk Is there any reason I should not close this issue?

tahajahangir commented 8 years ago

@jwilk The relevant part of the Markdown spec is about converting Markdown to HTML, not about converting HTML to Markdown.

If one is writing Markdown only to convert it to HTML (and not to present it directly to humans), it's OK to use entities. But for an html2text library it's unacceptable.

We use this library to convert the HTML part of emails to the text part. With newer versions of html2text, the input:

From: "My Name"<span>&lt;name@mail.com&gt;</span>

generates the output:

From: "My Name"&lt;name@mail.com&gt;

but it should generate:

From: "My Name"<name@mail.com>
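
Until the behavior is restored, one possible workaround for the email case is to expand the entities in the converter's output afterwards. This is a sketch under assumptions, using only the standard library's `html.unescape`; `expand_entities` is a hypothetical helper name, and the input string is the output quoted above rather than the result of a live html2text call:

```python
import html


def expand_entities(text):
    """Expand character references such as &lt; &gt; &amp; in text.

    Caveat: this also expands references that were deliberately left
    escaped, so it is only suitable when the result is shown as plain text.
    """
    return html.unescape(text)


converted = 'From: "My Name"&lt;name@mail.com&gt;'
print(expand_entities(converted))  # prints From: "My Name"<name@mail.com>
```

Note that `html.unescape` makes a single pass, so a doubly escaped `&amp;lt;` becomes `&lt;`, not `<`.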

@theSage21 Please reopen the issue (to discuss and revert #57)

theSage21 commented 8 years ago

I am going to be on a flaky Internet connection for about a month. I'll try to fix this as soon as possible.

theSage21 commented 8 years ago

I find this convincing. https://github.com/aaronsw/html2text/pull/59
How about a flag for human-readable output, like -h in the du command? That would make sense.

tahajahangir commented 8 years ago

I suggest making it an --html-escape flag, although users can do the escaping themselves after converting HTML to text.
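
Escaping after conversion, as mentioned above, could look roughly like the following. It is a sketch using the standard library's `html.escape`, not an html2text feature; `escape_for_html` is a hypothetical name, and the sample string stands in for converter output with entities already expanded:

```python
import html


def escape_for_html(text):
    """Escape &, < and > so the text is safe to embed in HTML again.

    quote=False leaves double and single quotes untouched.
    """
    return html.escape(text, quote=False)


plain = 'From: "My Name"<name@mail.com>'
print(escape_for_html(plain))  # prints From: "My Name"&lt;name@mail.com&gt;
```

Keeping this step in the caller means the converter can always emit readable text, and only callers who need entities pay for the extra pass.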

bjones1 commented 7 years ago

The patch @gabalese made fixes this issue for me -- would you consider applying it?

theSage21 commented 7 years ago

Seems good. @Alir3z4 you agree?

Alir3z4 commented 7 years ago

@theSage21 It makes sense to me too.

bjones1 commented 7 years ago

Pinging -- would you like me to submit a pull request containing @gabalese's fix? Or would you prefer to apply it yourself?

Alir3z4 commented 7 years ago

@bjones1 Please feel free to send the patch. I'll make sure it gets reviewed quickly and released as soon as possible.

ciprianmiclaus commented 7 years ago

I will pick this up and send a patch.

Alir3z4 commented 7 years ago

Awesome

bjones1 commented 7 years ago

@ciprianmiclaus, thanks. I got bogged down in other areas. This fix will help me on my projects.